51
|
Outhred AC, Gurjav U, Jelfs P, McCallum N, Wang Q, Hill-Cawthorne GA, Marais BJ, Sintchenko V. Extensive Homoplasy but No Evidence of Convergent Evolution of Repeat Numbers at MIRU Loci in Modern Mycobacterium tuberculosis Lineages. Front Public Health 2020; 8:455. [PMID: 32974265 PMCID: PMC7481465 DOI: 10.3389/fpubh.2020.00455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2019] [Accepted: 07/22/2020] [Indexed: 11/13/2022] Open
Abstract
More human deaths have been attributable to Mycobacterium tuberculosis than any other pathogen, and the epidemic is sustained by ongoing transmission. Various typing schemes have been developed to identify strain-specific differences and track transmission dynamics in affected communities, with recent introduction of whole genome sequencing providing the most accurate assessment. Mycobacterial interspersed repetitive unit (MIRU) typing is a family of variable number tandem repeat schemes that have been widely used to study the molecular epidemiology of M. tuberculosis. MIRU typing was used in most well-resourced settings to perform routine molecular epidemiology. Instances of MIRU homoplasy have been observed in comparison with sequence-based phylogenies, limiting its discriminatory value. A fundamental question is whether the observed homoplasy arises purely through stochastic processes, or whether there is evidence of natural selection. We compared repeat numbers at 24 MIRU loci with a whole genome sequence-based phylogeny of 245 isolates representing three modern M. tuberculosis lineages. This analysis demonstrated extensive homoplasy of repeat numbers, but did not detect any evidence of natural selection of repeat numbers, at least since the ancestral branching of the three modern lineages of M. tuberculosis. In addition, we observed good sensitivity but poor specificity and positive predictive values of MIRU-24 to detect clusters of recent transmission, as defined by whole-genome single nucleotide polymorphism analysis. These findings provide mechanistic insight, and support a transition away from VNTR-based typing toward sequence-based typing schemes for both research and public health purposes.
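The sensitivity, specificity, and positive predictive value mentioned in this abstract are standard confusion-matrix ratios. The sketch below shows how they could be computed when each isolate pair is scored as clustered or not by MIRU-24 (the test) and by whole-genome SNP analysis (the reference). The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, PPV) from confusion-matrix counts.

    tp: clustered by both MIRU-24 and the WGS reference
    fp: clustered by MIRU-24 only
    fn: clustered by the WGS reference only
    tn: clustered by neither
    """
    sensitivity = tp / (tp + fn)   # share of true clusters that MIRU-24 finds
    specificity = tn / (tn + fp)   # share of non-clusters that MIRU-24 rejects
    ppv = tp / (tp + fp)           # share of MIRU-24 clusters that are real
    return sensitivity, specificity, ppv


# Hypothetical counts (assumption, not data from the study).
sens, spec, ppv = diagnostic_metrics(tp=18, fp=90, fn=2, tn=90)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

With these made-up counts the test finds most true clusters (high sensitivity) but many of the clusters it reports are not confirmed by the reference (low PPV), which is the pattern the abstract describes qualitatively.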
Affiliation(s)
- Alexander C. Outhred
- Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW, Australia
- Children's Hospital at Westmead, Sydney, NSW, Australia
- Center for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW, Australia
| | - Ulziijargal Gurjav
- Department of Microbiology and Immunology, Mongolian National University of Medical Sciences, Ulaanbaatar, Mongolia
| | - Peter Jelfs
- Center for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW, Australia
- NSW Mycobacterium Reference Laboratory, Center for Infectious Diseases and Microbiology Laboratory Services, Institute of Clinical Pathology and Medical Research—NSW Health Pathology, Sydney, NSW, Australia
| | - Nadine McCallum
- Deep Seq Lab, Queen's Medical Center, University of Nottingham, Nottingham, United Kingdom
| | - Qinning Wang
- Center for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW, Australia
| | - Grant A. Hill-Cawthorne
- Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW, Australia
- School of Public Health, University of Sydney, Sydney, NSW, Australia
| | - Ben J. Marais
- Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW, Australia
- Children's Hospital at Westmead, Sydney, NSW, Australia
| | - Vitali Sintchenko
- Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, NSW, Australia
- Center for Infectious Diseases and Microbiology—Public Health, Westmead Hospital, Sydney, NSW, Australia
- NSW Mycobacterium Reference Laboratory, Center for Infectious Diseases and Microbiology Laboratory Services, Institute of Clinical Pathology and Medical Research—NSW Health Pathology, Sydney, NSW, Australia
| |
|
52
|
Abstract
Biological sulfur cycling in polar, low-temperature ecosystems is an understudied phenomenon in part due to difficulty of access and the dynamic nature of glacial environments. One such environment where sulfur cycling is known to play an important role in microbial metabolisms is located at Borup Fiord Pass (BFP) in the Canadian High Arctic. Here, transient springs emerge from ice near the terminus of a glacier, creating a large area of proglacial aufeis (spring-derived ice) that is often covered in bright yellow/white sulfur, sulfate, and carbonate mineral precipitates accompanied by a strong odor of hydrogen sulfide. Metagenomic sequencing of samples from multiple sites and of various sample types across the BFP glacial system produced 31 metagenome-assembled genomes (MAGs) that were queried for sulfur, nitrogen, and carbon cycling/metabolism genes. An abundance of sulfur cycling genes was widespread across the isolated MAGs and sample metagenomes taxonomically associated with the bacterial classes Alphaproteobacteria, Gammaproteobacteria, and Campylobacteria (formerly the Epsilonproteobacteria). This corroborates previous research from BFP implicating Campylobacteria as the primary class responsible for sulfur oxidation; however, data reported here suggested putative sulfur oxidation by organisms in both the alphaproteobacterial and gammaproteobacterial classes that was not predicted by previous work. These findings indicate that in low-temperature, sulfur-based environments, functional redundancy may be a key mechanism that microorganisms use to enable coexistence whenever energy is limited and/or focused by redox chemistry. IMPORTANCE A unique environment at Borup Fiord Pass is characterized by a sulfur-enriched glacial ecosystem in the low-temperature Canadian High Arctic. BFP represents one of the best terrestrial analog sites for studying icy, sulfur-rich worlds outside our own, such as Europa and Mars. The site also allows investigation of sulfur-based microbial metabolisms in cold environments here on Earth. Here, we report whole-genome sequencing data that suggest that sulfur cycling metabolisms at BFP are more widely used across bacterial taxa than predicted. From our analyses, the metabolic capability of sulfur oxidation among multiple community members appears likely due to functional redundancy present in their genomes. Functional redundancy, with respect to sulfur-oxidation at the BFP sulfur-ice environment, may indicate that this dynamic ecosystem hosts microorganisms that are able to use multiple sulfur electron donors alongside other metabolic pathways, including those for carbon and nitrogen.
|
53
|
Jarett JK, Džunková M, Schulz F, Roux S, Paez-Espino D, Eloe-Fadrosh E, Jungbluth SP, Ivanova N, Spear JR, Carr SA, Trivedi CB, Corsetti FA, Johnson HA, Becraft E, Kyrpides N, Stepanauskas R, Woyke T. Insights into the dynamics between viruses and their hosts in a hot spring microbial mat. ISME JOURNAL 2020; 14:2527-2541. [PMID: 32661357 PMCID: PMC7490370 DOI: 10.1038/s41396-020-0705-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 01/13/2020] [Revised: 06/03/2020] [Accepted: 06/11/2020] [Indexed: 12/28/2022]
Abstract
Our current knowledge of host-virus interactions in biofilms is limited to computational predictions based on laboratory experiments with a small number of cultured bacteria. However, natural biofilms are diverse and chiefly composed of uncultured bacteria and archaea with no viral infection patterns and lifestyle predictions described to date. Herein, we predict the first DNA sequence-based host-virus interactions in a natural biofilm. Using single-cell genomics and metagenomics applied to a hot spring mat of the Cone Pool in Mono County, California, we provide insights into virus-host range, lifestyle and distribution across different mat layers. Thirty-four out of 130 single cells contained at least one viral contig (26%), which, together with the metagenome-assembled genomes, resulted in detection of 59 viruses linked to 34 host species. Analysis of single-cell amplification kinetics revealed a lack of active viral replication on the single-cell level. These findings were further supported by mapping metagenomic reads from different mat layers to the obtained host-virus pairs, which indicated a low copy number of viral genomes compared to their hosts. Lastly, the metagenomic data revealed high layer specificity of viruses, suggesting limited diffusion to other mat layers. Taken together, these observations indicate that in low mobility environments with high microbial abundance, lysogeny is the predominant viral lifestyle, in line with the previously proposed "Piggyback-the-Winner" theory.
Affiliation(s)
- Jessica K Jarett
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,AnimalBiome, Oakland, CA, USA
| | - Mária Džunková
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Frederik Schulz
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Simon Roux
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - David Paez-Espino
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Emiley Eloe-Fadrosh
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Sean P Jungbluth
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Natalia Ivanova
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - John R Spear
- Civil and Environmental Engineering, Colorado School of Mines, Golden, CO, USA
| | | | | | | | - Hope A Johnson
- California State University Fullerton, Fullerton, CA, USA
| | - Eric Becraft
- University of North Alabama, Florence, AL, USA.,Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
| | - Nikos Kyrpides
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Tanja Woyke
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Merced, CA, USA
| |
|
54
|
Yan J, Li J, Bai W, Yu L, Nie D, Xiang Z, Wu S. The complete chloroplast genome of Prunus conradinae (Rosaceae), a wild flowering cherry from China. Mitochondrial DNA B Resour 2020; 5:2153-2154. [PMID: 33457763 PMCID: PMC7782140 DOI: 10.1080/23802359.2020.1768934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open
Abstract
Prunus conradinae is a flowering cherry species with high ornamental value. In this study, the complete chloroplast (cp) genome of P. conradinae was obtained using a genome skimming approach. The cp genome was 158,019 bp long, with a large single-copy region of 85,910 bp and a small single-copy region of 19,247 bp separated by two inverted repeats of 26,431 bp. It encodes 130 genes, including 85 protein-coding genes, 37 tRNA genes, and eight ribosomal RNA genes. The phylogenetic analysis indicated that P. conradinae is closely related to the congeners P. maximowiczii, P. takesimensis, P. speciosa, P. serrulata var. spontanea, P. discoidea, and P. matuurai.
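The region sizes reported in this abstract can be checked against the stated total, since a quadripartite chloroplast genome consists of one large single-copy region, one small single-copy region, and two inverted-repeat copies. The short sketch below does that arithmetic, and the same check for the gene counts; the function name is an illustrative assumption, and the numbers are the ones given in the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Values as reported in the abstract (bp).
total_bp = quadripartite_length(lsc=85_910, ssc=19_247, ir=26_431)

# Gene categories as reported: protein-coding + tRNA + rRNA.
gene_total = 85 + 37 + 8

print(total_bp, gene_total)  # 158019 130
```

Both sums agree with the abstract: 158,019 bp in total and 130 genes.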
Affiliation(s)
- Jiawen Yan
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| | - Jianhui Li
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| | - Wenfu Bai
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| | - Lin Yu
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| | - Dongling Nie
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| | - Zuheng Xiang
- Forestry Bureau of Longshan County, Longshan, China
| | - Sizheng Wu
- Institute of Economic Botany, Hunan Forest Botanical Garden, Changsha, China
| |
|
55
|
Mason AS, Lund AR, Hocking PM, Fulton JE, Burt DW. Identification and characterisation of endogenous Avian Leukosis Virus subgroup E (ALVE) insertions in chicken whole genome sequencing data. Mob DNA 2020; 11:22. [PMID: 32617122 PMCID: PMC7325683 DOI: 10.1186/s13100-020-00216-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/06/2020] [Accepted: 06/17/2020] [Indexed: 12/12/2022] Open
Abstract
Background Endogenous retroviruses (ERVs) are the remnants of retroviral infections which can elicit prolonged genomic and immunological stress on their host organism. In chickens, endogenous Avian Leukosis Virus subgroup E (ALVE) expression has been associated with reductions in muscle growth rate and egg production, as well as providing the potential for novel recombinant viruses. However, ALVEs can remain in commercial stock due to their incomplete identification and association with desirable traits, such as ALVE21 and slow feathering. The availability of whole genome sequencing (WGS) data facilitates high-throughput identification and characterisation of these retroviral remnants. Results We have developed obsERVer, a new bioinformatic ERV identification pipeline which can identify ALVEs in WGS data without further sequencing. With this pipeline, 20 ALVEs were identified across eight elite layer lines from Hy-Line International, including four novel integrations and characterisation of a fast feathered phenotypic revertant that still contained ALVE21. These bioinformatically detected sites were subsequently validated using new high-throughput KASP assays, which showed that obsERVer was highly precise and exhibited a 0% false discovery rate. A further fifty-seven diverse chicken WGS datasets were analysed for their ALVE content, identifying a total of 322 integration sites, over 80% of which were novel. Like exogenous ALV, ALVEs show site preference for proximity to protein-coding genes, but also exhibit signs of selection against deleterious integrations within genes. Conclusions obsERVer is a highly precise and broadly applicable pipeline for identifying retroviral integrations in WGS data. ALVE identification in commercial layers has aided development of high-throughput diagnostic assays which will aid ALVE management, with the aim to eventually eradicate ALVEs from high performance lines. 
Analysis of non-commercial chicken datasets with obsERVer has revealed broad ALVE diversity and facilitates the study of the biological effects of these ERVs in wild and domesticated populations.
Affiliation(s)
- Andrew S Mason
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG UK.,York Biomedical Research Institute, The Department of Biology, The University of York, York, YO10 5DD UK
| | - Ashlee R Lund
- Hy-Line International, 2583 240th Street, Dallas Center, Iowa, 50063 USA
| | - Paul M Hocking
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG UK
| | - Janet E Fulton
- Hy-Line International, 2583 240th Street, Dallas Center, Iowa, 50063 USA
| | - David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, EH25 9RG UK.,The University of Queensland, Brisbane, Queensland 4072 Australia
| |
|
56
|
Reji L, Francis CA. Metagenome-assembled genomes reveal unique metabolic adaptations of a basal marine Thaumarchaeota lineage. ISME JOURNAL 2020; 14:2105-2115. [PMID: 32405026 DOI: 10.1038/s41396-020-0675-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/21/2019] [Revised: 04/29/2020] [Accepted: 04/29/2020] [Indexed: 12/18/2022]
Abstract
Thaumarchaeota constitute an abundant and ubiquitous phylum of Archaea that play critical roles in the global nitrogen and carbon cycles. Most well-characterized members of the phylum are chemolithoautotrophic ammonia-oxidizing archaea (AOA), which comprise up to 5 and 20% of the total single-celled life in soil and marine systems, respectively. Using two high-quality metagenome-assembled genomes (MAGs), here we describe a divergent marine thaumarchaeal clade that is devoid of the ammonia-oxidation machinery and the AOA-specific carbon-fixation pathway. Phylogenomic analyses placed these genomes within the uncultivated and largely understudied marine pSL12-like thaumarchaeal clade. The predominant mode of nutrient acquisition appears to be aerobic heterotrophy, evidenced by the presence of respiratory complexes and various organic carbon degradation pathways. Both genomes encoded several pyrroloquinoline quinone (PQQ)-dependent alcohol dehydrogenases, as well as a form III RuBisCO. Metabolic reconstructions suggest anaplerotic CO2 assimilation mediated by RuBisCO, which may be linked to the central carbon metabolism. We conclude that these genomes represent a hitherto unrecognized evolutionary link between predominantly anaerobic basal thaumarchaeal lineages and mesophilic marine AOA, with important implications for diversification within the phylum Thaumarchaeota.
Affiliation(s)
- Linta Reji
- Earth System Science, Stanford University, Stanford, CA, USA
| | | |
|
57
|
Hurley D, Hoffmann M, Muruvanda T, Allard MW, Brown EW, Martins M, Fanning S. Atypical Salmonella enterica Serovars in Murine and Human Macrophage Infection Models. Infect Immun 2020; 88:e00353-19. [PMID: 32014897 PMCID: PMC7093118 DOI: 10.1128/iai.00353-19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/07/2019] [Accepted: 01/28/2020] [Indexed: 11/20/2022] Open
Abstract
Nontyphoidal Salmonella species are globally disseminated pathogens and are the predominant cause of gastroenteritis. The pathogenesis of salmonellosis has been extensively studied using in vivo murine models and cell lines, typically challenged with Salmonella enterica serovar Typhimurium. Although S. enterica serovars Enteritidis and Typhimurium are responsible for most of the human infections reported to the Centers for Disease Control and Prevention (CDC), several other serovars also contribute to clinical cases of salmonellosis. Despite their epidemiological importance, little is known about their infection phenotypes. Here, we report the virulence characteristics and genomes of 10 atypical S. enterica serovars linked to multistate foodborne outbreaks in the United States. We show that the murine RAW 264.7 macrophage model of infection is unsuitable for inferring human-relevant differences in nontyphoidal Salmonella infections, whereas differentiated human THP-1 macrophages allowed these isolates to be further characterized in a more human-relevant context.
Affiliation(s)
- Daniel Hurley
- UCD Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Belfield, Dublin, Ireland
- School of Agriculture and Food Science, University College Dublin, Belfield, Dublin, Ireland
| | - Maria Hoffmann
- Center for Food Safety and Nutrition, Division of Microbiology, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, Maryland, USA
| | - Tim Muruvanda
- Center for Food Safety and Nutrition, Division of Microbiology, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, Maryland, USA
| | - Marc W Allard
- Center for Food Safety and Nutrition, Division of Microbiology, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, Maryland, USA
| | - Eric W Brown
- Center for Food Safety and Nutrition, Division of Microbiology, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, Maryland, USA
| | - Marta Martins
- UCD Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Belfield, Dublin, Ireland
| | - Séamus Fanning
- UCD Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Belfield, Dublin, Ireland
| |
|
58
|
Mitchell K, Brito JJ, Mandric I, Wu Q, Knyazev S, Chang S, Martin LS, Karlsberg A, Gerasimov E, Littman R, Hill BL, Wu NC, Yang HT, Hsieh K, Chen L, Littman E, Shabani T, Enik G, Yao D, Sun R, Schroeder J, Eskin E, Zelikovsky A, Skums P, Pop M, Mangul S. Benchmarking of computational error-correction methods for next-generation sequencing data. Genome Biol 2020; 21:71. [PMID: 32183840 PMCID: PMC7079412 DOI: 10.1186/s13059-020-01988-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/18/2019] [Accepted: 03/06/2020] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.
Affiliation(s)
- Keith Mitchell
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Jaqueline J Brito
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Igor Mandric
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Qiaozhen Wu
- Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA
| | - Sergey Knyazev
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Lana S Martin
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Aaron Karlsberg
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Ekaterina Gerasimov
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Russell Littman
- UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA
| | - Brian L Hill
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
| | - Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Kevin Hsieh
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Linus Chen
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Eli Littman
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Taylor Shabani
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - German Enik
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Douglas Yao
- Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Ren Sun
- Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
| | - Jan Schroeder
- Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia, 119991
| | - Pavel Skums
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA
| | - Mihai Pop
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
| |
|
59
|
Metagenomes in the Borderline Ecosystems of the Antarctic Cryptoendolithic Communities. Microbiol Resour Announc 2020; 9:9/10/e01599-19. [PMID: 32139564 PMCID: PMC7171226 DOI: 10.1128/mra.01599-19] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/27/2022] Open
Abstract
Antarctic cryptoendolithic communities are microbial ecosystems dwelling inside rocks of the Antarctic desert. We present the first 18 shotgun metagenomes from these communities to further characterize their composition, biodiversity, functionality, and adaptation. Future studies will integrate taxonomic and functional annotations to examine the pathways necessary for life to evolve in the extremes.
|
60
|
Skarzyńska A, Pawełkowicz M, Pląder W. Genome-wide discovery of DNA variants in cucumber somaclonal lines. Gene 2020; 736:144412. [PMID: 32007586 DOI: 10.1016/j.gene.2020.144412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/17/2019] [Revised: 01/24/2020] [Accepted: 01/27/2020] [Indexed: 01/30/2023]
Abstract
The emergence of somaclonal variability in in vitro cultures is undesirable during micropropagation, but this phenomenon may be a source of genetic variability sought by breeders. The main factors that affect the appearance of variability are known, but the exact mechanism has not yet been determined. In this paper, we used next-generation sequencing and comparative genomics to study changes in the genomes of cucumber lines resulting from in vitro regeneration and somaclonal mutation in comparison to a reference, the highly inbred B10 line. The total number of obtained polymorphisms differed between the three somaclonal lines S1, S2 and S3, with 8369, 7591 and 44510, respectively. Polymorphisms occurred most frequently in non-coding regions and were mainly SNPs. High-impact changes accounted for 1%-3% of all polymorphisms and most often caused an open reading frame shift. Functional analysis of genes affected by high impact variants showed that they were related to transport, biosynthetic processes, nucleotide-containing compounds and cellular protein modification processes. The obtained results indicated significant factors affecting somaclonal variability and the appearance of changes in the genome, and demonstrated a lack of dependence between phenotype and the number of genomic polymorphisms.
Affiliation(s)
- Agnieszka Skarzyńska
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw, University of Life Sciences, Nowoursynowska 166, 02-787 Warsaw, Poland
| | - Magdalena Pawełkowicz
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw, University of Life Sciences, Nowoursynowska 166, 02-787 Warsaw, Poland.
| | - Wojciech Pląder
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw, University of Life Sciences, Nowoursynowska 166, 02-787 Warsaw, Poland.
| |
|
61
|
Lubośny M, Śmietanka B, Przyłucka A, Burzyński A. Highly divergent mitogenomes of Geukensia demissa (Bivalvia, Mytilidae) with extreme AT content. J Zool Syst Evol Res 2020. [DOI: 10.1111/jzs.12354] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Affiliation(s)
- Marek Lubośny
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
| | - Beata Śmietanka
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
| | - Aleksandra Przyłucka
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
| | - Artur Burzyński
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
| |
|
62
|
Liao X, Li M, Luo J, Zou Y, Wu FX, Pan Y, Luo F, Wang J. Improving de novo Assembly Based on Read Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:177-188. [PMID: 30059317 DOI: 10.1109/tcbb.2018.2861380] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/08/2023]
Abstract
Because of sequencing bias, sequencing errors, and repeats, genome assemblies usually contain misarrangements and gaps. When tackling these problems, current assemblers commonly treat the read libraries as a whole and apply the same strategy to all reads. However, if reads are divided into categories and each category is assembled with its own strategy, we expect the problems to interfere less with one another and the resulting assemblies to improve. In this paper, we present a new pipeline for genome assembly based on read classification (ARC). ARC classifies reads into three categories according to the frequencies of the k-mers they contain: (1) low depth reads, which contain a certain number of low frequency k-mers and are often caused by sequencing errors or bias; (2) high depth reads, which contain a certain number of high frequency k-mers and usually come from repetitive regions; and (3) normal depth reads, which are the remaining reads. After read classification, an existing assembler is used to assemble the read categories separately, which helps resolve problems in the genome assembly. ARC adopts loose assembly parameters for low depth reads and strict assembly parameters for normal depth and high depth reads. We test ARC on five datasets. The experimental results show that assemblers combined with ARC generate better assemblies in terms of NA50, NGA50, and genome fraction.
|
63
|
Monat C, Padmarasu S, Lux T, Wicker T, Gundlach H, Himmelbach A, Ens J, Li C, Muehlbauer GJ, Schulman AH, Waugh R, Braumann I, Pozniak C, Scholz U, Mayer KFX, Spannagl M, Stein N, Mascher M. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biol 2019; 20:284. [PMID: 31849336 DOI: 10.1101/631648] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/13/2019] [Accepted: 11/25/2019] [Indexed: 05/29/2023] Open
Abstract
Chromosome-scale genome sequence assemblies underpin pan-genomic studies. Recent genome assembly efforts in the large-genome Triticeae crops wheat and barley have relied on the commercial closed-source assembly algorithm DeNovoMagic. We present TRITEX, an open-source computational workflow that combines paired-end, mate-pair, 10X Genomics linked-read with chromosome conformation capture sequencing data to construct sequence scaffolds with megabase-scale contiguity ordered into chromosomal pseudomolecules. We evaluate the performance of TRITEX on publicly available sequence data of tetraploid wild emmer and hexaploid bread wheat, and construct an improved annotated reference genome sequence assembly of the barley cultivar Morex as a community resource.
Affiliation(s)
- Cécile Monat
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Sudharsan Padmarasu
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Thomas Lux
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Thomas Wicker
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
| | - Heidrun Gundlach
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Axel Himmelbach
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Jennifer Ens
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Canada
| | - Chengdao Li
- Western Barley Genetics Alliance, School of Veterinary and Life Sciences (VLS), Murdoch University, Murdoch, WA, Australia
- Hubei Collaborative Innovation Center for Grain Industry/School of Agriculture, Yangtze University, Jingzhou, China
| | - Gary J Muehlbauer
- Department of Agronomy and Plant Genetics & Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, USA
| | - Alan H Schulman
- Green Technology, Natural Resources Institute (Luke), Viikki Plant Science Centre, and Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Robbie Waugh
- The James Hutton Institute, Dundee, UK
- School of Life Sciences, University of Dundee, Dundee, UK
| | | | - Curtis Pozniak
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Canada
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Klaus F X Mayer
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Manuel Spannagl
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- Department of Crop Sciences, Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany.
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| |
|
64
|
Monat C, Padmarasu S, Lux T, Wicker T, Gundlach H, Himmelbach A, Ens J, Li C, Muehlbauer GJ, Schulman AH, Waugh R, Braumann I, Pozniak C, Scholz U, Mayer KFX, Spannagl M, Stein N, Mascher M. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biol 2019; 20:284. [PMID: 31849336 PMCID: PMC6918601 DOI: 10.1186/s13059-019-1899-5] [Citation(s) in RCA: 141] [Impact Index Per Article: 23.5] [Received: 05/13/2019] [Accepted: 11/25/2019] [Indexed: 11/24/2022] Open
Abstract
Chromosome-scale genome sequence assemblies underpin pan-genomic studies. Recent genome assembly efforts in the large-genome Triticeae crops wheat and barley have relied on the commercial closed-source assembly algorithm DeNovoMagic. We present TRITEX, an open-source computational workflow that combines paired-end, mate-pair, 10X Genomics linked-read with chromosome conformation capture sequencing data to construct sequence scaffolds with megabase-scale contiguity ordered into chromosomal pseudomolecules. We evaluate the performance of TRITEX on publicly available sequence data of tetraploid wild emmer and hexaploid bread wheat, and construct an improved annotated reference genome sequence assembly of the barley cultivar Morex as a community resource.
Affiliation(s)
- Cécile Monat
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Sudharsan Padmarasu
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Thomas Lux
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Thomas Wicker
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
| | - Heidrun Gundlach
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Axel Himmelbach
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Jennifer Ens
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Canada
| | - Chengdao Li
- Western Barley Genetics Alliance, School of Veterinary and Life Sciences (VLS), Murdoch University, Murdoch, WA, Australia
- Hubei Collaborative Innovation Center for Grain Industry/School of Agriculture, Yangtze University, Jingzhou, China
| | - Gary J Muehlbauer
- Department of Agronomy and Plant Genetics & Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, USA
| | - Alan H Schulman
- Green Technology, Natural Resources Institute (Luke), Viikki Plant Science Centre, and Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Robbie Waugh
- The James Hutton Institute, Dundee, UK
- School of Life Sciences, University of Dundee, Dundee, UK
| | | | - Curtis Pozniak
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, Canada
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Klaus F X Mayer
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
- School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Manuel Spannagl
- PGSB - Plant Genome and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- Department of Crop Sciences, Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany.
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| |
|
65
|
Ignacio-Espinoza JC, Ahlgren NA, Fuhrman JA. Long-term stability and Red Queen-like strain dynamics in marine viruses. Nat Microbiol 2019; 5:265-271. [PMID: 31819214 DOI: 10.1038/s41564-019-0628-x] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 04/29/2019] [Accepted: 11/04/2019] [Indexed: 11/09/2022]
Abstract
Viruses that infect microorganisms dominate marine microbial communities numerically, with impacts ranging from host evolution to global biogeochemical cycles [1,2]. However, virus community dynamics, necessary for conceptual and mechanistic model development, remains difficult to assess. Here, we describe the long-term stability of a viral community by analysing the metagenomes of near-surface 0.02-0.2 μm samples from the San Pedro Ocean Time-series [3] that were sampled monthly over 5 years. Of 19,907 assembled viral contigs (>5 kb, mean 15 kb), 97% were found in each sample (by >98% ID metagenomic read recruitment) to have relative abundances that ranged over seven orders of magnitude, with limited temporal reordering of rank abundances along with little change in richness. Seasonal variations in viral community composition were superimposed on the overall stability; maximum community similarity occurred at 12-month intervals. Despite the stability of viral genotypic clusters that had 98% sequence identity, viral sequences showed transient variations in single-nucleotide polymorphisms (SNPs) and constant turnover of minor population variants, each rising and falling over a few months, reminiscent of Red Queen dynamics [4]. The rise and fall of variants within populations, interpreted through the perspective of known virus-host interactions [5], is consistent with the hypothesis that fluctuating selection acts on a microdiverse cloud of strains, and this succession is associated with ever-shifting virus-host defences and counterdefences. This results in long-term virus-host coexistence that is facilitated by perpetually changing minor variants.
Affiliation(s)
| | - Nathan A Ahlgren
- Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA; Department of Biology, Clark University, Worcester, MA, USA
| | - Jed A Fuhrman
- Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
| |
|
66
|
Sevim V, Lee J, Egan R, Clum A, Hundley H, Lee J, Everroad RC, Detweiler AM, Bebout BM, Pett-Ridge J, Göker M, Murray AE, Lindemann SR, Klenk HP, O'Malley R, Zane M, Cheng JF, Copeland A, Daum C, Singer E, Woyke T. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Sci Data 2019; 6:285. [PMID: 31772173 PMCID: PMC6879543 DOI: 10.1038/s41597-019-0287-z] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 01/07/2019] [Accepted: 10/31/2019] [Indexed: 11/17/2022] Open
Abstract
Metagenomic sequence data from defined mock communities are crucial for the assessment of sequencing platform performance and downstream analyses, including assembly, binning and taxonomic assignment. We report a comparison of shotgun metagenome sequencing and assembly metrics of a defined microbial mock community using the Oxford Nanopore Technologies (ONT) MinION, PacBio and Illumina sequencing platforms. Our synthetic microbial community BMock12 consists of 12 bacterial strains with genome sizes spanning 3.2–7.2 Mbp, 40–73% GC content, and 1.5–7.3% repeats. Size selection of both PacBio and ONT sequencing libraries prior to sequencing was essential to yield comparable relative abundances of organisms among all sequencing technologies. While the Illumina-based metagenome assembly yielded good coverage with few misassemblies, contiguity was greatly improved by both the Illumina + ONT and the Illumina + PacBio hybrid assemblies, although these increased misassemblies, most notably in genomes with high sequence similarity to each other. Our resulting datasets allow evaluation and benchmarking of bioinformatics software on Illumina, PacBio and ONT platforms in parallel.
Measurement(s): metagenomic data • sequence_assembly
Technology Type(s): ONT MinION • Illumina sequencing • PacBio RS II
Factor Type(s): sequencing platform
Sample Characteristic - Organism: Bacteria
Sample Characteristic - Environment: mock community
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.10260740
Affiliation(s)
- Volkan Sevim
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Juna Lee
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Robert Egan
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Alicia Clum
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Hope Hundley
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Janey Lee
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - R Craig Everroad
- NASA Ames Research Center, Exobiology Branch, Moffett Field, CA, 94035, USA
| | - Angela M Detweiler
- NASA Ames Research Center, Exobiology Branch, Moffett Field, CA, 94035, USA; Bay Area Environmental Research Institute, Moffett Field, CA, 94035, USA
| | - Brad M Bebout
- NASA Ames Research Center, Exobiology Branch, Moffett Field, CA, 94035, USA
| | - Jennifer Pett-Ridge
- Lawrence Livermore National Laboratory, Nuclear and Chemical Science Division, 7000 East Ave, Livermore, CA, 94550-9234, USA
| | - Markus Göker
- Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Alison E Murray
- Desert Research Institute, Division of Earth and Ecosystem Sciences, 2215 Raggio Pkwy, Reno, NV, 89512, USA
| | | | - Hans-Peter Klenk
- Newcastle University, School of Natural and Environmental Sciences, Ridley Building 2, Newcastle upon Tyne, NE1 7RU, UK
| | - Ronan O'Malley
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Matthew Zane
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Jan-Fang Cheng
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Alex Copeland
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Christopher Daum
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA
| | - Esther Singer
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA; Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.
| | - Tanja Woyke
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, 94598, USA.
| |
|
67
|
A high-quality cucumber genome assembly enhances computational comparative genomics. Mol Genet Genomics 2019; 295:177-193. [PMID: 31620884 DOI: 10.1007/s00438-019-01614-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/19/2019] [Accepted: 09/30/2019] [Indexed: 01/12/2023]
Abstract
Genetic variation is expressed by the presence of polymorphisms in compared genomes of individuals that can be transferred to next generations. The aim of this work was to reveal genome dynamics by predicting polymorphisms among the genomes of three individuals of the highly inbred B10 cucumber (Cucumis sativus L.) line. In this study, bioinformatic comparative genomics was used to uncover cucumber genome dynamics (also called real-time evolution). We obtained a new genome draft assembly from long single molecule real-time (SMRT) sequencing reads and used short paired-end read data from three individuals to analyse the polymorphisms. Using this approach, we uncovered differentiation aspects in the genomes of the inbred B10 line. The newly assembled genome sequence (B10v3) has the highest contiguity and quality characteristics among the currently available cucumber genome draft sequences. Standard and newly designed approaches were used to predict single nucleotide and structural variants that were unique among the three individual genomes. Some of the variant predictions spanned protein-coding genes and their promoters, and some were in the neighbourhood of annotated interspersed repetitive elements, indicating that the highly inbred homozygous plants remained genetically dynamic. This is the first bioinformatic comparative genomics study of a single highly inbred plant line. For this project, we developed a polymorphism prediction method with optimized precision parameters, which allowed the effective detection of small nucleotide variants (SNVs). This methodology could significantly improve bioinformatic pipelines for comparative genomics and thus has great practical potential in genomic metadata handling.
|
68
|
The Parauncinula polyspora Draft Genome Provides Insights into Patterns of Gene Erosion and Genome Expansion in Powdery Mildew Fungi. mBio 2019; 10:mBio.01692-19. [PMID: 31551331 PMCID: PMC6759760 DOI: 10.1128/mbio.01692-19] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023] Open
Abstract
Powdery mildew fungi are widespread and agronomically relevant phytopathogens causing major yield losses. Their genomes have disproportionately large numbers of mobile genetic elements, and they have experienced a significant loss of highly conserved fungal genes. In order to learn more about the evolutionary history of this fungal group, we explored the genome of an Asian oak tree pathogen, Parauncinula polyspora, a species that diverged early during evolution from the remaining powdery mildew fungi. We found that the P. polyspora draft genome is comparatively compact, has a low number of protein-coding genes, and, despite the absence of a dedicated genome defense system, lacks the massive proliferation of repetitive sequences. Based on these findings, we infer an evolutionary trajectory that shaped the genomes of powdery mildew fungi.
Due to their comparatively small genome size and short generation time, fungi are exquisite model systems to study eukaryotic genome evolution. Powdery mildew fungi present an exceptional case because of their strict host dependency (termed obligate biotrophy) and the atypical size of their genomes (>100 Mb). This size expansion is largely due to the pervasiveness of transposable elements on 70% of the genome and is associated with the loss of multiple conserved ascomycete genes required for a free-living lifestyle. To date, little is known about the mechanisms that drove these changes, and information on ancestral powdery mildew genomes is lacking. We report genome analysis of the early-diverged and exclusively sexually reproducing powdery mildew fungus Parauncinula polyspora, which we performed on the basis of a natural leaf epiphytic metapopulation sample. In contrast to other sequenced species of this taxonomic group, the assembled P. polyspora draft genome is surprisingly small (<30 Mb), has a higher content of conserved ascomycete genes, and is sparsely equipped with transposons (<10%), despite the conserved absence of a common defense mechanism involved in constraining repetitive elements. We speculate that transposable element spread might have been limited by this pathogen’s unique reproduction strategy and host features and further hypothesize that the loss of conserved ascomycete genes may promote the evolutionary isolation and host niche specialization of powdery mildew fungi. Limitations associated with this evolutionary trajectory might have been in part counteracted by the evolution of plastic, transposon-rich genomes and/or the expansion of gene families encoding secreted virulence proteins.
|
69
|
Meccariello A, Salvemini M, Primo P, Hall B, Koskinioti P, Dalíková M, Gravina A, Gucciardino MA, Forlenza F, Gregoriou ME, Ippolito D, Monti SM, Petrella V, Perrotta MM, Schmeing S, Ruggiero A, Scolari F, Giordano E, Tsoumani KT, Marec F, Windbichler N, Arunkumar KP, Bourtzis K, Mathiopoulos KD, Ragoussis J, Vitagliano L, Tu Z, Papathanos PA, Robinson MD, Saccone G. Maleness-on-the-Y (MoY) orchestrates male sex determination in major agricultural fruit fly pests. Science 2019; 365:1457-1460. [PMID: 31467189 DOI: 10.1126/science.aax1318] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 03/08/2019] [Accepted: 08/16/2019] [Indexed: 12/16/2022]
Abstract
In insects, rapidly evolving primary sex-determining signals are transduced by a conserved regulatory module controlling sexual differentiation. In the agricultural pest Ceratitis capitata (Mediterranean fruit fly, or Medfly), we identified a Y-linked gene, Maleness-on-the-Y (MoY), encoding a small protein that is necessary and sufficient for male development. Silencing or disruption of MoY in XY embryos causes feminization, whereas overexpression of MoY in XX embryos induces masculinization. Crosses between transformed XY females and XX males give rise to males and females, indicating that a Y chromosome can be transmitted by XY females. MoY is Y-linked and functionally conserved in other species of the Tephritidae family, highlighting its potential to serve as a tool for developing more effective control strategies against these major agricultural insect pests.
Affiliation(s)
- Angela Meccariello
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Marco Salvemini
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Pasquale Primo
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Brantley Hall
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, USA
| | - Panagiota Koskinioti
- Insect Pest Control Laboratory, Joint FAO/IAEA Division of Nuclear Techniques in Food and Agriculture, A-1400 Vienna, Austria; Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - Martina Dalíková
- Institute of Entomology, Biology Centre of the Czech Academy of Sciences, 370 05 České Budějovice, Czech Republic; Faculty of Science, University of South Bohemia, 370 05 České Budějovice, Czech Republic
| | - Andrea Gravina
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | | | - Federica Forlenza
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Maria-Eleni Gregoriou
- Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - Domenica Ippolito
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Simona Maria Monti
- Institute of Biostructures and Bioimaging (IBB), CNR, 80134 Naples, Italy
| | - Valeria Petrella
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | | | - Stephan Schmeing
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
| | - Alessia Ruggiero
- Institute of Biostructures and Bioimaging (IBB), CNR, 80134 Naples, Italy
| | - Francesca Scolari
- Department of Biology and Biotechnology, University of Pavia, 27100 Pavia, Italy
| | - Ennio Giordano
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy
| | - Konstantina T Tsoumani
- Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - František Marec
- Institute of Entomology, Biology Centre of the Czech Academy of Sciences, 370 05 České Budějovice, Czech Republic
| | - Nikolai Windbichler
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Kallare P Arunkumar
- Centre of Excellence for Genetics and Genomics of Silkmoths, Laboratory of Molecular Genetics, Centre for DNA Fingerprinting and Diagnostics, Hyderabad 500 039, India
| | - Kostas Bourtzis
- Insect Pest Control Laboratory, Joint FAO/IAEA Division of Nuclear Techniques in Food and Agriculture, A-1400 Vienna, Austria
| | - Kostas D Mathiopoulos
- Department of Biochemistry and Biotechnology, University of Thessaly, 41500 Larissa, Greece
| | - Jiannis Ragoussis
- Department of Human Genetics and Bioengineering, McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
| | - Luigi Vitagliano
- Institute of Biostructures and Bioimaging (IBB), CNR, 80134 Naples, Italy
| | - Zhijian Tu
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, USA
| | - Philippos Aris Papathanos
- Section of Genomics and Genetics, Department of Experimental Medicine, University of Perugia, 06132 Perugia, Italy; Department of Entomology, The Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot 76100, Israel
| | - Mark D Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland.
| | - Giuseppe Saccone
- Department of Biology, University of Naples "Federico II," 80126 Napoli, Italy.
| |
|
70
|
Chen X, Dong Z, Liu G, He J, Zhao R, Wang W, Peng Y, Li X. Phylogenetic analysis provides insights into the evolution of Asian fireflies and adult bioluminescence. Mol Phylogenet Evol 2019; 140:106600. [PMID: 31445200 DOI: 10.1016/j.ympev.2019.106600] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 12/17/2018] [Revised: 08/09/2019] [Accepted: 08/20/2019] [Indexed: 02/04/2023]
Abstract
Fireflies are one of the best-known examples of luminescent organisms. The limited geographic distribution and rarity of some firefly genera have hindered molecular phylogenetic analysis, resulting in uncertainty in regard to firefly phylogeny. Here, using genome skimming next-generation sequencing, we sequenced 23 Asian firefly species from 15 genera (Lampyridae: 14; Rhagophthalmidae: one) and assembled their mitochondrial genomes (mitogenomes) and nuclear ribosomal DNA (rDNA) repeat unit. The mitogenomes (including 15 mitochondrial genes: COX1-3, ATP6&8, ND1-6&4L, CYTB, 12S, and 16S) were recovered for almost all 23 species; furthermore, three regions of the nuclear rDNA repeat unit (18S, 28S, and 5.8S) were recovered for 22 out of the 23 species. The mitogenomes of 11 genera and 22 species as well as the complete rDNA from 22 species are reported here for the first time. Combined with previously published sequences of mitochondrial and rDNA coding regions, 166 species (170 populations with four overlapping in Lampyridae) were included in the current analyses. We selected different species groups and coding regions to infer phylogenies, and then employed tree certainty (TC) and internode certainty (IC) to quantify any phylogenetic incongruence. Phylogenetic analysis of 18 coding regions (15 mitochondrial genes and three regions of the nuclear rDNA repeat unit) from different species groups showed that the 144-species selection group (excluding 22 species outside Lampyridae) had relatively high TC (101.39). 
Further phylogenetic analysis of the 144 species using different coding regions indicated that the phylogeny of the 13 coding regions (10 mitochondrial genes: COX1-2, ATP6&8, ND1, ND4-5, CYTB, 12S and 16S; three rDNA regions: 18S, 5.8S, and 28S) demonstrated higher TC (103.02) than the phylogenies based on the 18 coding regions (TC = 101.39), conserved-regions (c-regions, i.e., 12S, 16S, COX1, 18S, and 28S) (TC = 95.11), or conserved-sites (c-sites, TC = 92.31) for the mitochondrial genes. In contrast, the c-sites strengthened the deeper nodes of the 144-species phylogeny compared to the c-regions. All of the 144-species phylogenies using different coding regions (except the c-regions) consistently recovered the monophyly of each of the three luminous families and their combination (Lampyridae, Rhagophthalmidae, and Phengodidae) with high IC support. Our phylogenetic analyses clarified the position of firefly genera Lamprigera, Vesta, Stenocladius, Pyrocoelia, Diaphanes, Abscondita, Pygoluciola, Emeia, Pristolycus, and Menghuoius. We also inferred the evolutionary pattern of adult bioluminescence in Lampyridae based on the phylogenies of 166 and 144 species. Our data suggest that the common ancestor of Lampyridae possessed adult bioluminescence, with a higher loss rate than gain rate of bioluminescence during its lineage evolution. Our results provide insight into Asian firefly phylogeny, and also enrich mitogenome and rDNA data resources for further study.
Affiliation(s)
- Xing Chen
- CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan 666303, China
| | - Zhiwei Dong
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Guichun Liu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
| | - Jinwu He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
| | - Ruoping Zhao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Wen Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Center for Excellence in Animal Evolution and Genetics, Kunming, Yunnan 650223, China; Center for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China.
| | - Yanqiong Peng
- CAS Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla, Yunnan 666303, China.
| | - Xueyan Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China.
|
71
|
Reji L, Tolar BB, Smith JM, Chavez FP, Francis CA. Depth distributions of nitrite reductase (nirK) gene variants reveal spatial dynamics of thaumarchaeal ecotype populations in coastal Monterey Bay. Environ Microbiol 2019; 21:4032-4045. [PMID: 31330081 DOI: 10.1111/1462-2920.14753] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/18/2019] [Revised: 07/16/2019] [Accepted: 07/16/2019] [Indexed: 11/29/2022]
Abstract
Ammonia-oxidizing archaea (AOA) of the phylum Thaumarchaeota are key players in nutrient cycling, yet large gaps remain in our understanding of their ecology and metabolism. Despite multiple lines of evidence pointing to a central role for copper-containing nitrite reductase (NirK) in AOA metabolism, the thaumarchaeal nirK gene is rarely studied in the environment. In this study, we examine the diversity of nirK in the marine pelagic environment, in light of previously described ecological patterns of pelagic thaumarchaeal populations. Phylogenetic analyses show that nirK better resolves diversification patterns of marine Thaumarchaeota, compared to the conventionally used marker gene amoA. Specifically, we demonstrate that the three major phylogenetic clusters of marine nirK correspond to the three 'ecotype' populations of pelagic Thaumarchaeota. In this context, we further examine the relative distributions of the three variant groups in metagenomes and metatranscriptomes representing two depth profiles in coastal Monterey Bay. Our results reveal that nirK effectively tracks the dynamics of thaumarchaeal ecotype populations, particularly finer-scale diversification patterns within major lineages. We also find evidence for multiple copies of nirK per genome in a fraction of thaumarchaeal cells in the water column, which must be taken into account when using it as a molecular marker.
Affiliation(s)
- Linta Reji
- Department of Earth System Science, Stanford University, Stanford, CA
| | - Bradley B Tolar
- Department of Earth System Science, Stanford University, Stanford, CA
| | - Jason M Smith
- Monterey Bay Aquarium Research Institute, Moss Landing, CA; Marine Science Institute, University of California Santa Barbara, Santa Barbara, CA
|
72
|
Microbial metagenomes and metatranscriptomes during a coastal phytoplankton bloom. Sci Data 2019; 6:129. [PMID: 31332186 PMCID: PMC6646334 DOI: 10.1038/s41597-019-0132-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/10/2019] [Accepted: 06/19/2019] [Indexed: 11/09/2022] Open
Abstract
Metagenomic and metatranscriptomic time-series data covering a 52-day period in the fall of 2016 provide an inventory of bacterial and archaeal community genes, transcripts, and taxonomy during an intense dinoflagellate bloom in Monterey Bay, CA, USA. The dataset comprises 84 metagenomes (0.8 terabases), 82 metatranscriptomes (1.1 terabases), and 88 16S rRNA amplicon libraries from samples collected on 41 dates. The dataset also includes 88 18S rRNA amplicon libraries, characterizing the taxonomy of the eukaryotic community during the bloom. Accompanying the sequence data are chemical and biological measurements associated with each sample. These datasets will facilitate studies of the structure and function of marine bacterial communities during episodic phytoplankton blooms.
|
73
|
Bartaula R, Melo ATO, Kingan S, Jin Y, Hale I. Mapping non-host resistance to the stem rust pathogen in an interspecific barberry hybrid. BMC Plant Biol 2019; 19:319. [PMID: 31311507 PMCID: PMC6636152 DOI: 10.1186/s12870-019-1893-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2019] [Accepted: 06/19/2019] [Indexed: 05/17/2023]
Abstract
BACKGROUND Non-host resistance (NHR) presents a compelling long-term plant protection strategy for global food security, yet the genetic basis of NHR remains poorly understood. For many diseases, including stem rust of wheat [causal organism Puccinia graminis (Pg)], NHR is largely unexplored due to the inherent challenge of developing a genetically tractable system within which the resistance segregates. The present study turns to the pathogen's alternate host, barberry (Berberis spp.), to overcome this challenge. RESULTS In this study, an interspecific mapping population derived from a cross between Pg-resistant Berberis thunbergii (Bt) and Pg-susceptible B. vulgaris was developed to investigate the Pg-NHR exhibited by Bt. To facilitate QTL analysis and subsequent trait dissection, the first genetic linkage maps for the two parental species were constructed and a chromosome-scale reference genome for Bt was assembled (PacBio + Hi-C). QTL analysis resulted in the identification of a single 13 cM region (~ 5.1 Mbp spanning 13 physical contigs) on the short arm of Bt chromosome 3. Differential gene expression analysis, combined with sequence variation analysis between the two parental species, led to the prioritization of several candidate genes within the QTL region, some of which belong to gene families previously implicated in disease resistance. CONCLUSIONS Foundational genetic and genomic resources developed for Berberis spp. enabled the identification and annotation of a QTL associated with Pg-NHR. Although subsequent validation and fine mapping studies are needed, this study demonstrates the feasibility of and lays the groundwork for dissecting Pg-NHR in the alternate host of one of agriculture's most devastating pathogens.
Affiliation(s)
- Radhika Bartaula
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH 03824 USA
| | - Arthur T. O. Melo
- Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH 03824 USA
| | | | - Yue Jin
- USDA-ARS Cereal Disease Laboratory, St. Paul, MN 55108 USA
| | - Iago Hale
- Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH 03824 USA
|
74
|
Ellison C, Bachtrog D. Recurrent gene co-amplification on Drosophila X and Y chromosomes. PLoS Genet 2019; 15:e1008251. [PMID: 31329593 PMCID: PMC6690552 DOI: 10.1371/journal.pgen.1008251] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 12/04/2018] [Revised: 08/12/2019] [Accepted: 06/18/2019] [Indexed: 12/19/2022] Open
Abstract
Y chromosomes often contain amplified genes which can increase dosage of male fertility genes and counteract degeneration via gene conversion. Here we identify genes with increased copy number on both X and Y chromosomes in various species of Drosophila, a pattern that has previously been associated with sex chromosome drive involving the Slx and Sly gene families in mice. We show that recurrent X/Y co-amplification appears to be an important evolutionary force that has shaped gene content evolution of sex chromosomes in Drosophila. We demonstrate that convergent acquisition and amplification of testis-expressed gene families are common on Drosophila sex chromosomes, and especially on recently formed ones, and we carefully characterize one putative novel X/Y co-amplification system. We find that co-amplification of the S-Lap1/GAPsec gene pair on both the X and the Y chromosome occurred independently several times in members of the D. obscura group, where this normally autosomal gene pair is sex-linked due to a sex chromosome-autosome fusion. We explore several evolutionary scenarios that would explain this pattern of co-amplification. Investigation of gene expression and short RNA profiles at the S-Lap1/GAPsec system suggests that, like Slx/Sly in mice, these genes may be remnants of a cryptic sex chromosome drive system; however, additional transgenic experiments will be necessary to validate this model. Regardless of whether sex chromosome drive is responsible for this co-amplification, our findings suggest that recurrent gene duplications between X and Y sex chromosomes could have a widespread effect on genomic and evolutionary patterns, including the epigenetic regulation of sex chromosomes, the distribution of sex-biased genes, and the evolution of hybrid sterility.
Affiliation(s)
- Christopher Ellison
- Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America
| | - Doris Bachtrog
- Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America
|
75
|
Nash MV, Anesio AM, Barker G, Tranter M, Varliero G, Eloe-Fadrosh EA, Nielsen T, Turpin-Jelfs T, Benning LG, Sánchez-Baracaldo P. Metagenomic insights into diazotrophic communities across Arctic glacier forefields. FEMS Microbiol Ecol 2019; 94:5036517. [PMID: 29901729 PMCID: PMC6054269 DOI: 10.1093/femsec/fiy114] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/27/2017] [Accepted: 06/11/2018] [Indexed: 11/30/2022] Open
Abstract
Microbial nitrogen fixation is crucial for building labile nitrogen stocks and facilitating higher plant colonisation in oligotrophic glacier forefield soils. Here, the diazotrophic bacterial community structure across four Arctic glacier forefields was investigated using metagenomic analysis. In total, 70 soil metagenomes were used for taxonomic interpretation based on 185 nitrogenase (nif) sequences, extracted from assembled contigs. The low number of recovered genes highlights the need for deeper sequencing in some diverse samples, to uncover the complete microbial populations. A key group of forefield diazotrophs, found throughout the forefields, was identified using a nifH phylogeny, associated with nifH Cluster I and III. Sequences were most closely related to groups including Alphaproteobacteria, Betaproteobacteria, Cyanobacteria and Firmicutes. Using multiple nif genes in a Last Common Ancestor analysis revealed a diverse range of diazotrophs across the forefields. Key organisms identified across the forefields included Nostoc, Geobacter, Polaromonas and Frankia. Nitrogen fixers that are symbiotic with plants were also identified, through the presence of root-associated diazotrophs, which fix nitrogen in return for reduced carbon. Additional nitrogen fixers identified in forefield soils were metabolically diverse, including fermentative and sulphur cycling bacteria, halophiles and anaerobes.
Affiliation(s)
- Maisie V Nash
- School of Geographical Sciences, University of Bristol, UK
| | | | - Gary Barker
- School of Life Sciences, University of Bristol, UK
| | - Martyn Tranter
- School of Geographical Sciences, University of Bristol, UK
| | | | | | - Torben Nielsen
- DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, US
| | | | - Liane G Benning
- GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany; School of Earth and Environment, University of Leeds, LS2 9JT, Leeds, UK; Department of Earth Sciences, Free University of Berlin, Malteserstr. 74-100, Building A, 12249 Berlin, Germany
|
76
|
Heydari M, Miclotte G, Van de Peer Y, Fostier J. Illumina error correction near highly repetitive DNA regions improves de novo genome assembly. BMC Bioinformatics 2019; 20:298. [PMID: 31159722 PMCID: PMC6545690 DOI: 10.1186/s12859-019-2906-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/15/2019] [Accepted: 05/17/2019] [Indexed: 11/10/2022] Open
Abstract
Background Several standalone error correction tools have been proposed to correct sequencing errors in Illumina data in order to facilitate de novo genome assembly. However, in a recent survey, we showed that state-of-the-art assemblers often did not benefit from this pre-correction step. We found that many error correction tools introduce new errors in reads that overlap highly repetitive DNA regions such as low-complexity patterns or short homopolymers, ultimately leading to a more fragmented assembly. Results We propose BrownieCorrector, an error correction tool for Illumina sequencing data that focuses on the correction of only those reads that overlap short DNA patterns that are highly repetitive in the genome. BrownieCorrector extracts all reads that contain such a pattern and clusters them into different groups using a community detection algorithm that takes into account both the sequence similarity between overlapping reads and their respective paired-end reads. Each cluster holds reads that originate from the same genomic region and hence each cluster can be corrected individually, thus providing a consistent correction for all reads within that cluster. Conclusions BrownieCorrector is benchmarked using six real Illumina datasets for different eukaryotic genomes. The prior use of BrownieCorrector improves assembly results over the use of uncorrected reads in all cases. In comparison with other error correction tools, BrownieCorrector leads to the best assembly results in most cases even though less than 2% of the reads within a dataset are corrected. Additionally, we investigate the impact of error correction on hybrid assembly where the corrected Illumina reads are supplemented with PacBio data. Our results confirm that BrownieCorrector improves the quality of hybrid genome assembly as well. BrownieCorrector is written in standard C++11 and released under GPL license. 
BrownieCorrector relies on multithreading to take advantage of multi-core/multi-CPU systems. The source code is available at https://github.com/biointec/browniecorrector. Electronic supplementary material The online version of this article (10.1186/s12859-019-2906-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mahdi Heydari
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
| | - Giles Miclotte
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
| | - Yves Van de Peer
- Bioinformatics Institute Ghent, Ghent, B-9052, Belgium; Center for Plant Systems Biology, VIB, Ghent, B-9052, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, B-9052, Belgium; Department of Genetics, Genome Research Institute, University of Pretoria, Pretoria, South Africa
| | - Jan Fostier
- Department of Information Technology, Ghent University-imec, IDLab, Ghent, B-9052, Belgium; Bioinformatics Institute Ghent, Ghent, B-9052, Belgium
|
77
|
Casey JM, Meyer CP, Morat F, Brandl SJ, Planes S, Parravicini V. Reconstructing hyperdiverse food webs: Gut content metabarcoding as a tool to disentangle trophic interactions on coral reefs. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13206] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Indexed: 11/29/2022]
Affiliation(s)
- Jordan M. Casey
- PSL Université Paris: EPHE‐UPVD‐CNRS, USR 3278 CRIOBE Université de Perpignan Perpignan France
- Laboratoire d'Excellence “CORAIL” Perpignan France
- Department of Invertebrate Zoology National Museum of Natural History, Smithsonian Institution Washington District of Columbia USA
| | - Christopher P. Meyer
- Department of Invertebrate Zoology National Museum of Natural History, Smithsonian Institution Washington District of Columbia USA
| | - Fabien Morat
- PSL Université Paris: EPHE‐UPVD‐CNRS, USR 3278 CRIOBE Université de Perpignan Perpignan France
- Laboratoire d'Excellence “CORAIL” Perpignan France
| | - Simon J. Brandl
- Department of Biological Sciences Simon Fraser University Burnaby BC Canada
| | - Serge Planes
- PSL Université Paris: EPHE‐UPVD‐CNRS, USR 3278 CRIOBE Université de Perpignan Perpignan France
- Laboratoire d'Excellence “CORAIL” Perpignan France
| | - Valeriano Parravicini
- PSL Université Paris: EPHE‐UPVD‐CNRS, USR 3278 CRIOBE Université de Perpignan Perpignan France
- Laboratoire d'Excellence “CORAIL” Perpignan France
|
78
|
Roux S, Trubl G, Goudeau D, Nath N, Couradeau E, Ahlgren NA, Zhan Y, Marsan D, Chen F, Fuhrman JA, Northen TR, Sullivan MB, Rich VI, Malmstrom RR, Eloe-Fadrosh EA. Optimizing de novo genome assembly from PCR-amplified metagenomes. PeerJ 2019; 7:e6902. [PMID: 31119088 PMCID: PMC6511391 DOI: 10.7717/peerj.6902] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/27/2018] [Accepted: 04/03/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. METHODS Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. RESULTS Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. 
We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥10 kb by 10 to 100-fold for low input metagenomes. CONCLUSIONS PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
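Two quantities from the abstract above lend themselves to a short sketch: the assembly-size metric (number of bp in contigs ≥ 10 kb) and read deduplication. The abstract does not specify how deduplication was performed, so exact-sequence matching is an assumption here, and both function names are hypothetical.

```python
def assembly_size(contig_lengths, min_len=10_000):
    """Total bp in contigs at least min_len long (10 kb, as in the abstract)."""
    return sum(length for length in contig_lengths if length >= min_len)

def deduplicate_reads(reads):
    """Keep the first occurrence of each read sequence.

    Assumption: duplicates are reads with identical sequence; the tool and
    criteria actually used in the study are not stated in the abstract.
    """
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique

# Hypothetical inputs for illustration only.
lengths = [5_000, 10_000, 25_000]
reads = ["ACGT", "ACGT", "TTGA"]
size_10kb = assembly_size(lengths)
unique_reads = deduplicate_reads(reads)
```

Comparing `assembly_size` between two pipelines run on the same library gives the kind of fold-change the abstract reports for contigs ≥ 10 kb.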
Affiliation(s)
- Simon Roux
- Department of Energy Joint Genome Institute, Walnut Creek, CA, United States of America
| | - Gareth Trubl
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
| | - Danielle Goudeau
- Department of Energy Joint Genome Institute, Walnut Creek, CA, United States of America
| | - Nandita Nath
- Department of Energy Joint Genome Institute, Walnut Creek, CA, United States of America
| | - Estelle Couradeau
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
| | - Nathan A. Ahlgren
- Department of Biology, Clark University, Worcester, MA, United States of America
| | - Yuanchao Zhan
- Institution of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Cambridge, MD, United States of America
| | - David Marsan
- Institution of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Cambridge, MD, United States of America
| | - Feng Chen
- Institution of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Cambridge, MD, United States of America
| | - Jed A. Fuhrman
- Department of Biological Sciences, University of Southern California, Los Angeles, CA, United States of America
| | - Trent R. Northen
- Department of Energy Joint Genome Institute, Walnut Creek, CA, United States of America
| | - Matthew B. Sullivan
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
| | - Virginia I. Rich
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
| | - Rex R. Malmstrom
- Department of Energy Joint Genome Institute, Walnut Creek, CA, United States of America
|
79
|
Liu Y, Zhang LY, Li J. Fast detection of maximal exact matches via fixed sampling of query K-mers and Bloom filtering of index K-mers. Bioinformatics 2019; 35:4560-4567. [DOI: 10.1093/bioinformatics/btz273] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/23/2019] [Revised: 03/31/2019] [Accepted: 04/11/2019] [Indexed: 11/13/2022] Open
Abstract
Motivation
Detection of maximal exact matches (MEMs) between two long sequences is a fundamental problem in pairwise reference-query genome comparisons. To efficiently compare larger and larger genomes, reducing the number of indexed k-mers as well as the number of query k-mers has been adopted as a mainstream approach which saves the computational resources by avoiding a significant number of unnecessary matches.
Results
Under this framework, we propose a new method to detect all MEMs from a pair of genomes. The method first performs a fixed sampling of k-mers on the query sequence, and adds these selected k-mers to a Bloom filter. Then all the k-mers of the reference sequence are tested by the Bloom filter. If a k-mer passes the test, it is inserted into a hash table for indexing. Compared with existing methods, far fewer query k-mers are generated and far fewer k-mers are inserted into the index, which avoids unnecessary matches and leads to an efficient matching process and memory savings. Experiments on large genomes demonstrate that our method is at least 1.8 times faster than the best of the existing algorithms. This performance is mainly attributed to the key novelty of our method: the fixed k-mer sampling is conducted on the query sequence, and the index k-mers are filtered from the reference sequence via a Bloom filter.
Availability and implementation
https://github.com/yuansliu/bfMEM
Supplementary information
Supplementary data are available at Bioinformatics online.
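The sampling-and-filtering scheme described in the Results above can be sketched as follows. This is an illustrative toy, not the bfMEM implementation: the `BloomFilter` class, its size and hash scheme, and the functions `build_index` and `find_mems` are all hypothetical, and the extension step is a plain character-by-character comparison.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    """Tiny Bloom filter; size and hash count are arbitrary assumptions."""
    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))

def build_index(reference, query, k, step):
    """Index only reference k-mers that pass a filter of sampled query k-mers."""
    bloom = BloomFilter()
    for i in range(0, len(query) - k + 1, step):   # fixed sampling on the query
        bloom.add(query[i:i + k])
    index = defaultdict(list)                      # hash table: k-mer -> positions
    for j in range(len(reference) - k + 1):        # every reference k-mer is tested
        kmer = reference[j:j + k]
        if kmer in bloom:
            index[kmer].append(j)
    return index

def find_mems(reference, query, k, step, min_len):
    """Return sorted (ref_start, query_start, length) for matches >= min_len."""
    index = build_index(reference, query, k, step)
    mems = set()
    for i in range(0, len(query) - k + 1, step):
        for j in index.get(query[i:i + k], ()):
            q_start, r_start = i, j
            while q_start > 0 and r_start > 0 and query[q_start - 1] == reference[r_start - 1]:
                q_start -= 1
                r_start -= 1
            q_end, r_end = i + k, j + k
            while q_end < len(query) and r_end < len(reference) and query[q_end] == reference[r_end]:
                q_end += 1
                r_end += 1
            if q_end - q_start >= min_len:
                mems.add((r_start, q_start, q_end - q_start))
    return sorted(mems)

# Hypothetical toy sequences sharing the 12-base segment ACGTACGGATCC.
ref = "TTTTACGTACGGATCCAAAA"
qry = "GGGGACGTACGGATCCGGGG"
toy_mems = find_mems(ref, qry, k=4, step=3, min_len=8)
```

In this sketch no match of length at least `min_len` is missed as long as `step <= min_len - k + 1`, because such a match then contains at least one sampled k-mer start. Bloom filter false positives only add unused entries to the index, since lookups are by exact k-mer.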
Affiliation(s)
- Yuansheng Liu
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
| | - Leo Yu Zhang
- School of Information Technology, Deakin University, VIC 3216, Australia
| | - Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
|
80
|
Tian S, Yan H, Klee EW, Kalmbach M, Slager SL. Comparative analysis of de novo assemblers for variation discovery in personal genomes. Brief Bioinform 2019; 19:893-904. [PMID: 28407084 PMCID: PMC6169673 DOI: 10.1093/bib/bbx037] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/12/2016] [Accepted: 03/08/2017] [Indexed: 12/30/2022] Open
Abstract
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a trio of Utah residents with Northern and Western European ancestry (CEU), showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.
Affiliation(s)
- Shulan Tian
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Huihuang Yan
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Eric W Klee
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine Bioinformatics Program, Mayo Clinic, USA
| | - Michael Kalmbach
- Division of Information Management and Analytics, Department of Information Technology, Mayo Clinic, USA
| | - Susan L Slager
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
|
81
|
Limasset A, Flot JF, Peterlongo P. Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs. Bioinformatics 2019; 36:1374-1381. [DOI: 10.1093/bioinformatics/btz102] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/23/2018] [Revised: 01/07/2019] [Accepted: 02/18/2019] [Indexed: 12/25/2022] Open
Abstract
Motivation
Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large datasets or consider reads as mere suites of k-mers, without taking into account their full-length sequence information.
Results
We propose a new method to correct short reads using de Bruijn graphs and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond.
Availability and implementation
The implementation is open source, available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package.
Supplementary information
Supplementary data are available at Bioinformatics online.
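The idea of filtering a de Bruijn graph by k-mer abundance and then using the cleaned graph as a reference for the reads can be illustrated with a deliberately simplified sketch. It is not Bcool: the graph is not compacted, there is no unitig-abundance filter, and the correction step is a brute-force single-substitution search rather than read-to-graph mapping. All function names and the toy reads are hypothetical.

```python
from collections import Counter

def solid_kmers(reads, k, min_abundance):
    """Nodes of the filtered graph: k-mers seen at least min_abundance times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, count in counts.items() if count >= min_abundance}

def is_path(read, solid, k):
    """True if every k-mer of the read is a node of the filtered graph.

    Consecutive k-mers of a read overlap by k-1 bases, so in a node-centric
    de Bruijn graph they are connected whenever both are nodes.
    """
    return all(read[i:i + k] in solid for i in range(len(read) - k + 1))

def correct_read(read, solid, k):
    """Return the read unchanged if it is a path, otherwise the first
    single-base substitution that makes it one (simplifying assumption)."""
    if is_path(read, solid, k):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base != read[pos]:
                candidate = read[:pos] + base + read[pos + 1:]
                if is_path(candidate, solid, k):
                    return candidate
    return read  # left uncorrected when no single substitution works

# Hypothetical toy data: five correct reads and one with a G->C error.
toy_reads = ["ACGTACGGAT"] * 5 + ["ACGTACGCAT"]
toy_solid = solid_kmers(toy_reads, k=4, min_abundance=2)
corrected = correct_read("ACGTACGCAT", toy_solid, k=4)
```

The k-mers created by the sequencing error occur only once, so the abundance filter removes them and the erroneous read no longer spells a path in the graph.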
Affiliation(s)
- Antoine Limasset
- Evolutionary Biology & Ecology, Université Libre de Bruxelles (ULB), Bruxelles, Belgium
| | - Jean-François Flot
- Evolutionary Biology & Ecology, Université Libre de Bruxelles (ULB), Bruxelles, Belgium
- Interuniversity Institute of Bioinformatics in Brussels – (IB) 2, Brussels, Belgium
|
82
|
Ellison C, Bachtrog D. Contingency in the convergent evolution of a regulatory network: Dosage compensation in Drosophila. PLoS Biol 2019; 17:e3000094. [PMID: 30742611 PMCID: PMC6417741 DOI: 10.1371/journal.pbio.3000094] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/26/2018] [Revised: 03/14/2019] [Accepted: 01/18/2019] [Indexed: 11/18/2022] Open
Abstract
The repeatability or predictability of evolution is a central question in evolutionary biology and most often addressed in experimental evolution studies. Here, we infer how genetically heterogeneous natural systems acquire the same molecular changes to address how genomic background affects adaptation in natural populations. In particular, we take advantage of independently formed neo-sex chromosomes in Drosophila species that have evolved dosage compensation by co-opting the dosage-compensation male-specific lethal (MSL) complex to study the mutational paths that have led to the acquisition of hundreds of novel binding sites for the MSL complex in different species. This complex recognizes a conserved 21-bp GA-rich sequence motif that is enriched on the X chromosome, and newly formed X chromosomes recruit the MSL complex by de novo acquisition of this binding motif. We identify recently formed sex chromosomes in the D. melanica and D. robusta species groups by genome sequencing and generate genomic occupancy maps of the MSL complex to infer the location of novel binding sites. We find that diverse mutational paths were utilized in each species to evolve hundreds of de novo binding motifs along the neo-X, including expansions of microsatellites and transposable element (TE) insertions. However, the propensity to utilize a particular mutational path differs between independently formed X chromosomes and appears to be contingent on genomic properties of that species, such as simple repeat or TE density. This establishes the "genomic environment" as an important determinant in predicting the outcome of evolutionary adaptations.
Affiliation(s)
- Christopher Ellison
  - Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America
- Doris Bachtrog
  - Department of Integrative Biology, University of California Berkeley, Berkeley, California, United States of America
|
83
|
Abstract
BACKGROUND Single-cell sequencing experiments use short DNA barcode 'tags' to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. RESULTS Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when the size of k is large relative to the length of the barcode sequence, a regime which is typical of single-cell barcoding applications. This allows for assignment of reads to consensus fingerprints constructed from k-mers. CONCLUSION We show that for single-cell RNA-Seq circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there are a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
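The observation about circularized k-mers can be illustrated with a minimal Python sketch. It is not the Sircel implementation; the function names, the 8-base barcode, and the choice of k = 6 are hypothetical and chosen only to show why wrapping around the end of a barcode can recover error-free k-mers when k is large relative to the barcode length.

```python
def linear_kmers(barcode: str, k: int) -> list[str]:
    """k-mers of the barcode read as an ordinary (linear) string."""
    return [barcode[i:i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode: str, k: int) -> list[str]:
    """k-mers of the barcode read as a circle: windows may wrap past the end."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Appending the first k-1 bases lets every position start a window.
    extended = barcode + barcode[:k - 1]
    return [extended[i:i + k] for i in range(len(barcode))]


# Hypothetical 8-base barcode and a read of it with one mismatch at index 3.
true_barcode = "ACGTTGCA"
read_barcode = "ACGATGCA"
k = 6  # large relative to the barcode length

shared_linear = set(linear_kmers(true_barcode, k)) & set(linear_kmers(read_barcode, k))
shared_circular = set(circular_kmers(true_barcode, k)) & set(circular_kmers(read_barcode, k))

print(sorted(shared_linear))    # every linear window covers the mismatch
print(sorted(shared_circular))  # wrap-around windows that avoid the mismatch
```

With these toy inputs every linear 6-mer overlaps the erroneous base, whereas two of the circular 6-mers wrap around it and remain identical between the true barcode and the read.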
Affiliation(s)
- Akshay Tambe
  - Division of Biology and Biological Engineering, California Institute of Technology, 116 Kerckhoff Laboratory, Pasadena, CA 91125 USA
- Lior Pachter
  - Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology, 116 Kerckhoff Laboratory, Pasadena, CA 91125 USA
|
84
|
Rane RV, Pearce SL, Li F, Coppin C, Schiffer M, Shirriffs J, Sgrò CM, Griffin PC, Zhang G, Lee SF, Hoffmann AA, Oakeshott JG. Genomic changes associated with adaptation to arid environments in cactophilic Drosophila species. BMC Genomics 2019; 20:52. [PMID: 30651071 PMCID: PMC6335815 DOI: 10.1186/s12864-018-5413-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/26/2018] [Accepted: 12/26/2018] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Insights into the genetic capacities of species to adapt to future climate change can be gained by using comparative genomic and transcriptomic data to reconstruct the genetic changes associated with such adaptations in the past. Here we investigate the genetic changes associated with adaptation to arid environments, specifically climatic extremes and new cactus hosts, through such an analysis of five repleta group Drosophila species. RESULTS We find disproportionately high rates of gene gains in internal branches in the species' phylogeny where cactus use and subsequently cactus specialisation and high heat and desiccation tolerance evolved. The terminal branch leading to the most heat and desiccation resistant species, Drosophila aldrichi, also shows disproportionately high rates of both gene gains and positive selection. Several Gene Ontology terms related to metabolism were enriched in gene gain events in lineages where cactus use was evolving, while some regulatory and developmental genes were strongly selected in the Drosophila aldrichi branch. Transcriptomic analysis of flies subjected to sublethal heat shocks showed many more downregulation responses to the stress in a heat sensitive versus heat resistant species, confirming the existence of widespread regulatory as well as structural changes in the species' differing adaptations. Gene Ontology terms related to metabolism were enriched in the differentially expressed genes in the resistant species while terms related to stress response were over-represented in the sensitive one. CONCLUSION Adaptations to new cactus hosts and hot desiccating environments were associated with periods of accelerated evolutionary change in diverse biochemistries. The hundreds of genes involved suggest adaptations of this sort would be difficult to achieve in the timeframes projected for anthropogenic climate change.
Affiliation(s)
- Rahul V. Rane
  - CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Fang Li
  - China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Chris Coppin
  - CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
- Michele Schiffer
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Jennifer Shirriffs
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Carla M. Sgrò
  - School of Biological Sciences, Monash University, Melbourne, 3800 Australia
- Philippa C. Griffin
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Goujie Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen, China
  - Centre for Social Evolution, Department of Biology, University of Copenhagen, Universitetsparken 15, København, Denmark
- Siu F. Lee
  - CSIRO, Clunies Ross St, GPO Box 1700, Acton, ACT 2601 Australia
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
- Ary A. Hoffmann
  - Bio21 Institute, School of BioSciences, University of Melbourne, 30 Flemington Road, Parkville, 3010 Australia
|
85
|
Zhao L, Xie J, Bai L, Chen W, Wang M, Zhang Z, Wang Y, Zhao Z, Li J. Mining statistically-solid k-mers for accurate NGS error correction. BMC Genomics 2018; 19:912. [PMID: 30598110 PMCID: PMC6311904 DOI: 10.1186/s12864-018-5272-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND NGS data contain many machine-induced errors. The most advanced methods for error correction depend heavily on the selection of solid k-mers. A solid k-mer is a k-mer that occurs frequently in NGS reads; all other k-mers are called weak k-mers. A solid k-mer is unlikely to contain errors, while a weak k-mer most likely contains errors. An intensively investigated problem is to find a good frequency cutoff f0 to balance the numbers of solid and weak k-mers. Once the cutoff is determined, a more challenging but less-studied problem is to: (i) remove a small subset of solid k-mers that are likely to contain errors, and (ii) add a small subset of weak k-mers that are likely to contain no errors into the remaining set of solid k-mers. Identification of these two subsets of k-mers can improve the correction performance. RESULTS We propose to use a Gamma distribution to model the frequencies of erroneous k-mers and a mixture of Gaussian distributions to model correct k-mers, and combine them to determine f0. To identify the two special subsets of k-mers, we use the z-score of k-mers, which measures the number of standard deviations a k-mer's frequency is from the mean. Then these statistically-solid k-mers are used to construct a Bloom filter for error correction. Our method is markedly superior to the state-of-the-art methods, tested on both real and synthetic NGS data sets. CONCLUSION The z-score is adequate to distinguish solid k-mers from weak k-mers, and is particularly useful for pinpointing solid k-mers that have very low frequency. Applying the z-score to k-mers can markedly improve the error correction accuracy.
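The z-score described in the abstract (how many standard deviations a k-mer's frequency lies from the mean) can be sketched in Python as follows. This is only an illustration of a plain z-score against the global k-mer frequency distribution; the reference population the authors actually use for the mean and standard deviation, their Gamma/Gaussian mixture fit, and their Bloom filter are not reproduced here. The function names and the toy reads are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_zscores(counts: Counter) -> dict[str, float]:
    """z-score of each k-mer's frequency relative to all observed k-mers."""
    freqs = list(counts.values())
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # All k-mers are equally frequent, so no k-mer deviates from the mean.
        return {kmer: 0.0 for kmer in counts}
    return {kmer: (c - mu) / sigma for kmer, c in counts.items()}


# Toy data: three identical reads plus one read with a final-base error.
reads = ["ACGT", "ACGT", "ACGT", "ACGA"]
counts = kmer_counts(reads, k=3)
zscores = kmer_zscores(counts)

for kmer, z in sorted(zscores.items(), key=lambda item: item[1]):
    print(kmer, counts[kmer], round(z, 2))
```

In this toy case the k-mer introduced by the erroneous read (CGA) receives the lowest z-score, while the k-mer shared by all reads (ACG) receives the highest.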
Affiliation(s)
- Liang Zhao
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
  - School of Computing and Electronic Information, Guangxi University, Nanning, China
- Jin Xie
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
- Lin Bai
  - School of Computing and Electronic Information, Guangxi University, Nanning, China
- Wen Chen
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
- Mingju Wang
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
- Zhonglei Zhang
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
- Yiqi Wang
  - Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China
- Zhe Zhao
  - School of Computing and Electronic Information, Guangxi University, Nanning, China
- Jinyan Li
  - Advanced Analytics Institute, Faculty of Engineering & IT, University of Technology Sydney, NSW 2007, Australia
|
86
|
Bernards MA, Schorno S, McKenzie E, Winegard TM, Oke I, Plachetzki D, Fudge DS. Unraveling inter-species differences in hagfish slime skein deployment. J Exp Biol 2018; 221:jeb176925. [DOI: 10.1242/jeb.176925] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/03/2018] [Accepted: 10/08/2018] [Indexed: 01/11/2023]
Abstract
Hagfishes defend themselves from fish predators by producing defensive slime consisting of mucous and thread components that interact synergistically with seawater to pose a suffocation risk to their attackers. Deployment of the slime occurs in a fraction of a second and involves hydration of mucous vesicles as well as unraveling of the coiled threads to their full length of ∼150 mm. Previous work showed that unraveling of coiled threads (or ‘skeins’) in Atlantic hagfish requires vigorous mixing with seawater as well as the presence of mucus, whereas skeins from Pacific hagfish tend to unravel spontaneously in seawater. Here, we explored the mechanisms that underlie these different unraveling modes, and focused on the molecules that make up the skein glue, a material that must be disrupted for unraveling to proceed. We found that Atlantic hagfish skeins are also held together with a protein glue, but compared with Pacific hagfish glue, it is less soluble in seawater. Using SDS-PAGE, we identified several soluble proteins and glycoproteins that are liberated from skeins under conditions that drive unraveling in vitro. Peptides generated by mass spectrometry of five of these proteins and glycoproteins mapped strongly to 14 sequences assembled from Pacific hagfish slime gland transcriptomes, with all but one of these sequences possessing homologs in the Atlantic hagfish. Two of these sequences encode unusual acidic proteins that we propose are the structural glycoproteins that make up the skein glue. These sequences have no known homologs in other species and are likely to be unique to hagfishes. Although the ecological significance of the two modes of skein unraveling described here is unknown, they may reflect differences in predation pressure, with selection for faster skein unraveling in the Eptatretus lineage leading to the evolution of a glue that is more soluble.
Affiliation(s)
- Mark A. Bernards
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
- Sarah Schorno
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
- Evan McKenzie
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
- Timothy M. Winegard
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
- Isdin Oke
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
- David Plachetzki
  - Department of Molecular, Cellular, & Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
- Douglas S. Fudge
  - Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada, N1G 2W1
  - Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
|
87
|
Schulz F, Alteio L, Goudeau D, Ryan EM, Yu FB, Malmstrom RR, Blanchard J, Woyke T. Hidden diversity of soil giant viruses. Nat Commun 2018; 9:4881. [PMID: 30451857 PMCID: PMC6243002 DOI: 10.1038/s41467-018-07335-2] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 04/12/2018] [Accepted: 10/30/2018] [Indexed: 01/23/2023] Open
Abstract
Known giant virus diversity is currently skewed towards viruses isolated from aquatic environments and cultivated in the laboratory. Here, we employ cultivation-independent metagenomics and mini-metagenomics on soils from the Harvard Forest, leading to the discovery of 16 novel giant viruses, chiefly recovered by mini-metagenomics. The candidate viruses greatly expand phylogenetic diversity of known giant viruses and either represent novel lineages or are affiliated with klosneuviruses, Cafeteria roenbergensis virus or tupanviruses. One assembled genome with a size of 2.4 Mb represents the largest currently known viral genome in the Mimiviridae, and others encode up to 80% orphan genes. In addition, we find more than 240 major capsid proteins encoded on unbinned metagenome fragments, further indicating that giant viruses are underexplored in soil ecosystems. The fact that most of these novel viruses evaded detection in bulk metagenomes suggests that mini-metagenomics could be a valuable approach to unearth viral giants.
Affiliation(s)
- Frederik Schulz
  - U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Lauren Alteio
  - Department of Biology, University of Massachusetts, Amherst, MA, USA
- Danielle Goudeau
  - U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Elizabeth M Ryan
  - U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Feiqiao B Yu
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Rex R Malmstrom
  - U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Jeffrey Blanchard
  - Department of Biology, University of Massachusetts, Amherst, MA, USA
- Tanja Woyke
  - U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
|
88
|
Mukherjee K, Washimkar D, Muggli MD, Salmela L, Boucher C. Error correcting optical mapping data. Gigascience 2018; 7:5005021. [PMID: 29846578 PMCID: PMC6007263 DOI: 10.1093/gigascience/giy061] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/22/2017] [Accepted: 05/16/2018] [Indexed: 12/31/2022] Open
Abstract
Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome. Recently it has been used for scaffolding contigs and for assembly validation for large-scale sequencing projects, including the maize, goat, and Amborella genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data are numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the Escherichia coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Last, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous and covers a larger fraction of the genome.
Affiliation(s)
- Kingshuk Mukherjee
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville
- Darshan Washimkar
  - Department of Computer Science, Colorado State University, Fort Collins
- Martin D Muggli
  - Department of Computer Science, Colorado State University, Fort Collins
- Leena Salmela
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville
|
89
|
Grau JH, Hackl T, Koepfli KP, Hofreiter M. Improving draft genome contiguity with reference-derived in silico mate-pair libraries. Gigascience 2018; 7:4980916. [PMID: 29688527 PMCID: PMC5967465 DOI: 10.1093/gigascience/giy029] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/26/2017] [Accepted: 03/20/2018] [Indexed: 11/29/2022] Open
Abstract
Background Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. Findings In order to improve genome contiguity, we have developed Cross-Species Scaffolding—a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. Conclusions We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
Affiliation(s)
- José Horacio Grau
  - Museum für Naturkunde Berlin, Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin, Germany
- Thomas Hackl
  - Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, 15 Vassar Street, Cambridge, MA, 02139, USA
- Klaus-Peter Koepfli
  - Smithsonian Conservation Biology Institute, National Zoological Park, 3001 Connecticut Avenue NW, Washington, D.C. 20008, USA
  - Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, Sredniy Prospekt 41A, St. Petersburg, 199004, Russia
- Michael Hofreiter
  - Faculty of Mathematics and Life Sciences, Institute of Biochemistry and Biology, Unit of General Zoology-Evolutionary Adaptive Genomics, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany
|
90
|
Yeo S, Coombe L, Warren RL, Chu J, Birol I. ARCS: scaffolding genome drafts with linked reads. Bioinformatics 2018; 34:725-731. [PMID: 29069293 PMCID: PMC6030987 DOI: 10.1093/bioinformatics/btx675] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 03/22/2017] [Accepted: 10/20/2017] [Indexed: 01/12/2023] Open
Abstract
Motivation Sequencing of human genomes is now routine, and assembly of shotgun reads is increasingly feasible. However, assemblies often fail to inform about chromosome-scale structure due to a lack of linkage information over long stretches of DNA, a shortcoming that is being addressed by new sequencing protocols, such as the GemCode and Chromium linked reads from 10x Genomics. Results Here, we present ARCS, an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS H. sapiens genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts. Availability and implementation https://github.com/bcgsc/ARCS/ Supplementary information Supplementary data are available at Bioinformatics online.
|
91
|
Turner I, Garimella KV, Iqbal Z, McVean G. Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics 2018; 34:2556-2565. [PMID: 29554215 PMCID: PMC6061703 DOI: 10.1093/bioinformatics/bty157] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 06/08/2017] [Revised: 11/25/2017] [Accepted: 03/14/2018] [Indexed: 12/27/2022] Open
Abstract
Motivation The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information Supplementary data are available at Bioinformatics online.
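The point that a plain de Bruijn graph discards long-range information can be seen in a small Python sketch. It is not the McCortex implementation and does not build the link annotations of the LdBG; the function name and the two toy reads are assumptions. Nodes here are (k-1)-mers and each k-mer contributes one edge.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two toy reads that share the middle k-mer CGT but differ on both sides.
reads = ["ACGTA", "TCGTG"]
graph = de_bruijn_graph(reads, k=3)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Both reads pass through the nodes CG and GT, so the graph records that GT can be followed by TA or TG but no longer records that the path entering from AC continues to TA and the path entering from TC continues to TG. That pairing is the kind of read-level connectivity the abstract describes as being lost.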
Affiliation(s)
- Isaac Turner
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Kiran V Garimella
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Zamin Iqbal
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Gil McVean
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
|
92
|
Garg S, Rautiainen M, Novak AM, Garrison E, Durbin R, Marschall T. A graph-based approach to diploid genome assembly. Bioinformatics 2018; 34:i105-i114. [PMID: 29949989 PMCID: PMC6022571 DOI: 10.1093/bioinformatics/bty279] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022] Open
Abstract
Motivation Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. Results We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. Availability and implementation https://github.com/whatshap/whatshap. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shilpa Garg
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
  - Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
  - Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Mikko Rautiainen
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
  - Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
  - Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Adam M Novak
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Erik Garrison
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Richard Durbin
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
  - Department of Genetics, University of Cambridge, Cambridge, UK
- Tobias Marschall
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
  - Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
|
93
|
Cross-shelf investigation of coral reef cryptic benthic organisms reveals diversity patterns of the hidden majority. Sci Rep 2018; 8:8090. [PMID: 29795402 PMCID: PMC5967342 DOI: 10.1038/s41598-018-26332-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 12/05/2017] [Accepted: 05/09/2018] [Indexed: 11/08/2022] Open
Abstract
Coral reefs harbor diverse assemblages of organisms yet the majority of this diversity is hidden within the three dimensional structure of the reef and neglected using standard visual surveys. This study uses Autonomous Reef Monitoring Structures (ARMS) and amplicon sequencing methodologies, targeting mitochondrial cytochrome oxidase I and 18S rRNA genes, to investigate changes in the cryptic reef biodiversity. ARMS, deployed at 11 sites across a near- to off-shore gradient in the Red Sea were dominated by Porifera (sessile fraction), Arthropoda and Annelida (mobile fractions). The two primer sets detected different taxa lists, but patterns in community composition and structure were similar. While the microhabitat of the ARMS deployment affected the community structure, a clear cross-shelf gradient was observed for all fractions investigated. The partitioning of beta-diversity revealed that replacement (i.e. the substitution of species) made the highest contribution with richness playing a smaller role. Hence, different reef habitats across the shelf are relevant to regional diversity, as they harbor different communities, a result with clear implications for the design of Marine Protected Areas. ARMS can be vital tools to assess biodiversity patterns in the generally neglected but species-rich cryptic benthos, providing invaluable information for the management and conservation of hard-bottomed habitats over local and global scales.
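The partitioning of beta-diversity into replacement and richness components can be sketched for a single pair of sites. The abstract does not state which partitioning framework the study used, so the Python sketch below assumes one common Jaccard-based additive partition on presence/absence data, where a is the number of shared species and b and c are the numbers unique to each site. The function name and species labels are hypothetical.

```python
def partition_beta(site_a: set[str], site_b: set[str]) -> dict[str, float]:
    """Split pairwise Jaccard dissimilarity into replacement and richness parts."""
    a = len(site_a & site_b)  # species shared by both sites
    b = len(site_a - site_b)  # species found only at the first site
    c = len(site_b - site_a)  # species found only at the second site
    total = a + b + c
    if total == 0:
        raise ValueError("at least one site must contain a species")
    return {
        "total": (b + c) / total,               # Jaccard dissimilarity
        "replacement": 2 * min(b, c) / total,   # species substituted one-for-one
        "richness": abs(b - c) / total,         # difference in species number
    }


# Hypothetical presence/absence data for a near-shore and an off-shore site.
near_shore = {"sp1", "sp2", "sp3", "sp4"}
off_shore = {"sp3", "sp4", "sp5"}

result = partition_beta(near_shore, off_shore)
print(result)
```

The two components sum to the total dissimilarity, so a result in which "replacement" exceeds "richness" corresponds to the pattern the abstract reports: sites differ mainly because species are substituted, not because one site simply holds more species.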
|
94
|
Wala JA, Bandopadhayay P, Greenwald NF, O'Rourke R, Sharpe T, Stewart C, Schumacher S, Li Y, Weischenfeldt J, Yao X, Nusbaum C, Campbell P, Getz G, Meyerson M, Zhang CZ, Imielinski M, Beroukhim R. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res 2018. [PMID: 29535149 PMCID: PMC5880247 DOI: 10.1101/gr.221028.117] [Citation(s) in RCA: 274] [Impact Index Per Article: 39.1] [Indexed: 12/30/2022]
Abstract
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20–300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50–300 bp) SVs.
Affiliation(s)
- Jeremiah A Wala
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- Pratiti Bandopadhayay
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Noah F Greenwald
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Ryan O'Rourke
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Ted Sharpe
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Chip Stewart
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Steve Schumacher
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
- Yilong Li
  - Seven Bridges Genomics, Cambridge, Massachusetts 02142, USA
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
- Joachim Weischenfeldt
  - The Finsen Laboratory, Rigshospitalet, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Xiaotong Yao
  - Tri-Institutional PhD Program in Computational Biology and Medicine, New York, New York 10065, USA
  - New York Genome Center, New York, New York 10013, USA
- Chad Nusbaum
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
- Peter Campbell
  - Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
  - Department of Haematology, University of Cambridge, Cambridge CB2 2XY, United Kingdom
- Gad Getz
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
  - Department of Pathology and Cancer Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Matthew Meyerson
  - The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
  - Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Cheng-Zhong Zhang
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Marcin Imielinski
- New York Genome Center, New York, New York 10013, USA.,Department of Pathology and Laboratory Medicine, Englander Institute for Precision Medicine, Institute for Computational Biomedicine, and Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
| | - Rameen Beroukhim
- The Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA.,Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA.,Bioinformatics and Integrative Genomics, Harvard University, Cambridge, Massachusetts 02138, USA.,Harvard Medical School, Boston, Massachusetts 02115, USA
| |
|
95
|
Guo X, Hu Q, Hao G, Wang X, Zhang D, Ma T, Liu J. The genomes of two Eutrema species provide insight into plant adaptation to high altitudes. DNA Res 2018; 25:4831046. [PMID: 29394339 PMCID: PMC6014361 DOI: 10.1093/dnares/dsy003] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/16/2017] [Accepted: 01/24/2018] [Indexed: 11/12/2022] Open
Abstract
Eutrema is a genus in the Brassicaceae, which includes species of scientific and economic importance. Many Eutrema species are montane and/or alpine species that arose very recently, making them ideal candidates for comparative studies to understand both ecological speciation and high-altitude adaptation in plants. Here we provide de novo whole-genome assemblies for a pair of recently diverged perennials with contrasting altitude preferences, the high-altitude E. heterophyllum from the eastern Qinghai-Tibet Plateau and its lowland congener E. yunnanense. The two assembled genomes are 350 Mb and 412 Mb, respectively, with 29,606 and 28,881 predicted genes. Comparative analysis of the two species revealed contrasting demographic trajectories and evolution of gene families. Gene family expansions shared between E. heterophyllum and other alpine species were identified, including the disease resistance R genes (NBS-LRRs or NLRs). Genes that are duplicated specifically in the high-altitude E. heterophyllum are involved mainly in reproduction, DNA damage repair and cold tolerance. The two Eutrema genomes reported here constitute important genetic resources for diverse studies, including the evolution of the genus Eutrema, of the Brassicaceae as a whole and of alpine plants across the world.
Affiliation(s)
- Xinyi Guo
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Quanjun Hu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Guoqian Hao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
- Management Committee for Emei Mountain Scenic Area, Biodiversity Institute of Emei Mountain, Leshan 614200, Sichuan, PR China
| | - Xiaojuan Wang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Dan Zhang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Tao Ma
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Jianquan Liu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| |
|
96
|
Andújar C, Arribas P, Gray C, Bruce C, Woodward G, Yu DW, Vogler AP. Metabarcoding of freshwater invertebrates to detect the effects of a pesticide spill. Mol Ecol 2017; 27:146-166. [DOI: 10.1111/mec.14410] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 06/14/2017] [Revised: 10/09/2017] [Accepted: 10/16/2017] [Indexed: 12/22/2022]
Affiliation(s)
- Carmelo Andújar
- Department of Life Sciences; Natural History Museum; London UK
- Department of Life Sciences; Imperial College London; Ascot UK
- Grupo de Ecología y Evolución en Islas; Instituto de Productos Naturales y Agrobiología (IPNA-CSIC); San Cristóbal de la Laguna Spain
| | - Paula Arribas
- Department of Life Sciences; Natural History Museum; London UK
- Department of Life Sciences; Imperial College London; Ascot UK
- Grupo de Ecología y Evolución en Islas; Instituto de Productos Naturales y Agrobiología (IPNA-CSIC); San Cristóbal de la Laguna Spain
| | - Clare Gray
- Department of Life Sciences; Imperial College London; Ascot UK
| | | | - Guy Woodward
- Department of Life Sciences; Imperial College London; Ascot UK
| | - Douglas W. Yu
- State Key Laboratory of Genetic Resources and Evolution; Kunming Institute of Zoology; Chinese Academy of Sciences; Kunming Yunnan China
- School of Biological Sciences; University of East Anglia; Norwich Norfolk UK
| | - Alfried P. Vogler
- Department of Life Sciences; Natural History Museum; London UK
- Department of Life Sciences; Imperial College London; Ascot UK
| |
|
97
|
Wala J, Beroukhim R. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly. Bioinformatics 2017; 33:751-753. [PMID: 28011768 DOI: 10.1093/bioinformatics/btw741] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/07/2016] [Accepted: 11/18/2016] [Indexed: 11/14/2022] Open
Abstract
We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org.
Affiliation(s)
- Jeremiah Wala
- The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.,Bioinformatics and Integrative Genomics, Harvard University, Cambridge, MA 02138, USA.,Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
| | - Rameen Beroukhim
- The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.,Bioinformatics and Integrative Genomics, Harvard University, Cambridge, MA 02138, USA.,Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
| |
|
98
|
Schuelke TA, Wu G, Westbrook A, Woeste K, Plachetzki DC, Broders K, MacManes MD. Comparative Genomics of Pathogenic and Nonpathogenic Beetle-Vectored Fungi in the Genus Geosmithia. Genome Biol Evol 2017; 9:3312-3327. [PMID: 29186370 PMCID: PMC5737690 DOI: 10.1093/gbe/evx242] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Accepted: 11/22/2017] [Indexed: 12/29/2022] Open
Abstract
Geosmithia morbida is an emerging fungal pathogen which serves as a model for examining the evolutionary processes behind pathogenicity because it is one of two known pathogens within a genus of mostly saprophytic, beetle-associated fungi. This pathogen causes thousand cankers disease in black walnut trees and is vectored into the host via the walnut twig beetle. Geosmithia morbida was first detected in the western United States and currently threatens the timber industry concentrated in the eastern United States. We sequenced the genomes of G. morbida in a previous study and two nonpathogenic Geosmithia species in this work and compared these species to other fungal pathogens and nonpathogens to identify genes under positive selection in G. morbida that may be associated with pathogenicity. Geosmithia morbida possesses one of the smallest genomes among the fungal species observed in this study, and one of the smallest fungal pathogen genomes to date. The enzymatic profile in this pathogen is very similar to that of its nonpathogenic relatives. Our findings indicate that genome reduction or retention of a smaller genome may be an important adaptive force during the evolution of a specialized lifestyle in fungal species that occupy a specific niche, such as beetle-vectored tree pathogens. We also present potential genes under selection in G. morbida that could be important for adaptation to a pathogenic lifestyle.
Affiliation(s)
- Taruna A Schuelke
- Department of Molecular, Cellular, & Biomedical Sciences, University of New Hampshire
| | - Guangxi Wu
- Department of Bioagricultural Sciences and Pest Management, Colorado State University
| | | | - Keith Woeste
- USDA Forest Service Hardwood Tree Improvement and Regeneration Center, Department of Forestry and Natural Resources, Purdue University
| | - David C Plachetzki
- Department of Molecular, Cellular, & Biomedical Sciences, University of New Hampshire
| | - Kirk Broders
- Department of Bioagricultural Sciences and Pest Management, Colorado State University
| | - Matthew D MacManes
- Department of Molecular, Cellular, & Biomedical Sciences, University of New Hampshire
| |
|
99
|
Becraft ED, Woyke T, Jarett J, Ivanova N, Godoy-Vitorino F, Poulton N, Brown JM, Brown J, Lau MCY, Onstott T, Eisen JA, Moser D, Stepanauskas R. Rokubacteria: Genomic Giants among the Uncultured Bacterial Phyla. Front Microbiol 2017; 8:2264. [PMID: 29234309 PMCID: PMC5712423 DOI: 10.3389/fmicb.2017.02264] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 08/14/2017] [Accepted: 11/02/2017] [Indexed: 01/08/2023] Open
Abstract
Recent advances in single-cell genomic and metagenomic techniques have facilitated the discovery of numerous previously unknown, deep branches of the tree of life that lack cultured representatives. Many of these candidate phyla are composed of microorganisms with minimalistic, streamlined genomes lacking some core metabolic pathways, which may contribute to their resistance to growth in pure culture. Here we analyzed single-cell genomes and metagenome bins to show that the "Candidate phylum Rokubacteria," formerly known as SPAM, represents an interesting exception, by having large genomes (6-8 Mbps), high GC content (66-71%), and the potential for a versatile, mixotrophic metabolism. We also observed an unusually high genomic heterogeneity among individual Rokubacteria cells in the studied samples. These features may have contributed to the limited recovery of sequences of this candidate phylum in prior cultivation and metagenomic studies. Our analyses suggest that Rokubacteria are distributed globally in diverse terrestrial ecosystems, including soils, the rhizosphere, volcanic mud, oil wells, aquifers, and the deep subsurface, with no reports from marine environments to date.
Affiliation(s)
- Eric D Becraft
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
| | - Tanja Woyke
- Joint Genome Institute, Walnut Creek, CA, United States
| | | | | | - Filipa Godoy-Vitorino
- Department of Natural Sciences, Inter American University of Puerto Rico, San Juan, Puerto Rico
| | - Nicole Poulton
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
| | - Julia M Brown
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
| | - Joseph Brown
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, United States
| | - M C Y Lau
- Department of Geosciences, Princeton University, Princeton, NJ, United States
| | - Tullis Onstott
- Department of Geosciences, Princeton University, Princeton, NJ, United States
| | - Jonathan A Eisen
- College of Biological Sciences, Genome Center, University of California, Davis, Davis, CA, United States
| | - Duane Moser
- Desert Research Institute, Las Vegas, NV, United States
| | | |
|
100
|
Anes J, Hurley D, Martins M, Fanning S. Exploring the Genome and Phenotype of Multi-Drug Resistant Klebsiella pneumoniae of Clinical Origin. Front Microbiol 2017; 8:1913. [PMID: 29109700 PMCID: PMC5660112 DOI: 10.3389/fmicb.2017.01913] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 05/29/2017] [Accepted: 09/20/2017] [Indexed: 11/16/2022] Open
Abstract
Klebsiella pneumoniae is an important nosocomial pathogen with an extraordinarily resistant phenotype due to a combination of acquired resistance elements and efflux mechanisms. In this study, a detailed molecular characterization of 11 K. pneumoniae isolates of clinical origin was carried out. Eleven clinical isolates were tested for their susceptibilities by disk diffusion and broth microdilution, and the results were interpreted according to CLSI guidelines. Efflux activity was determined by measuring the extrusion of ethidium bromide, and biofilm formation was assessed following static growth in Mueller-Hinton and minimal media M9 broths at two temperatures and time points. Template DNA from all 11 isolates was extracted and sequenced. The study collection was found to be resistant to several extended-spectrum beta-lactam (ESBL)-type compounds along with several (fluoro)quinolones (FQ). Resistance to tetracycline accounted for 55% of the study collection (n = 6) and three of the 11 isolates were resistant to carbapenems. Genotyping identified blaCTX-M-15 (82%), blaSHV-12 (55%), and blaTEM-1B (45%) ESBL encoding genes, and FQ resistance was associated with the presence of the oqxAB operon, identified in 10 of the 11 isolates, and the qnrB gene in one isolate. The polymorphisms detected in the quinolone resistance-determining regions (QRDRs) were associated with isolates of the clonal group CG15. Sequence types (ST) identified were representative of previously described clonal groups including CG258 (n = 7), CG15 (n = 3), and CG147 (n = 1). Plasmid replicon type databases were queried, indicating the presence of IncFII and IncFIB replicon types in the majority of the isolates (91%), followed by IncFIA (45%), and IncR (45%). Two of the 11 isolates were found positive for yersiniabactin siderophore-encoding genes. No differences in the ability to efflux ethidium bromide were identified. Biofilm formation was stronger when the isolates were grown under stressed conditions at 37°C for a period up to 96 h. These data confirm that well-recognized clonal groups of K. pneumoniae of importance to human health carry a diverse repertoire of antimicrobial resistance determinants, particularly related to critically important drugs in the ESBL and FQ classes. The capacity of most isolates to form strong biofilms, when stressed under laboratory-simulated conditions, supports the risk to human health associated with nosocomial infections deriving from indwelling medical devices.
Affiliation(s)
- João Anes
- UCD-Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
| | - Daniel Hurley
- UCD-Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
| | - Marta Martins
- UCD-Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland
| | - Séamus Fanning
- UCD-Centre for Food Safety, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland.,Institute for Global Food Security, Queen's University Belfast, Belfast, United Kingdom
| |
|