1. Xie L, Cao B, Wen X, Zheng Y, Wang B, Zhou S, Zheng P. ReLume: Enhancing DNA storage data reconstruction with flow network and graph partitioning. Methods 2025; 240:101-112. [PMID: 40268154 DOI: 10.1016/j.ymeth.2025.03.022] [Received: 01/26/2025] [Revised: 03/06/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025] Open
Abstract
DNA storage is an ideal alternative to silicon-based storage, but focusing on data writing alone cannot address the inevitable errors and durability issues. We therefore propose ReLume, a DNA storage data reconstruction method based on flow networks and graph partitioning, which can reconstruct data from millions of reads on a laptop with 24 GB of RAM. The results show that ReLume copes well with many types of errors, more than doubles sequence recovery rates, and reduces memory usage by about 60%. ReLume is 10 times more durable than other representative methods, meaning that data can be read without loss after 100 years. Results from the wet-lab DNA storage dataset show sequence recovery rates of 73% and 93.2%, which significantly outperform existing methods. In summary, ReLume effectively overcomes accuracy and hardware limitations and offers a feasible route to portable DNA storage.
Affiliation(s)
- Lei Xie
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Xiaoru Wen
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Bin Wang
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China.
- Shihua Zhou
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China.
- Pan Zheng
  - Department of Accounting and Information Systems, University of Canterbury, 8140 Christchurch, New Zealand
2. Jayakodi M, Lu Q, Pidon H, Rabanus-Wallace MT, Bayer M, Lux T, Guo Y, Jaegle B, Badea A, Bekele W, Brar GS, Braune K, Bunk B, Chalmers KJ, Chapman B, Jørgensen ME, Feng JW, Feser M, Fiebig A, Gundlach H, Guo W, Haberer G, Hansson M, Himmelbach A, Hoffie I, Hoffie RE, Hu H, Isobe S, König P, Kale SM, Kamal N, Keeble-Gagnère G, Keller B, Knauft M, Koppolu R, Krattinger SG, Kumlehn J, Langridge P, Li C, Marone MP, Maurer A, Mayer KFX, Melzer M, Muehlbauer GJ, Murozuka E, Padmarasu S, Perovic D, Pillen K, Pin PA, Pozniak CJ, Ramsay L, Pedas PR, Rutten T, Sakuma S, Sato K, Schüler D, Schmutzer T, Scholz U, Schreiber M, Shirasawa K, Simpson C, Skadhauge B, Spannagl M, Steffenson BJ, Thomsen HC, Tibbits JF, Nielsen MTS, Trautewig C, Vequaud D, Voss C, Wang P, Waugh R, Westcott S, Rasmussen MW, Zhang R, Zhang XQ, Wicker T, Dockter C, Mascher M, Stein N. Structural variation in the pangenome of wild and domesticated barley. Nature 2024; 636:654-662. [PMID: 39537924 PMCID: PMC11655362 DOI: 10.1038/s41586-024-08187-1] [Received: 10/09/2023] [Accepted: 10/09/2024] [Indexed: 11/16/2024]
Abstract
Pangenomes are collections of annotated genome sequences of multiple individuals of a species (ref. 1). The structural variants uncovered by these datasets are a major asset to genetic analysis in crop plants (ref. 2). Here we report a pangenome of barley comprising long-read sequence assemblies of 76 wild and domesticated genomes and short-read sequence data of 1,315 genotypes. An expanded catalogue of sequence variation in the crop includes structurally complex loci that are rich in gene copy number variation. To demonstrate the utility of the pangenome, we focus on four loci involved in disease resistance, plant architecture, nutrient release and trichome development. Novel allelic variation at a powdery mildew resistance locus and population-specific copy number gains in a regulator of vegetative branching were found. Expansion of a family of starch-cleaving enzymes in elite malting barleys was linked to shifts in enzymatic activity in micro-malting trials. Deletion of an enhancer motif is likely to change the developmental trajectory of the hairy appendages on barley grains. Our findings indicate that allelic diversity at structurally complex loci may have helped crop plants to adapt to new selective regimes in agricultural ecosystems.
Affiliation(s)
- Murukarthick Jayakodi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - Department of Soil and Crop Sciences, Texas A&M AgriLife Research-Dallas, Dallas, TX, USA
- Qiongxian Lu
  - Carlsberg Research Laboratory, Copenhagen, Denmark
- Hélène Pidon
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - IPSiM, University of Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Thomas Lux
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Yu Guo
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Benjamin Jaegle
  - Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Ana Badea
  - Brandon Research and Development Centre, Agriculture et Agri-Food Canada, Brandon, Manitoba, Canada
- Wubishet Bekele
  - Ottawa Research and Development Centre, Agriculture et Agri-Food Canada, Ottawa, Ontario, Canada
- Gurcharn S Brar
  - Faculty of Land and Food Systems, The University of British Columbia, Vancouver, British Columbia, Canada
  - Faculty of Agricultural, Life and Environmental Sciences (ALES), University of Alberta, Edmonton, Alberta, Canada
- Boyke Bunk
  - DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
- Kenneth J Chalmers
  - School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
- Brett Chapman
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Jia-Wu Feng
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Manuel Feser
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Anne Fiebig
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Heidrun Gundlach
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Georg Haberer
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Mats Hansson
  - Department of Biology, Lund University, Lund, Sweden
- Axel Himmelbach
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Iris Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Robert E Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Haifei Hu
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Patrick König
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sandip M Kale
  - Carlsberg Research Laboratory, Copenhagen, Denmark
  - Department of Agroecology, Aarhus University, Slagelse, Denmark
- Nadia Kamal
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Gabriel Keeble-Gagnère
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Agribio, La Trobe University, Bundoora, Victoria, Australia
- Beat Keller
  - Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Manuela Knauft
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Ravi Koppolu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Simon G Krattinger
  - Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Jochen Kumlehn
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Peter Langridge
  - School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
- Chengdao Li
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
  - Department of Primary Industry and Regional Development, Government of Western Australia, Perth, Western Australia, Australia
  - College of Agriculture, Yangtze University, Jingzhou, China
- Marina P Marone
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Andreas Maurer
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
- Klaus F X Mayer
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University Munich, Freising, Germany
- Michael Melzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Gary J Muehlbauer
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Sudharsan Padmarasu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Dragan Perovic
  - Institute for Resistance Research and Stress Tolerance, Julius Kuehn-Institute (JKI), Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
- Klaus Pillen
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
- Curtis J Pozniak
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Twan Rutten
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Shun Sakuma
  - Faculty of Agriculture, Tottori University, Tottori, Japan
- Kazuhiro Sato
  - Kazusa DNA Research Institute, Kisarazu, Japan
  - Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan
- Danuta Schüler
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Thomas Schmutzer
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Manuel Spannagl
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Brian J Steffenson
  - Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA
- Josquin F Tibbits
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Agribio, La Trobe University, Bundoora, Victoria, Australia
- Corinna Trautewig
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Cynthia Voss
  - Carlsberg Research Laboratory, Copenhagen, Denmark
- Penghao Wang
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Robbie Waugh
  - The James Hutton Institute, Dundee, UK
  - School of Life Sciences, University of Dundee, Dundee, UK
- Sharon Westcott
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Xiao-Qi Zhang
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Thomas Wicker
  - Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany.
3. Kang X, Zhang W, Li Y, Luo X, Schönhuth A. HyLight: Strain aware assembly of low coverage metagenomes. Nat Commun 2024; 15:8665. [PMID: 39375348 PMCID: PMC11458758 DOI: 10.1038/s41467-024-52907-0] [Received: 12/21/2023] [Accepted: 09/23/2024] [Indexed: 10/09/2024] Open
Abstract
Different strains of the same species can vary substantially in their spectrum of biomedically relevant phenotypes. Reconstructing the genomes of microbial communities at the level of their strains poses significant challenges, because sequencing errors can obscure strain-specific variants. Next-generation sequencing (NGS) reads are too short to resolve complex genomic regions. Third-generation sequencing (TGS) reads, although longer, are either prone to higher error rates or substantially more expensive. Limiting TGS coverage to reduce costs compromises the accuracy of the assemblies. This explains why prior approaches suffer from losses in strain awareness, losses in accuracy, a tendency toward excessive costs, or combinations thereof. We introduce HyLight, a metagenome assembly approach that addresses these challenges by combining the complementary strengths of TGS and NGS data. HyLight employs strain-resolved overlap graphs (OG) to accurately reconstruct individual strains within microbial communities. Our experiments demonstrate that HyLight produces strain-aware and contiguous assemblies at minimal error content, while significantly reducing costs by utilizing low-coverage TGS data. HyLight achieves an average improvement of 19.05% in preserving strain identity and demonstrates near-complete strain awareness across diverse datasets. In summary, HyLight offers considerable advances in metagenome assembly, insofar as it delivers significantly enhanced strain awareness, contiguity, and accuracy without the typical compromises observed in existing approaches.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Wenhai Zhang
  - College of Biology, Hunan University, Changsha, China
- Yichen Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China.
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
4. Zeng Y, He K, Chen X, Bai W, Lin H, Chen J, Nedyalkov N, Yamaguchi N, Vijayan K, Suganthasakthivel R, Kumar B, Han Y, Chen Z, Wang W, Liu Y. Museum specimens shedding light on the evolutionary history and cryptic diversity of the hedgehog family Erinaceidae. Integr Zool 2024. [PMID: 39370584 DOI: 10.1111/1749-4877.12909] [Indexed: 10/08/2024]
Abstract
The family Erinaceidae encompasses 27 extant species in two subfamilies: Erinaceinae, which includes spiny hedgehogs, and Galericinae, which comprises silky-furred gymnures and moonrats. Although they are commonly recognized by the general public, their phylogenetic history remains incompletely understood, and several species have never been included in any molecular analyses. Additionally, previous research suggested that the species diversity of Erinaceidae might be underestimated. In this study, we sequenced the mitochondrial genomes of 29 individuals representing 18 erinaceid species using 18 freshly collected tissue samples and 11 historical museum specimens. We also integrated previously published data for a concatenated analysis. We aimed to elucidate the evolutionary relationships within Erinaceidae, estimate divergence times, and uncover potentially underestimated species diversity. Our data finely resolved intergeneric and interspecific relationships and presented the first molecular evidence for the phylogenetic position of Mesechinus wangi, Paraechinus micropus, and P. nudiventris. Our results revealed a sister relationship between Neotetracus and Neohylomys gymnures, as well as a sister relationship between Hemiechinus and Mesechinus, supporting previous hypotheses. Additionally, our findings provided a novel phylogenetic position for Paraechinus aethiopicus, placing it in a basal position within the genus. Furthermore, our study uncovered cryptic species diversity within Hylomys suillus as well as in Neotetracus sinensis, Atelerix albiventris, P. aethiopicus, and Hemiechinus auratus, most of which has been previously overlooked.
Affiliation(s)
- Ying Zeng
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
- Kai He
  - Key Laboratory of Conservation and Application in Biodiversity of South China, School of Life Sciences, Guangzhou University, Guangzhou, China
- Xing Chen
  - School of Zoology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Weipeng Bai
  - Institute of Nihewan Archaeology, College of History and Culture, Hebei Normal University, Shijiazhuang, China
- Hongzhou Lin
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
- Jianhai Chen
  - Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Nedko Nedyalkov
  - National Museum of Natural History, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Nobuyuki Yamaguchi
  - Department of Biological and Environmental Sciences, Faculty of Arts and Sciences, Qatar University, Doha, Qatar
  - Institute of Tropical Biodiversity and Sustainable Development, University Malaysia Terengganu, Kuala Nerus, Malaysia
- Keerthy Vijayan
  - Centre for Plant Biotechnology and Molecular Biology, Kerala Agricultural University, Thrissur, Kerala, India
- Brawin Kumar
  - Indian Institute of Science Education and Research, Tirupati, Andhra Pradesh, India
  - Hedgehog Conservation Alliance (HCA), Kanyakumari, Tamil Nadu, India
- Yuqing Han
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
- Zhongzheng Chen
  - Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co-founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu, China
  - Wildlife Forensic Science Service, Kunming, China
- Wenzhi Wang
  - Wildlife Forensic Science Service, Kunming, China
  - Guizhou Jiandee Laboratories Co., Ltd., Guiyang, China
- Yang Liu
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Shenzhen, China
5. Laperriere SM, Minch B, Weissman JL, Hou S, Yeh YC, Ignacio-Espinoza JC, Ahlgren NA, Moniruzzaman M, Fuhrman JA. Phylogenetic proximity drives temporal succession of marine giant viruses in a five-year metagenomic time-series. bioRxiv 2024:2024.08.12.607631. [PMID: 39185240 PMCID: PMC11343133 DOI: 10.1101/2024.08.12.607631] [Indexed: 08/27/2024]
Abstract
Nucleocytoplasmic Large DNA Viruses (NCLDVs, also called giant viruses) are widespread in marine systems and infect a broad range of microbial eukaryotes (protists). Recent biogeographic work has provided global snapshots of NCLDV diversity and community composition across the world's oceans, yet little information exists about the guiding 'rules' underpinning their community dynamics over time. We leveraged a five-year monthly metagenomic time-series to quantify the community composition of NCLDVs off the coast of Southern California and characterize these populations' temporal dynamics. NCLDVs were dominated by Algavirales (Phycodnaviruses, 59%) and Imitervirales (Mimiviruses, 36%). We identified clusters of NCLDVs with distinct classes of seasonal and non-seasonal temporal dynamics. Overall, NCLDV population abundances were often highly dynamic with a strong seasonal signal. The Imitervirales group had the highest relative abundance in the more oligotrophic late summer and fall, while Algavirales did so in winter. Generally, closely related strains had similar temporal dynamics, suggesting that evolutionary history is a key driver of the temporal niche of marine NCLDVs. However, a few closely related strains had drastically different seasonal dynamics, suggesting that while phylogenetic proximity often indicates ecological similarity, occasionally phenology can shift rapidly, possibly due to host-switching. Finally, we identified distinct functional content and possible host interactions of two major NCLDV orders, including connections of Imitervirales with primary producers like the diatom Chaetoceros and widespread marine grazers like Paraphysomonas and Spirotrichea ciliates. Together, our results reveal key insights into the season-specific effects of phylogenetically distinct giant virus communities on marine protist metabolism, biogeochemical fluxes and carbon cycling.
Affiliation(s)
- Sarah M. Laperriere
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Benjamin Minch
  - Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Sciences, University of Miami, Miami, FL, USA
- JL Weissman
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA
  - Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY, USA
- Shengwei Hou
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
  - Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Yi-Chun Yeh
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Mohammad Moniruzzaman
  - Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Sciences, University of Miami, Miami, FL, USA
- Jed A. Fuhrman
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
6. Sami A, El-Metwally S, Rashad MZ. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads. BMC Bioinformatics 2024; 25:61. [PMID: 38321434 PMCID: PMC10848413 DOI: 10.1186/s12859-024-05681-1] [Received: 09/12/2023] [Accepted: 01/29/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND The rapid advancement of next-generation sequencing (NGS) machines in terms of speed and affordability has led to the generation of a massive amount of biological data at the expense of data quality, as errors become more prevalent. This creates a need for different approaches to detecting and filtering errors, and data quality assurance has moved from the hardware space to the software preprocessing stages.
RESULTS We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads transforms the erroneous NGS read filtration process into a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted through the computation of Term Frequency-Inverse Document Frequency (TF_IDF) values from various datasets such as E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1 and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across various datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%. BFC and RECKONER exceeded 98%, while Fiona had 95.78%. For the Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For the Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER demonstrated good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of mapped reads to the reference genome.
CONCLUSIONS This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.
Affiliation(s)
- Amira Sami
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
- Sara El-Metwally
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt.
  - Biomedical Informatics Department, Faculty of Computer Science and Engineering, New Mansoura University, Gamasa, 35712, Egypt.
- M Z Rashad
  - Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
7. Długosz M, Deorowicz S. Illumina reads correction: evaluation and improvements. Sci Rep 2024; 14:2232. [PMID: 38278837 PMCID: PMC11222498 DOI: 10.1038/s41598-024-52386-9] [Received: 03/20/2023] [Accepted: 01/18/2024] [Indexed: 01/28/2024] Open
Abstract
The paper focuses on the correction of Illumina WGS sequencing reads. We provide an extensive evaluation of the existing correctors. To this end, we measure the impact of correction on variant calling (VC) as well as de novo assembly. The evaluation shows that, in selected cases, read correction improves the quality of the VC results. We also examine the algorithms' behaviour when processing Illumina NovaSeq reads, whose quality characteristics differ from those of older sequencers. We show that most of the algorithms are ready to cope with such reads. Finally, we introduce a new version of RECKONER, our read corrector, by optimizing it and equipping it with a new correction strategy. Currently, RECKONER can correct high-coverage human reads in less than 2.5 h, copes with two types of read errors (indels and substitutions), and uses a new correction verification technique based on two oligomer lengths.
Affiliation(s)
- Maciej Długosz
  - Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, 44-100, Gliwice, Poland
- Sebastian Deorowicz
  - Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, 44-100, Gliwice, Poland.
8. Ellenbogen JB, Borton MA, McGivern BB, Cronin DR, Hoyt DW, Freire-Zapata V, McCalley CK, Varner RK, Crill PM, Wehr RA, Chanton JP, Woodcroft BJ, Tfaily MM, Tyson GW, Rich VI, Wrighton KC. Methylotrophy in the Mire: direct and indirect routes for methane production in thawing permafrost. mSystems 2024; 9:e0069823. [PMID: 38063415 PMCID: PMC10805028 DOI: 10.1128/msystems.00698-23] [Received: 07/06/2023] [Accepted: 10/24/2023] [Indexed: 01/24/2024] Open
Abstract
While wetlands are major sources of biogenic methane (CH4), our understanding of resident microbial metabolism is incomplete, which compromises the prediction of CH4 emissions under ongoing climate change. Here, we employed genome-resolved multi-omics to expand our understanding of methanogenesis in the thawing permafrost peatland of Stordalen Mire in Arctic Sweden. In quadrupling the genomic representation of the site's methanogens and examining their encoded metabolism, we revealed that nearly 20% of the metagenome-assembled genomes (MAGs) encoded the potential for methylotrophic methanogenesis. Further, 27% of the transcriptionally active methanogens expressed methylotrophic genes; for Methanosarcinales and Methanobacteriales MAGs, these data indicated the use of methylated oxygen compounds (e.g., methanol), while for Methanomassiliicoccales, they primarily implicated methyl sulfides and methylamines. In addition to methanogenic methylotrophy, >1,700 bacterial MAGs across 19 phyla encoded anaerobic methylotrophic potential, with expression across 12 phyla. Metabolomic analyses revealed the presence of diverse methylated compounds in the Mire, including some known methylotrophic substrates. Active methylotrophy was observed across all stages of a permafrost thaw gradient in Stordalen, with the most frozen non-methanogenic palsa found to host bacterial methylotrophy and the partially thawed bog and fully thawed fen seen to house both methanogenic and bacterial methylotrophic activities. Methanogenesis across increasing permafrost thaw is thus revised from the sole dominance of hydrogenotrophic production and the appearance of acetoclastic at full thaw to consider the co-occurrence of methylotrophy throughout. 
Collectively, these findings indicate that methanogenic and bacterial methylotrophy may be an important and previously underappreciated component of carbon cycling and emissions in these rapidly changing wetland habitats. IMPORTANCE Wetlands are the biggest natural source of atmospheric methane (CH4) emissions, yet we have an incomplete understanding of the suite of microbial metabolism that results in CH4 formation. Specifically, methanogenesis from methylated compounds is excluded from all ecosystem models used to predict wetland contributions to the global CH4 budget. Though recent studies have shown methylotrophic methanogenesis to be active across wetlands, the broad climatic importance of the metabolism remains critically understudied. Further, some methylotrophic bacteria are known to produce methanogenic by-products like acetate, increasing the complexity of the microbial methylotrophic metabolic network. Prior studies of Stordalen Mire have suggested that methylotrophic methanogenesis is irrelevant in situ and have not emphasized the bacterial capacity for metabolism, both of which we countered in this study. The importance of our findings lies in the significant advancement toward unraveling the broader impact of methylotrophs in wetland methanogenesis and, consequently, their contribution to the terrestrial global carbon cycle.
Affiliation(s)
- Jared B. Ellenbogen
- Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado, USA
| | - Mikayla A. Borton
- Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado, USA
| | - Bridget B. McGivern
- Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado, USA
| | - Dylan R. Cronin
- Department of Microbiology, The Ohio State University, Columbus, Ohio, USA
| | - David W. Hoyt
- Environmental Molecular Sciences Laboratory, Earth and Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | | | - Carmody K. McCalley
- Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York, USA
| | - Ruth K. Varner
- Department of Earth Sciences and Institute for the Study of Earth, Oceans and Space, University of New Hampshire, Durham, New Hampshire, USA
| | - Patrick M. Crill
- Department of Geological Sciences, Bolin Center for Climate Research, Stockholm University, Stockholm, Sweden
| | - Richard A. Wehr
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, USA
| | - Jeffrey P. Chanton
- Earth Ocean and Atmospheric Sciences, Florida State University, Tallahassee, Florida, USA
| | - Ben J. Woodcroft
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Malak M. Tfaily
- Department of Environmental Science, University of Arizona, Tucson, Arizona, USA
| | - Gene W. Tyson
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, Queensland, Australia
| | - Virginia I. Rich
- Department of Microbiology, The Ohio State University, Columbus, Ohio, USA
| | - Kelly C. Wrighton
- Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado, USA
|
9
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023]
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27-95%) and 20% (4-61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
- College of Biology, Hunan University, Changsha, China
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Jialu Xu
- College of Biology, Hunan University, Changsha, China
| | - Xiao Luo
- College of Biology, Hunan University, Changsha, China.
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany.
|
10
|
Catapano PL, Falcinelli M, Damiani C, Cappelli A, Koukouli D, Rossi P, Ricci I, Napolioni V, Favia G. De novo genome assembly of the invasive mosquito species Aedes japonicus and Aedes koreicus. Parasit Vectors 2023; 16:427. [PMID: 37986088 PMCID: PMC10658958 DOI: 10.1186/s13071-023-06048-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/29/2023] [Accepted: 11/07/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Recently, two invasive Aedes mosquito species, Ae. japonicus and Ae. koreicus, have been circulating in several European countries, posing potential health risks to humans and animals. Vector control is the main option to prevent mosquito-borne diseases, and an accurate genome sequence of these mosquitoes is essential to better understand their biology and to develop effective control strategies. METHODS De novo genome assemblies of Ae. japonicus (Ajap1) and Ae. koreicus (Akor1) were produced using a hybrid approach that combines Oxford Nanopore long-read and Illumina short-read data. Their quality was ascertained using various metrics. Masking of repetitive elements, gene prediction and functional annotation were performed. RESULTS Sequence analysis revealed a very high presence of repetitive DNA and, among others, thermal adaptation genes and insecticide-resistance genes. Through the RNA-seq analysis of larvae and adults of Ae. koreicus and Ae. japonicus exposed to different temperatures, we also identified genes showing a differential temperature-dependent activation. CONCLUSIONS The assembly of Akor1 and Ajap1 genomes constitutes the first updated collective knowledge of the genomes of both mosquito species, providing the possibility of understanding key mechanisms of their biology such as the ability to adapt to harsh climates and to develop insecticide-resistance mechanisms.
Affiliation(s)
- Paolo L Catapano
- School of Biosciences and Veterinary Medicine, University of Camerino, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Monica Falcinelli
- School of Biosciences and Veterinary Medicine, University of Camerino, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Claudia Damiani
- School of Biosciences and Veterinary Medicine, University of Camerino, CIRM Italian Malaria Network, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Alessia Cappelli
- School of Biosciences and Veterinary Medicine, University of Camerino, CIRM Italian Malaria Network, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Despoina Koukouli
- School of Biosciences and Veterinary Medicine, University of Camerino, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Paolo Rossi
- School of Biosciences and Veterinary Medicine, University of Camerino, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Irene Ricci
- School of Biosciences and Veterinary Medicine, University of Camerino, CIRM Italian Malaria Network, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Valerio Napolioni
- School of Biosciences and Veterinary Medicine, University of Camerino, Via Gentile III da Varano, 62032, Camerino, Italy
| | - Guido Favia
- School of Biosciences and Veterinary Medicine, University of Camerino, CIRM Italian Malaria Network, Via Gentile III da Varano, 62032, Camerino, Italy.
|
11
|
Coclet C, Sorensen PO, Karaoz U, Wang S, Brodie EL, Eloe-Fadrosh EA, Roux S. Virus diversity and activity is driven by snowmelt and host dynamics in a high-altitude watershed soil ecosystem. Microbiome 2023; 11:237. [PMID: 37891627 PMCID: PMC10604447 DOI: 10.1186/s40168-023-01666-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/15/2023] [Accepted: 09/07/2023] [Indexed: 10/29/2023]
Abstract
BACKGROUND Viruses impact nearly all organisms on Earth, including microbial communities and their associated biogeochemical processes. In soils, highly diverse viral communities have been identified, with a global distribution seemingly driven by multiple biotic and abiotic factors, especially soil temperature and moisture. However, our current understanding of the stability of soil viral communities across time and their response to strong seasonal changes in environmental parameters remains limited. Here, we investigated the diversity and activity of environmental soil DNA and RNA viruses, focusing especially on bacteriophages, across dynamic seasonal changes in a snow-dominated mountainous watershed by examining paired metagenomes and metatranscriptomes. RESULTS We identified a large number of DNA and RNA viruses taxonomically divergent from existing environmental viruses, including a significant proportion of fungal RNA viruses, and a large and unsuspected diversity of positive single-stranded RNA phages (Leviviricetes), highlighting the under-characterization of the global soil virosphere. Among these, we were able to distinguish subsets of active DNA and RNA phages that changed across seasons, consistent with a "seed-bank" viral community structure in which new phage activity, for example, replication and host lysis, is sequentially triggered by changes in environmental conditions. At the population level, we further identified virus-host dynamics matching two existing ecological models: "Kill-The-Winner" which proposes that lytic phages are actively infecting abundant bacteria, and "Piggyback-The-Persistent" which argues that when the host is growing slowly, it is more beneficial to remain in a dormant state. The former was associated with summer months of high and rapid microbial activity, and the latter with winter months of limited and slow host growth.
CONCLUSION Taken together, these results suggest that the high diversity of viruses in soils is likely associated with a broad range of host interaction types each adapted to specific host ecological strategies and environmental conditions. As our understanding of how environmental and host factors drive viral activity in soil ecosystems progresses, integrating these viral impacts in complex natural microbiome models will be key to accurately predict ecosystem biogeochemistry.
Affiliation(s)
- Clement Coclet
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - Patrick O Sorensen
- Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Ulas Karaoz
- Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Shi Wang
- Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Eoin L Brodie
- Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Environmental Science, Policy and Management, University of California, Berkeley, Berkeley, CA, USA
| | - Emiley A Eloe-Fadrosh
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
|
12
|
Riley R, Bowers RM, Camargo AP, Campbell A, Egan R, Eloe-Fadrosh EA, Foster B, Hofmeyr S, Huntemann M, Kellom M, Kimbrel JA, Oliker L, Yelick K, Pett-Ridge J, Salamov A, Varghese NJ, Clum A. Terabase-Scale Coassembly of a Tropical Soil Microbiome. Microbiol Spectr 2023; 11:e0020023. [PMID: 37310219 PMCID: PMC10434106 DOI: 10.1128/spectrum.00200-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 05/24/2023] [Indexed: 06/14/2023]
Abstract
Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are simultaneously analyzed to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from the candidate phylum Eremiobacterota. Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including the candidate phyla Dependentiae, Dormibacterota, and Methylomirabilota. In total, 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared to 294 MAGs assigned to nine phyla in the same samples individually assembled. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly revealed a 49% complete rare biosphere microbe from the candidate phylum FCPU426 among other low-abundance microbes, an 81% complete fungal genome from the phylum Ascomycota, and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimation of metagenome coverage and diversity indicates that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and indicates the value of future terabase-scale sequencing and coassembly of complex environments. IMPORTANCE Petabases of reads are being produced by environmental metagenome sequencing. 
An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities. "Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
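The MAG quality tiers quoted in this abstract amount to a small decision rule, sketched below in Python. This is illustrative only and not the authors' pipeline: the `Mag` record, its field names, and the "unclassified" label for bins with 10% or more contamination (a case the abstract does not name) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical summary record for one metagenome-assembled genome."""
    completeness: float      # percent, 0-100
    contamination: float     # percent
    has_23s: bool = False    # predicted 23S rRNA gene present
    has_16s: bool = False    # predicted 16S rRNA gene present
    has_5s: bool = False     # predicted 5S rRNA gene present
    n_trnas: int = 0         # number of predicted tRNAs


def quality_tier(mag: Mag) -> str:
    """Assign the tier using the thresholds stated in the abstract."""
    if mag.contamination >= 10:
        # Not covered by any tier named in the abstract (assumed label).
        return "unclassified"
    has_all_rrnas = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90 and mag.contamination < 5
            and has_all_rrnas and mag.n_trnas >= 18):
        return "high"
    if mag.completeness >= 50:
        return "medium"
    return "low"
```

Under this rule, a bin that is 95% complete and 2% contaminated but lacks the rRNA genes falls back to the medium tier, which is how the stated criteria read when applied in order.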
Affiliation(s)
- Robert Riley
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Robert M. Bowers
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Antonio Pedro Camargo
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Ashley Campbell
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
| | - Rob Egan
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | | | - Brian Foster
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Steven Hofmeyr
- Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Marcel Huntemann
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Matthew Kellom
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Jeffrey A. Kimbrel
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
| | - Leonid Oliker
- Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Katherine Yelick
- Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
| | - Jennifer Pett-Ridge
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
- Life & Environmental Sciences Department, University of California Merced, Merced, California, USA
| | - Asaf Salamov
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Neha J. Varghese
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Alicia Clum
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
|
13
|
Hasegawa N, Shimizu K. Efficient Colored de Bruijn Graph for Indexing Reads. J Comput Biol 2023. [PMID: 37115583 DOI: 10.1089/cmb.2022.0259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
The colored de Bruijn graph is a variation of the de Bruijn graph that has recently been utilized for indexing sequencing reads. Although state-of-the-art methods have achieved small index sizes, they produce many read-incoherent paths that tend to cover the same regions in the source genome sequence. To solve this problem, we propose an accurate coloring method that can reduce the generation of read-incoherent paths by utilizing different colors for a single read depending on the position in the read, which reduces ambiguous coloring in cases where a node has two successors, and both of the successors have the same color. To avoid having to memorize the order of the colors, we utilize a hash function to generate and reproduce the series of colors from the initial color and then apply a Bloom filter for storing the colors to reduce the index size. Experimental results using simulated data and real data demonstrate that our method reduces the occurrence of read-incoherent paths from 149,556 to only 2 and from 5,596 to 0, respectively. Moreover, the depths of coverage for the reconstructed reads are equal to those for the input reads for the simulated data, whereas the previous method decreases the depth of coverage at many positions in the source genome. Our method achieves quite a high accuracy with a comparable construction time, peak memory size, and index size to the previous method.
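The position-dependent coloring idea can be illustrated with a minimal Python sketch, assuming a SHA-256-based color chain and a toy Bloom filter. None of the names below (`BloomFilter`, `next_color`, `index_read`, `successors`) come from the paper, and the real index layout and hash function are not specified in the abstract.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over strings (sizes chosen arbitrarily)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def next_color(color: int) -> int:
    """Reproduce the next color in the series from the current one."""
    digest = hashlib.sha256(color.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def index_read(read: str, initial_color: int, k: int, bloom: BloomFilter) -> None:
    """Store each k-mer of the read with the color for its position."""
    color = initial_color
    for i in range(len(read) - k + 1):
        bloom.add(f"{read[i:i + k]}|{color}")
        color = next_color(color)


def successors(kmer: str, color: int, bloom: BloomFilter) -> list:
    """Return successor k-mers whose stored color continues the series."""
    nxt = next_color(color)
    return [kmer[1:] + b for b in "ACGT" if f"{kmer[1:] + b}|{nxt}" in bloom]
```

Because only the initial color has to be remembered per read, the order of colors is recomputed rather than stored. As with any Bloom filter, a lookup can return a false positive but never a false negative.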
Affiliation(s)
- Nozomi Hasegawa
- The Department of Computer Science and Engineering, Waseda University, Shinjuku-ku, Japan
| | - Kana Shimizu
- The Department of Computer Science and Engineering, Waseda University, Shinjuku-ku, Japan
- National Institute of Advanced Industrial Science and Technology, Koto-ku, Japan
|
14
|
Hyperactive nanobacteria with host-dependent traits pervade Omnitrophota. Nat Microbiol 2023; 8:727-744. [PMID: 36928026 PMCID: PMC10066038 DOI: 10.1038/s41564-022-01319-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/12/2022] [Accepted: 12/30/2022] [Indexed: 03/18/2023]
Abstract
Candidate bacterial phylum Omnitrophota has not been isolated and is poorly understood. We analysed 72 newly sequenced and 349 existing Omnitrophota genomes representing 6 classes and 276 species, along with Earth Microbiome Project data to evaluate habitat, metabolic traits and lifestyles. We applied fluorescence-activated cell sorting and differential size filtration, and showed that most Omnitrophota are ultra-small (~0.2 μm) cells that are found in water, sediments and soils. Omnitrophota genomes in 6 classes are reduced, but maintain major biosynthetic and energy conservation pathways, including acetogenesis (with or without the Wood-Ljungdahl pathway) and diverse respirations. At least 64% of Omnitrophota genomes encode gene clusters typical of bacterial symbionts, suggesting host-associated lifestyles. We repurposed quantitative stable-isotope probing data from soils dominated by andesite, basalt or granite weathering and identified 3 families with high isotope uptake consistent with obligate bacterial predators. We propose that most Omnitrophota inhabit various ecosystems as predators or parasites.
|
15
|
Dart E, Fuhrman JA, Ahlgren NA. Diverse Marine T4-like Cyanophage Communities Are Primarily Comprised of Low-Abundance Species Including Species with Distinct Seasonal, Persistent, Occasional, or Sporadic Dynamics. Viruses 2023; 15:v15020581. [PMID: 36851794 PMCID: PMC9960396 DOI: 10.3390/v15020581] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/02/2022] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/22/2023]
Abstract
Cyanophages exert important top-down controls on their cyanobacteria hosts; however, concurrent analysis of both phage and host populations is needed to better assess phage-host interaction models. We analyzed picocyanobacteria Prochlorococcus and Synechococcus and T4-like cyanophage communities in Pacific Ocean surface waters using five years of monthly viral and cellular fraction metagenomes. Cyanophage communities contained thousands of mostly low-abundance (<2% relative abundance) species with varying temporal dynamics, categorized as seasonally recurring or non-seasonal and occurring persistently, occasionally, or sporadically (detected in ≥85%, 15-85%, or <15% of samples, respectively). Viromes contained mostly seasonal and persistent phages (~40% each), while cellular fraction metagenomes had mostly sporadic species (~50%), reflecting that these sample sets capture different steps of the infection cycle: virions from prior infections or within currently infected cells, respectively. Two groups of seasonal phages correlated to Synechococcus or Prochlorococcus were abundant in spring/summer or fall/winter, respectively. Cyanophages likely have a strong influence on the host community structure, as their communities explained up to 32% of host community variation. These results support how both seasonally recurrent and apparent stochastic processes, likely determined by host availability and different host-range strategies among phages, are critical to phage-host interactions and dynamics, consistent with both the Kill-the-Winner and the Bank models.
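The occurrence categories defined in this abstract can be expressed as a short Python rule. The function name and the choice to count detections over a fixed number of samples are assumptions made for illustration; how the authors scored "detection" is not stated here.

```python
def occurrence_category(n_detected: int, n_samples: int) -> str:
    """Classify a species by the share of samples in which it was detected.

    Thresholds follow the abstract: >=85% persistent, 15-85% occasional,
    <15% sporadic. Integer arithmetic avoids floating-point edge cases.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0 <= n_detected <= n_samples:
        raise ValueError("n_detected must lie between 0 and n_samples")
    if n_detected * 100 >= 85 * n_samples:
        return "persistent"
    if n_detected * 100 >= 15 * n_samples:
        return "occasional"
    return "sporadic"
```

For a hypothetical series of 60 monthly samples, a species detected in 54 of them would be persistent, one detected in 30 occasional, and one detected in 3 sporadic.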
Affiliation(s)
- Emily Dart
- Department of Biology, Clark University, Worcester, MA 01610, USA
| | - Jed A. Fuhrman
- Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Nathan A. Ahlgren
- Department of Biology, Clark University, Worcester, MA 01610, USA
- Correspondence: ; Tel.: +1-(508)-793-7107
|
16
|
Zhou T, Lu L, Li C. Optimization of the "in-silico" mate-pair method improves contiguity and accuracy of genome assembly. Ecol Evol 2023; 13:e9745. [PMID: 36644701 PMCID: PMC9833964 DOI: 10.1002/ece3.9745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Revised: 12/30/2022] [Accepted: 12/30/2022] [Indexed: 01/13/2023]
Abstract
A combination of short-insert paired-end and mate-pair libraries of large insert sizes is used as a standard method to generate genome assemblies with high contiguity. Third-generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate-pair libraries and third-generation libraries require high-molecular-weight DNA, making the use of these libraries inappropriate for samples with only degraded DNA. An in silico method that generates mate-pair libraries using a reference genome was devised for the task of assembling target genomes. Although the contiguity and completeness of assembled genomes were significantly improved by this method, a high level of errors manifested in the assembly, and the methods for using reference genomes were not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate-pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Conservation of in silico mate-pairs by comparing two references and using those to guide genome assembly reduced the number of misassemblies (18.6%-46.1%) and increased the contiguity of assembled genomes (9.7%-70.7%), while maintaining gene completeness at a level that was either similar or marginally lower than that obtained via the current method. Finally, we developed a pipeline of the optimized in silico method and compared it with another reference-guided assembler, RagTag. We found that RagTag produced longer scaffolds (17.8 Mbp vs 3.0 Mbp), but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate-pair method. The optimized in silico pipeline developed in this study should facilitate further studies on genomics, population genetics, and conservation of endangered species.
Affiliation(s)
- Tao Zhou
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
| | - Liang Lu
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
| | - Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
|
17
|
Luo M, Ji Y, Warton D, Yu DW. Extracting abundance information from DNA-based data. Mol Ecol Resour 2023; 23:174-189. [PMID: 35986714 PMCID: PMC10087802 DOI: 10.1111/1755-0998.13703] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/22/2022] [Revised: 07/31/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022]
Abstract
The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food-web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA-based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts. (i) Within-sample across-species quantification describes relative species abundances in one sample. (ii) Across-sample within-species quantification describes how the abundance of each individual species varies from sample to sample, such as over a time series, an environmental gradient or different experimental treatments. First, we review the literature on methods to recover across-species abundance information (by removing what we call "species pipeline biases") and within-species abundance information (by removing what we call "pipeline noise"). We argue that many ecological questions can be answered with just within-species quantification, and we therefore demonstrate how to use a "DNA spike-in" to correct for pipeline noise and recover within-species abundance information. We also introduce a model-based estimator that can be used on data sets without a physical spike-in to approximate and correct for pipeline noise.
Affiliation(s)
- Mingjie Luo
- State Key Laboratory of Genetic Resources and Evolution and Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yinqiu Ji
- State Key Laboratory of Genetic Resources and Evolution and Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - David Warton
- School of Mathematics and Statistics, UNSW Sydney, Sydney, New South Wales, Australia
- Evolution and Ecology Research Centre, UNSW Sydney, Sydney, New South Wales, Australia
| | - Douglas W. Yu
- State Key Laboratory of Genetic Resources and Evolution and Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
- School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk, UK
|
18
|
Enhanced terrestrial Fe(II) mobilization identified through a novel mechanism of microbially driven cave formation in Fe(III)-rich rocks. Sci Rep 2022; 12:17062. [PMID: 36224210 PMCID: PMC9556595 DOI: 10.1038/s41598-022-21365-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Most cave formation requires mass separation from a host rock in a process that operates outward from permeable pathways to create the cave void. Given the poor solubility of Fe(III) phases, such processes are insufficient to account for the significant iron formation caves (IFCs) seen in Brazilian banded iron formations (BIF) and associated rock. In this study we demonstrate that microbially-mediated reductive Fe(III) dissolution is solubilizing the poorly soluble Fe(III) phases to soluble Fe(II) in the anoxic zone behind cave walls. The resultant Fe(III)-depleted material (termed sub muros) is unable to maintain the structural integrity of the walls and repeated rounds of wall collapse lead to formation of the cave void in an active, measurable process. This mechanism may move significant quantities of Fe(II) into ground water and may help to explain the mechanism of BIF dissolution and REE enrichment in the generation of canga. The role of Fe(III)-reducing microorganisms and mass separation behind the walls (outward-in, rather than inward-out) is not only a novel mechanism of speleogenesis, but it also may identify a previously overlooked source of continental Fe that may have contributed to Archaean BIF formation.
19
Çiftçi O, Alverson AJ, van Bodegom P, Roberts WR, Mertens A, Van de Vijver B, Trobajo R, Mann DG, Pirovano W, van Eijk I, Gravendeel B. Phylotranscriptomics reveals the reticulate evolutionary history of a widespread diatom species complex. Journal of Phycology 2022; 58:643-656. [PMID: 35861132 PMCID: PMC9804273 DOI: 10.1111/jpy.13281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2022] [Accepted: 06/29/2022] [Indexed: 06/15/2023]
Abstract
In contrast to surveys based on a few genes that often provide limited taxonomic resolution, transcriptomes provide a wealth of genomic loci that can resolve relationships among taxonomically challenging lineages. Diatoms are a diverse group of aquatic microalgae that includes important bioindicator species and many such lineages. One example is Nitzschia palea, a widespread species complex with several morphologically defined taxonomic varieties, some of which are critical pollution indicators. Morphological differences among the varieties are subtle and phylogenetic studies based on a few genes fail to resolve their evolutionary relationships. We conducted morphometric and transcriptome analyses of 10 Nitzschia palea strains to resolve the relationships among strains and taxonomic varieties. Nitzschia palea was resolved into three clades, one of which corresponds to a group of strains with narrow linear-lanceolate valves. The other morphological group recovered in the shape outline analysis was not monophyletic and consisted of two clades. Gene-tree concordance analyses and phylogenetic network estimations revealed patterns of incomplete lineage sorting and gene flow between intraspecific lineages. We detected reticulated evolutionary patterns among lineages with different morphologies, resulting in a putative recent hybrid. Our study shows that phylogenomic analyses of unlinked nuclear loci, complemented with morphometrics, can resolve complex evolutionary histories of recently diverged species complexes.
Affiliation(s)
- Ozan Çiftçi
  - Institute of Environmental Sciences (CML), Leiden University, Box 9518, 2300 RA Leiden, The Netherlands
  - Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, The Netherlands
  - BaseClear B.V., Sylviusweg 74, 2333 BE Leiden, The Netherlands
- Andrew J. Alverson
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, Arkansas 72701, USA
- Peter van Bodegom
  - Institute of Environmental Sciences (CML), Leiden University, Box 9518, 2300 RA Leiden, The Netherlands
- Wade R. Roberts
  - Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, Fayetteville, Arkansas 72701, USA
- Bart Van de Vijver
  - Meise Botanic Garden Meise, Research Department, Nieuwelaan 38, 1860 Meise, Belgium
  - University of Antwerp, Department of Biology – ECOBE, Universiteitsplein 1, B‐2610 Wilrijk, Belgium
- Rosa Trobajo
  - IRTA‐Institute for Food and Agricultural Research and Technology, Marine and Continental Waters Programme, Ctra de Poble Nou Km 5.5, E43540, La Ràpita, Catalonia, Spain
- David G. Mann
  - IRTA‐Institute for Food and Agricultural Research and Technology, Marine and Continental Waters Programme, Ctra de Poble Nou Km 5.5, E43540, La Ràpita, Catalonia, Spain
  - Royal Botanic Garden Edinburgh, Edinburgh EH3 5LR, Scotland, UK
- Iris van Eijk
  - Bayer Crop Science, Leeuwenhoekweg 52, 2661 CZ Bergschenhoek, The Netherlands
- Barbara Gravendeel
  - Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, The Netherlands
  - Radboud Institute for Biological and Environmental Sciences, Heyendaalseweg 135, 6500 GL Nijmegen, The Netherlands
20
Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep 2022; 49:11133-11148. [PMID: 36151399 DOI: 10.1007/s11033-022-07919-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/20/2022] [Accepted: 09/05/2022] [Indexed: 10/14/2022]
Abstract
The sequence assembly algorithms have rapidly evolved with the vigorous growth of genome sequencing technology over the past two decades. Assembly mainly uses the iterative expansion of overlap relationships between sequences to construct the target genome. The assembly algorithms can be typically classified into several categories, such as the Greedy strategy, Overlap-Layout-Consensus (OLC) strategy, and de Bruijn graph (DBG) strategy. In particular, due to the rapid development of third-generation sequencing (TGS) technology, some prevalent assembly algorithms have been proposed to generate high-quality chromosome-level assemblies. However, due to the genome complexity, the length of short reads, and the high error rate of long reads, contigs produced by assembly may contain misassemblies adversely affecting downstream data analysis. Therefore, several read-based and reference-based methods for misassembly identification have been developed to improve assembly quality. This work primarily reviewed the development of DNA sequencing technologies and summarized sequencing data simulation methods, sequencing error correction methods, various mainstream sequence assembly algorithms, and misassembly identification methods. A large amount of computation makes the sequence assembly problem more challenging, and therefore, it is necessary to develop more efficient and accurate assembly algorithms and alternative algorithms.
21
Tang T, Hutvagner G, Wang W, Li J. Simultaneous compression of multiple error-corrected short-read sets for faster data transmission and better de novo assemblies. Brief Funct Genomics 2022; 21:387-398. [PMID: 35848773 DOI: 10.1093/bfgp/elac016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022] Open
Abstract
Next-Generation Sequencing has produced incredible amounts of short-read sequence data for de novo genome assembly over the last decades. For efficient transmission of these huge datasets, high-performance compression algorithms have been intensively studied. As both de novo assembly and error correction methods utilize the overlaps between reads, a concern is whether the sequencing errors that negatively affect genome assemblies also affect the compression of NGS data. This work addresses two problems: how current error correction algorithms can enable compression algorithms to make the sequence data much more compact, and whether the reads modified by error-correction algorithms lead to quality improvements in de novo contig assembly. As multiple sets of short reads are often produced by a single biomedical project in practice, we propose a graph-based method to reorder the files in a collection of multiple sets and then compress them simultaneously for a further compression improvement after error correction. We use examples to illustrate that accurate error correction algorithms can significantly reduce the number of mismatched nucleotides in reference-free compression and hence can greatly improve compression performance. Extensive tests on practical collections of multiple short-read sets confirm that the compression performance on the error-corrected data (with unchanged size) significantly outperforms that on the original data, and that the file reordering idea contributes further. Error correction of the original reads also resulted in quality improvements of the genome assemblies, sometimes remarkably. However, it remains an open question how to combine appropriate error correction methods with an assembly algorithm so that assembly performance is always significantly improved.
Affiliation(s)
- Tao Tang
  - Data Science Institute, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia; School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210003, Jiangsu, China
- Gyorgy Hutvagner
  - School of Biomedical Engineering, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia
- Wenjian Wang
  - School of Computer and Information Technology, Shanxi University, Shanxi Road, 030006, Shanxi, China
- Jinyan Li
  - Data Science Institute, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia
22
Kumar R, Kane H, Wang Q, Hibberd A, Jensen HM, Kim HS, Bak SY, Auzanneau I, Bry S, Christensen N, Friedman A, Rasinkangas P, Ouwehand AC, Forssten SD, Hasselwander O. Identification and Characterization of a Novel Species of Genus Akkermansia with Metabolic Health Effects in a Diet-Induced Obesity Mouse Model. Cells 2022; 11:2084. [PMID: 35805168 PMCID: PMC9265676 DOI: 10.3390/cells11132084] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/25/2022] [Revised: 06/26/2022] [Accepted: 06/29/2022] [Indexed: 12/15/2022] Open
Abstract
Akkermansia muciniphila is a well-known bacterium with the ability to degrade mucin. This metabolic capability is believed to play an important role in the colonization of this bacterium in the gut. In this study, we report the identification and characterization of a novel Akkermansia sp. DSM 33459 isolated from human feces of a healthy donor. Phylogenetic analysis based on the genome-wide average nucleotide identity indicated that the Akkermansia sp. DSM 33459 has only 87.5% similarity with the type strain A. muciniphila ATCC BAA-835. Akkermansia sp. DSM 33459 showed significant differences in its fatty acid profile and carbon utilization as compared to the type strain. The Akkermansia sp. DSM 33459 strain was tested in a preclinical obesity model to determine its effect on metabolic markers. Akkermansia sp. DSM 33459 showed significant improvement in body weight, total fat weight, and resistin and insulin levels. Interestingly, these effects were more pronounced with the live form as compared to a pasteurized form of the strain. The strain showed production of agmatine, suggesting a potential novel mechanism for supporting metabolic and cognitive health. Based on its phenotypic features and phylogenetic position, it is proposed that this isolate represents a novel species in the genus Akkermansia and a promising therapeutic candidate for the management of metabolic diseases.
Affiliation(s)
- Ritesh Kumar
  - Health & Biosciences, International Flavors & Fragrances, Inc. (IFF), Wilmington, DE 19803, USA; (H.K.); (Q.W.); (H.-S.K.); (A.F.)
  - Correspondence: Tel.: +1-302-379-4738
- Helene Kane
  - Health & Biosciences, International Flavors & Fragrances, Inc. (IFF), Wilmington, DE 19803, USA; (H.K.); (Q.W.); (H.-S.K.); (A.F.)
- Qiong Wang
  - Health & Biosciences, International Flavors & Fragrances, Inc. (IFF), Wilmington, DE 19803, USA; (H.K.); (Q.W.); (H.-S.K.); (A.F.)
- Henrik Max Jensen
  - Health & Biosciences, IFF, 8220 Brabrand, Denmark; (H.M.J.); (S.Y.B.); (N.C.)
- Hye-Sook Kim
  - Health & Biosciences, International Flavors & Fragrances, Inc. (IFF), Wilmington, DE 19803, USA; (H.K.); (Q.W.); (H.-S.K.); (A.F.)
- Steffen Yde Bak
  - Health & Biosciences, IFF, 8220 Brabrand, Denmark; (H.M.J.); (S.Y.B.); (N.C.)
- Stéphanie Bry
  - Health & Biosciences, IFF, 86270 Dange, France; (I.A.); (S.B.)
- Niels Christensen
  - Health & Biosciences, IFF, 8220 Brabrand, Denmark; (H.M.J.); (S.Y.B.); (N.C.)
- Andrew Friedman
  - Health & Biosciences, International Flavors & Fragrances, Inc. (IFF), Wilmington, DE 19803, USA; (H.K.); (Q.W.); (H.-S.K.); (A.F.)
- Pia Rasinkangas
  - Health & Biosciences, IFF, 02460 Kantvik, Finland; (P.R.); (A.C.O.); (S.D.F.)
- Arthur C. Ouwehand
  - Health & Biosciences, IFF, 02460 Kantvik, Finland; (P.R.); (A.C.O.); (S.D.F.)
- Sofia D. Forssten
  - Health & Biosciences, IFF, 02460 Kantvik, Finland; (P.R.); (A.C.O.); (S.D.F.)
23
Kallenborn F, Cascitti J, Schmidt B. CARE 2.0: reducing false-positive sequencing error corrections using machine learning. BMC Bioinformatics 2022; 23:227. [PMID: 35698033 PMCID: PMC9195321 DOI: 10.1186/s12859-022-04754-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/03/2022] [Accepted: 05/30/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Next-generation sequencing pipelines often perform error correction as a preprocessing step to obtain cleaned input data. State-of-the-art error correction programs are able to reliably detect and correct the majority of sequencing errors. However, they also introduce new errors by making false-positive corrections. These correction mistakes can have a negative impact on downstream analysis, such as k-mer statistics, de-novo assembly, and variant calling. This motivates the need for more precise error correction tools. RESULTS: We present CARE 2.0, a context-aware read error correction tool based on multiple sequence alignment targeting Illumina datasets. In addition to a number of newly introduced optimizations, its most significant change is the replacement of CARE 1.0's hand-crafted correction conditions with a novel classifier based on random decision forests trained on Illumina data. This results in up to two orders of magnitude fewer false-positive corrections compared to other state-of-the-art error correction software. At the same time, CARE 2.0 is able to achieve high numbers of true-positive corrections comparable to its competitors. On a simulated full human dataset with 914M reads, CARE 2.0 generates only 1.2M false positives (FPs) and 801.4M true positives (TPs) at a highly competitive runtime, while the best corrections achieved by other state-of-the-art tools contain at least 3.9M FPs and at most 814.5M TPs. Better de-novo assembly and improved k-mer analysis show the applicability of CARE 2.0 to real-world data. CONCLUSION: False-positive corrections can negatively influence downstream analysis. The precision of CARE 2.0 greatly reduces the number of those corrections compared to other state-of-the-art programs including BFC, Karect, Musket, Bcool, SGA, and Lighter. Thus, higher-quality datasets are produced, which improve k-mer analysis and de-novo assembly in real-world datasets and demonstrate the applicability of machine learning techniques in the context of sequencing read error correction. CARE 2.0 is written in C++/CUDA for Linux systems and can be run on the CPU as well as on CUDA-enabled GPUs. It is available at https://github.com/fkallen/CARE.
Affiliation(s)
- Felix Kallenborn
  - Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
- Julian Cascitti
  - Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
- Bertil Schmidt
  - Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
24
Pietsch GM, Gazis R, Klingeman WE, Huff ML, Staton ME, Kolarik M, Hadziabdic D. Characterization and microsatellite marker development for a common bark and ambrosia beetle associate, Geosmithia obscura. Microbiologyopen 2022; 11:e1286. [PMID: 35765178 PMCID: PMC9108439 DOI: 10.1002/mbo3.1286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Accepted: 04/27/2022] [Indexed: 11/12/2022] Open
Abstract
Symbioses between Geosmithia fungi and wood-boring and bark beetles seldom result in disease induction within the plant host. Yet, exceptions exist such as Geosmithia morbida, the causal agent of Thousand Cankers Disease (TCD) of walnuts and wingnuts, and Geosmithia sp. 41, the causal agent of Foamy Bark Canker disease of oaks. Isolates of G. obscura were recovered from black walnut trees in eastern Tennessee and at least one isolate induced cankers following artificial inoculation. Due to the putative pathogenicity and lack of recovery of G. obscura from natural lesions, a molecular diagnostic screening tool was developed using microsatellite markers mined from the G. obscura genome. A total of 3256 candidate microsatellite markers were identified (2236, 789, 137 di-, tri-, and tetranucleotide motifs, respectively), with 2011, 703, 101 di-, tri-, and tetranucleotide motifs, respectively, containing markers with primers. From these, 75 microsatellite markers were randomly selected, screened, and optimized, resulting in 28 polymorphic markers that yielded single, consistently recovered bands, which were used in downstream analyses. Five of these microsatellite markers were found to be specific to G. obscura and did not cross-amplify into other, closely related species. Although the remaining tested markers could be useful, they cross-amplified within different Geosmithia species, making them not reliable for G. obscura detection. Five novel microsatellite markers (GOBS9, GOBS10, GOBS41, GOBS43, and GOBS50) were developed based on the G. obscura genome. These species-specific microsatellite markers are available as a tool for use in molecular diagnostics and can assist future surveillance studies.
Affiliation(s)
- Grace M. Pietsch
  - Department of Plant Sciences, The University of Tennessee, Knoxville, Tennessee, USA
- Romina Gazis
  - Department of Plant Pathology, University of Florida, Homestead, Florida, USA
- Matthew L. Huff
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA
- Margaret E. Staton
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA
- Miroslav Kolarik
  - Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Denita Hadziabdic
  - Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, Tennessee, USA
25
Reji L, Cardarelli EL, Boye K, Bargar JR, Francis CA. Diverse ecophysiological adaptations of subsurface Thaumarchaeota in floodplain sediments revealed through genome-resolved metagenomics. The ISME Journal 2022; 16:1140-1152. [PMID: 34873295 PMCID: PMC8940955 DOI: 10.1038/s41396-021-01167-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/18/2021] [Revised: 11/17/2021] [Accepted: 11/26/2021] [Indexed: 02/03/2023]
Abstract
The terrestrial subsurface microbiome contains vastly underexplored phylogenetic diversity and metabolic novelty, with critical implications for global biogeochemical cycling. Among the key microbial inhabitants of subsurface soils and sediments are Thaumarchaeota, an archaeal phylum that encompasses ammonia-oxidizing archaea (AOA) as well as non-ammonia-oxidizing basal lineages. Thaumarchaeal ecology in terrestrial systems has been extensively characterized, particularly in the case of AOA. However, there is little knowledge on the diversity and ecophysiology of Thaumarchaeota in deeper soils, as most lineages, particularly basal groups, remain uncultivated and underexplored. Here we use genome-resolved metagenomics to examine the phylogenetic and metabolic diversity of Thaumarchaeota along a 234 cm depth profile of hydrologically variable riparian floodplain sediments in the Wind River Basin near Riverton, Wyoming. Phylogenomic analysis of the metagenome-assembled genomes (MAGs) indicates a shift in AOA population structure from the dominance of the terrestrial Nitrososphaerales lineage in the well-drained top ~100 cm of the profile to the typically marine Nitrosopumilales in deeper, moister, more energy-limited sediment layers. We also describe two deeply rooting non-AOA MAGs with numerous unexpected metabolic features, including the reductive acetyl-CoA (Wood-Ljungdahl) pathway, tetrathionate respiration, a form III RuBisCO, and the potential for extracellular electron transfer. These MAGs also harbor tungsten-containing aldehyde:ferredoxin oxidoreductase, group 4f [NiFe]-hydrogenases and a canonical heme catalase, typically not found in Thaumarchaeota. Our results suggest that hydrological variables, particularly proximity to the water table, impart a strong control on the ecophysiology of Thaumarchaeota in alluvial sediments.
Affiliation(s)
- Linta Reji
  - Department of Earth System Science, Stanford University, Stanford, CA, USA; Present Address: Department of Geosciences, Princeton University, Princeton, NJ, USA
- Emily L. Cardarelli
  - Department of Earth System Science, Stanford University, Stanford, CA, USA; Present Address: Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
- Kristin Boye
  - Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- John R. Bargar
  - Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Christopher A. Francis
  - Department of Earth System Science, Stanford University, Stanford, CA, USA
26
Metagenome-Assembled Genomes from a Microbiome Converting Xylose to Medium-Chain Carboxylic Acids. Microbiol Resour Announc 2022; 11:e0115121. [PMID: 35343806 PMCID: PMC9022542 DOI: 10.1128/mra.01151-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
Abstract
There is growing interest in producing beneficial products from wastes using microbiomes. We previously performed multiomic analyses of a bioreactor microbiome that converted carbohydrate-rich lignocellulosic residues to medium-chain carboxylic acids. Here, we present draft metagenome-assembled genomes from this microbiome, obtained from reactors in which xylose was the primary carbon source.
27
Aegilops sharonensis genome-assisted identification of stem rust resistance gene Sr62. Nat Commun 2022; 13:1607. [PMID: 35338132 PMCID: PMC8956640 DOI: 10.1038/s41467-022-29132-8] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 07/09/2021] [Accepted: 02/24/2022] [Indexed: 02/06/2023] Open
Abstract
The wild relatives and progenitors of wheat have been widely used as sources of disease resistance (R) genes. Molecular identification and characterization of these R genes facilitates their manipulation and tracking in breeding programmes. Here, we develop a reference-quality genome assembly of the wild diploid wheat relative Aegilops sharonensis and use positional mapping, mutagenesis, RNA-Seq and transgenesis to identify the stem rust resistance gene Sr62, which has also been transferred to common wheat. This gene encodes a tandem kinase, homologues of which exist across multiple taxa in the plant kingdom. Stable Sr62 transgenic wheat lines show high levels of resistance against diverse isolates of the stem rust pathogen, highlighting the utility of Sr62 for deployment as part of a polygenic stack to maximize the durability of stem rust resistance. Aegilops sharonensis is a wild diploid relative of wheat. Here, the authors assemble the genome of Ae. sharonensis and use the assembly as an aid to clone the Ae. sharonensis-derived stem rust resistance gene Sr62 in the allohexaploid genome of wheat.
28
Panwar P, Allen MA, Williams TJ, Haque S, Brazendale S, Hancock AM, Paez-Espino D, Cavicchioli R. Remarkably coherent population structure for a dominant Antarctic Chlorobium species. Microbiome 2021; 9:231. [PMID: 34823595 PMCID: PMC8620254 DOI: 10.1186/s40168-021-01173-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/06/2021] [Accepted: 10/09/2021] [Indexed: 05/22/2023]
Abstract
BACKGROUND: In Antarctica, summer sunlight enables phototrophic microorganisms to drive primary production, thereby "feeding" ecosystems to enable their persistence through the long, dark winter months. In Ace Lake, a stratified marine-derived system in the Vestfold Hills of East Antarctica, a Chlorobium species of green sulphur bacteria (GSB) is the dominant phototroph, although its seasonal abundance changes more than 100-fold. Here, we analysed 413 Gb of Antarctic metagenome data including 59 Chlorobium metagenome-assembled genomes (MAGs) from Ace Lake and nearby stratified marine basins to determine how genome variation and population structure across a 7-year period impacted ecosystem function. RESULTS: A single species, Candidatus Chlorobium antarcticum (most similar to Chlorobium phaeovibrioides DSM265) prevails in all three aquatic systems and harbours very little genomic variation (≥ 99% average nucleotide identity). A notable feature of variation that did exist related to the genomic capacity to biosynthesize cobalamin. The abundance of phylotypes with this capacity changed seasonally ~2-fold, consistent with the population balancing the value of a bolstered photosynthetic capacity in summer against an energetic cost in winter. The very high GSB concentration (> 10⁸ cells ml⁻¹ in Ace Lake) and seasonal cycle of cell lysis likely make Ca. Chlorobium antarcticum a major provider of cobalamin to the food web. Analysis of Ca. Chlorobium antarcticum viruses revealed the species to be infected by generalist (rather than specialist) viruses with a broad host range (e.g., infecting Gammaproteobacteria) that were present in diverse Antarctic lakes. The marked seasonal decrease in Ca. Chlorobium antarcticum abundance may restrict specialist viruses from establishing effective lifecycles, whereas generalist viruses may augment their proliferation using other hosts. CONCLUSION: The factors shaping Antarctic microbial communities are gradually being defined. In addition to the cold, the annual variation in sunlight hours dictates which phototrophic species can grow and the extent to which they contribute to ecosystem processes. The Chlorobium population studied was inferred to provide cobalamin, in addition to carbon, nitrogen, hydrogen, and sulphur cycling, as critical ecosystem services. The specific Antarctic environmental factors and major ecosystem benefits afforded by this GSB likely explain why such a coherent population structure has developed in this Chlorobium species.
Affiliation(s)
- Pratibha Panwar
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
- Michelle A Allen
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
- Timothy J Williams
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
- Sabrina Haque
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
  - Present address: Department of Molecular Sciences, Macquarie University, Sydney, New South Wales, 2109, Australia
- Sarah Brazendale
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
  - Present address: Pegarah, Australia
- Alyce M Hancock
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
  - Present address: Institute for Marine and Antarctic Studies, University of Tasmania, 20 Castray Esplanade, Battery Point, Tasmania, Australia
- David Paez-Espino
  - Department of Energy Joint Genome Institute, Berkeley, CA, USA
  - Present address: Mammoth Biosciences, Inc., 1000 Marina Blvd. Suite 600, Brisbane, CA, USA
- Ricardo Cavicchioli
  - School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, 2052, Australia
29
Parisot N, Vargas-Chávez C, Goubert C, Baa-Puyoulet P, Balmand S, Beranger L, Blanc C, Bonnamour A, Boulesteix M, Burlet N, Calevro F, Callaerts P, Chancy T, Charles H, Colella S, Da Silva Barbosa A, Dell'Aglio E, Di Genova A, Febvay G, Gabaldón T, Galvão Ferrarini M, Gerber A, Gillet B, Hubley R, Hughes S, Jacquin-Joly E, Maire J, Marcet-Houben M, Masson F, Meslin C, Montagné N, Moya A, Ribeiro de Vasconcelos AT, Richard G, Rosen J, Sagot MF, Smit AFA, Storer JM, Vincent-Monegat C, Vallier A, Vigneron A, Zaidman-Rémy A, Zamoum W, Vieira C, Rebollo R, Latorre A, Heddi A. The transposable element-rich genome of the cereal pest Sitophilus oryzae. BMC Biol 2021; 19:241. [PMID: 34749730 PMCID: PMC8576890 DOI: 10.1186/s12915-021-01158-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/08/2021] [Accepted: 09/27/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. RESULTS: We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate, when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30 kyear), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. CONCLUSIONS: Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.
Affiliation(s)
- Nicolas Parisot
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Carlos Vargas-Chávez
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
  - Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain
  - Present Address: Institute of Evolutionary Biology (IBE), CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Clément Goubert
  - Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, New York, 14853, USA
  - Present Address: Human Genetics, McGill University, Montreal, QC, Canada
- Séverine Balmand
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Louis Beranger
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Caroline Blanc
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Aymeric Bonnamour
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Matthieu Boulesteix
  - Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- Nelly Burlet
  - Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- Federica Calevro
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Patrick Callaerts
  - Department of Human Genetics, Laboratory of Behavioral and Developmental Genetics, KU Leuven, University of Leuven, B-3000, Leuven, Belgium
- Théo Chancy
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Hubert Charles
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
  - ERABLE European Team, INRIA, Rhône-Alpes, France
- Stefano Colella
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
  - Present Address: LSTM, Laboratoire des Symbioses Tropicales et Méditerranéennes, IRD, CIRAD, INRAE, SupAgro, Univ Montpellier, Montpellier, France
- André Da Silva Barbosa
  - INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
- Elisa Dell'Aglio
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Alex Di Genova
  - Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
  - ERABLE European Team, INRIA, Rhône-Alpes, France
  - Instituto de Ciencias de la Ingeniería, Universidad de O'Higgins, Rancagua, Chile
| | - Gérard Febvay
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Toni Gabaldón
- Life Sciences, Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain
- Mechanisms of Disease, Institute for Research in Biomedicine (IRB), Barcelona, Spain
- Institut Catalan de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | | | - Alexandra Gerber
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
| | - Benjamin Gillet
- Institut de Génomique Fonctionnelle de Lyon (IGFL), Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS UMR 5242, Lyon, France
| | | | - Sandrine Hughes
- Institut de Génomique Fonctionnelle de Lyon (IGFL), Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS UMR 5242, Lyon, France
| | - Emmanuelle Jacquin-Joly
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Justin Maire
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | | | - Florent Masson
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: Global Health Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
| | - Camille Meslin
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Nicolas Montagné
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Andrés Moya
- Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Foundation for the Promotion of Sanitary and Biomedical Research of Valencian Community (FISABIO), València, Spain
| | | | - Gautier Richard
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653, Le Rheu, France
| | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | - Marie-France Sagot
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- ERABLE European Team, INRIA, Rhône-Alpes, France
| | | | | | | | - Agnès Vallier
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Aurélien Vigneron
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: Department of Evolutionary Ecology, Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, 55128, Mainz, Germany
| | - Anna Zaidman-Rémy
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Waël Zamoum
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Cristina Vieira
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France.
- ERABLE European Team, INRIA, Rhône-Alpes, France.
| | - Rita Rebollo
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France.
| | - Amparo Latorre
- Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain.
- Foundation for the Promotion of Sanitary and Biomedical Research of Valencian Community (FISABIO), València, Spain.
| | - Abdelaziz Heddi
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France.
|
30
|
Nakabachi A, Piel J, Malenovský I, Hirose Y. Comparative Genomics Underlines Multiple Roles of Profftella, an Obligate Symbiont of Psyllids: Providing Toxins, Vitamins, and Carotenoids. Genome Biol Evol 2021; 12:1975-1987. [PMID: 32797185 PMCID: PMC7643613 DOI: 10.1093/gbe/evaa175] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Accepted: 08/11/2020] [Indexed: 12/27/2022]
Abstract
The Asian citrus psyllid Diaphorina citri (Insecta: Hemiptera: Psylloidea), a serious pest of citrus species worldwide, harbors vertically transmitted intracellular mutualists, Candidatus Profftella armatura (Profftella_DC, Gammaproteobacteria: Burkholderiales) and Candidatus Carsonella ruddii (Carsonella_DC, Gammaproteobacteria: Oceanospirillales). Whereas Carsonella_DC is a typical nutritional symbiont, Profftella_DC is a unique defensive symbiont with organelle-like features, including intracellular localization within the host, perfect infection in host populations, vertical transmission over evolutionary time, and drastic genome reduction down to much less than 1 Mb. Large parts of the 460-kb genome of Profftella_DC are devoted to genes for synthesizing a polyketide toxin, diaphorin. To better understand the evolution of this unusual symbiont, the present study analyzed the genome of Profftella_Dco, a sister lineage to Profftella_DC, using Diaphorina cf. continua, a host psyllid congeneric with D. citri. The genome of coresiding Carsonella (Carsonella_Dco) was also analyzed. The analysis revealed nearly perfect synteny conservation in these genomes with their counterparts from D. citri. The substitution rate analysis further demonstrated genomic stability of Profftella, which is comparable to that of Carsonella. Profftella_Dco and Profftella_DC shared all genes for the biosynthesis of diaphorin, hemolysin, riboflavin, biotin, and carotenoids, underlining multiple roles of Profftella, which may contribute to stabilizing symbiotic relationships with the host. However, acyl carrier proteins were extensively amplified in polyketide synthases DipP and DipT for diaphorin synthesis in Profftella_Dco. This level of acyl carrier protein augmentation, unprecedented in modular polyketide synthases of any known organism, is not thought to influence the polyketide structure but may improve the synthesis efficiency.
Affiliation(s)
- Atsushi Nakabachi
- Electronics-Inspired Interdisciplinary Research Institute (EIIRIS), Toyohashi University of Technology, Japan.,Department of Applied Chemistry and Life Sciences, Toyohashi University of Technology, Japan
| | - Jörn Piel
- Institute of Microbiology, Eidgenössische Technische Hochschule (ETH) Zürich, Zurich, Switzerland
| | - Igor Malenovský
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czechia
| | - Yuu Hirose
- Department of Applied Chemistry and Life Sciences, Toyohashi University of Technology, Japan
|
31
|
Hale I, Ma X, Melo ATO, Padi FK, Hendre PS, Kingan SB, Sullivan ST, Chen S, Boffa JM, Muchugi A, Danquah A, Barnor MT, Jamnadass R, Van de Peer Y, Van Deynze A. Genomic Resources to Guide Improvement of the Shea Tree. FRONTIERS IN PLANT SCIENCE 2021; 12:720670. [PMID: 34567033 PMCID: PMC8459026 DOI: 10.3389/fpls.2021.720670] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/04/2021] [Accepted: 08/04/2021] [Indexed: 05/25/2023]
Abstract
A defining component of agroforestry parklands across Sahelo-Sudanian Africa (SSA), the shea tree (Vitellaria paradoxa) is central to sustaining local livelihoods and the farming environments of rural communities. Despite its economic and cultural value, however, not to mention the ecological roles it plays as a dominant parkland species, shea remains semi-domesticated with virtually no history of systematic genetic improvement. In truth, shea's extended juvenile period makes traditional breeding approaches untenable; but the opportunity for genome-assisted breeding is immense, provided the foundational resources are available. Here we report the development and public release of such resources. Using the FALCON-Phase workflow, 162.6 Gb of long-read PacBio sequence data were assembled into a 658.7 Mbp, chromosome-scale reference genome annotated with 38,505 coding genes. Whole genome duplication (WGD) analysis based on this gene space revealed clear signatures of two ancient WGD events in shea's evolutionary past, one prior to the Asterid-Rosid divergence (116-126 Mya) and the other at the root of the order Ericales (65-90 Mya). In a first genome-wide look at the suite of fatty acid (FA) biosynthesis genes that likely govern stearin content, the primary determinant of shea butter quality, relatively high copy numbers of six key enzymes were found (KASI, KASIII, FATB, FAD2, FAD3, and FAX2), some likely originating in shea's more recent WGD event. To help translate these findings into practical tools for characterization, selection, and genome-wide association studies (GWAS), resequencing data from a shea diversity panel were used to develop a database of more than 3.5 million functionally annotated, physically anchored SNPs. Two smaller, more curated sets of suggested SNPs, one for GWAS (104,211 SNPs) and the other targeting FA biosynthesis genes (90 SNPs), are also presented. With these resources, the hope is to support national programs across the shea belt in the strategic, genome-enabled conservation and long-term improvement of the shea tree for SSA.
Affiliation(s)
- Iago Hale
- Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, United States
| | - Xiao Ma
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
| | - Arthur T. O. Melo
- Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, United States
| | - Francis Kwame Padi
- Plant Breeding Division, Cocoa Research Institute of Ghana, Ghana Cocoa Board, New Tafo, Ghana
| | - Prasad S. Hendre
- AOCC Genomics Laboratory and Tree Genebank Research Unit, World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
| | | | | | - Shiyu Chen
- Seed Biotechnology Center, University of California, Davis, Davis, CA, United States
| | - Jean-Marc Boffa
- AOCC Genomics Laboratory and Tree Genebank Research Unit, World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
| | - Alice Muchugi
- AOCC Genomics Laboratory and Tree Genebank Research Unit, World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
- The Forage Genebank, Feed and Forage Development Program, International Livestock Research Institute, Addis Ababa, Ethiopia
| | - Agyemang Danquah
- West Africa Centre for Crop Improvement, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana
| | - Michael Teye Barnor
- Plant Breeding Division, Cocoa Research Institute of Ghana, Ghana Cocoa Board, New Tafo, Ghana
| | - Ramni Jamnadass
- AOCC Genomics Laboratory and Tree Genebank Research Unit, World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, China
- Centre for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
| | - Allen Van Deynze
- AOCC Genomics Laboratory and Tree Genebank Research Unit, World Agroforestry (CIFOR-ICRAF), Nairobi, Kenya
- Seed Biotechnology Center, University of California, Davis, Davis, CA, United States
|
32
|
Xiao W, Ren L, Chen Z, Fang LT, Zhao Y, Lack J, Guan M, Zhu B, Jaeger E, Kerrigan L, Blomquist TM, Hung T, Sultan M, Idler K, Lu C, Scherer A, Kusko R, Moos M, Xiao C, Sherry ST, Abaan OD, Chen W, Chen X, Nordlund J, Liljedahl U, Maestro R, Polano M, Drabek J, Vojta P, Kõks S, Reimann E, Madala BS, Mercer T, Miller C, Jacob H, Truong T, Moshrefi A, Natarajan A, Granat A, Schroth GP, Kalamegham R, Peters E, Petitjean V, Walton A, Shen TW, Talsania K, Vera CJ, Langenbach K, de Mars M, Hipp JA, Willey JC, Wang J, Shetty J, Kriga Y, Raziuddin A, Tran B, Zheng Y, Yu Y, Cam M, Jailwala P, Nguyen C, Meerzaman D, Chen Q, Yan C, Ernest B, Mehra U, Jensen RV, Jones W, Li JL, Papas BN, Pirooznia M, Chen YC, Seifuddin F, Li Z, Liu X, Resch W, Wang J, Wu L, Yavas G, Miles C, Ning B, Tong W, Mason CE, Donaldson E, Lababidi S, Staudt LM, Tezak Z, Hong H, Wang C, Shi L. Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing. Nat Biotechnol 2021; 39:1141-1150. [PMID: 34504346 PMCID: PMC8506910 DOI: 10.1038/s41587-021-00994-5] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Received: 09/26/2018] [Accepted: 06/18/2021] [Indexed: 02/01/2023]
Abstract
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
Affiliation(s)
- Wenming Xiao
- The Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD, USA.
| | - Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
| | - Yongmei Zhao
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Justin Lack
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | | | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA
| | | | | | - Thomas M Blomquist
- Departments of Medicine and Pathology, University of Toledo Medical Center, Toledo, OH, USA
| | | | - Marc Sultan
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Kenneth Idler
- Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA
| | - Charles Lu
- Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA
| | - Andreas Scherer
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
| | | | - Malcolm Moos
- The Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Ogan D Abaan
- Illumina Inc., Foster City, CA, USA
- Seven Bridges Genomics Inc., Cambridge, MA, USA
| | - Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Xin Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
| | - Jessica Nordlund
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Ulrika Liljedahl
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
| | - Roberta Maestro
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
| | - Maurizio Polano
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
| | - Jiri Drabek
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Olomouc, Czech Republic
| | - Petr Vojta
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- IMTM, Faculty of Medicine and Dentistry, Palacky University Olomouc, Olomouc, Czech Republic
| | - Sulev Kõks
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Perron Institute for Neurological and Translational Science, Nedlands, Perth, Western Australia, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, Perth, Western Australia, Australia
| | - Ene Reimann
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Bindu Swapna Madala
- Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, New South Wales, Australia
| | - Timothy Mercer
- Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, New South Wales, Australia
| | - Chris Miller
- Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA
| | - Howard Jacob
- Computational Genomics, Genomics Research Center, AbbVie, North Chicago, IL, USA
| | | | | | | | | | | | | | | | - Virginie Petitjean
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
| | - Ashley Walton
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Tsai-Wei Shen
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Keyur Talsania
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Cristobal Juan Vera
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | | | | | - Jennifer A Hipp
- Departments of Medicine and Pathology, University of Toledo Medical Center, Toledo, OH, USA
| | - James C Willey
- Departments of Medicine and Pathology, University of Toledo Medical Center, Toledo, OH, USA
| | - Jing Wang
- National Institute of Metrology, Beijing, China
| | - Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Arati Raziuddin
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
| | - Margaret Cam
- CCR Collaborative Bioinformatics Resource, Office of Science and Technology Resources, Center for Cancer Research, Bethesda, MD, USA
| | - Parthav Jailwala
- CCR Collaborative Bioinformatics Resource, Office of Science and Technology Resources, Center for Cancer Research, Bethesda, MD, USA
| | - Cu Nguyen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Qingrong Chen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | - Chunhua Yan
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, USA
| | | | | | - Roderick V Jensen
- Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
| | | | - Jian-Liang Li
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, Durham, NC, USA
| | - Brian N Papas
- Integrative Bioinformatics, National Institute of Environmental Health Sciences, Durham, NC, USA
| | - Mehdi Pirooznia
- Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Yun-Ching Chen
- Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Fayaz Seifuddin
- Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
| | - Zhipan Li
- Sentieon Inc., Mountain View, CA, USA
| | - Xuelu Liu
- Center for Information Technology, National Institutes of Health, Bethesda, MD, USA
| | - Wolfgang Resch
- Center for Information Technology, National Institutes of Health, Bethesda, MD, USA
| | | | - Leihong Wu
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Gokhan Yavas
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Corey Miles
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Baitang Ning
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
| | - Eric Donaldson
- The Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
| | - Samir Lababidi
- Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
| | - Louis M Staudt
- Lymphoid Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Zivana Tezak
- The Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD, USA
| | - Huixiao Hong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
| | - Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA.
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
|
33
|
Boeuf D, Eppley JM, Mende DR, Malmstrom RR, Woyke T, DeLong EF. Metapangenomics reveals depth-dependent shifts in metabolic potential for the ubiquitous marine bacterial SAR324 lineage. MICROBIOME 2021; 9:172. [PMID: 34389059 PMCID: PMC8364033 DOI: 10.1186/s40168-021-01119-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/10/2021] [Accepted: 06/22/2021] [Indexed: 06/01/2023]
Abstract
BACKGROUND Oceanic microbiomes play a pivotal role in the global carbon cycle and are central to the transformation and recycling of carbon and energy in the ocean's interior. SAR324 is a ubiquitous but poorly understood uncultivated clade of Deltaproteobacteria that inhabits the entire water column, from ocean surface waters to its deep interior. Although some progress has been made in elucidating potential metabolic traits of SAR324 in the dark ocean, very little is known about the ecology and the metabolic capabilities of this group in the euphotic and twilight zones. To investigate the comparative genomics, ecology, and physiological potential of the SAR324 clade, we examined the distribution and variability of key genomic features and metabolic pathways in this group from surface waters to the abyss in the North Pacific Subtropical Gyre, one of the largest biomes on Earth. RESULTS We leveraged a pangenomic ecological approach, combining spatio-temporally resolved single-amplified genome, metagenomic, and metatranscriptomic datasets. The data revealed substantial genomic diversity throughout the SAR324 clade, with distinct depth and temporal distributions that clearly differentiated ecotypes. Phylogenomic subclade delineation, environmental distributions, genomic feature similarities, and metabolic capacities revealed strong congruence. The four SAR324 ecotypes delineated in this study revealed striking divergence from one another with respect to their habitat-specific metabolic potentials. The ecotypes living in the dark or twilight oceans shared genomic features and metabolic capabilities consistent with a sulfur-based chemolithoautotrophic lifestyle. In contrast, those inhabiting the sunlit ocean displayed higher plasticity energy-related metabolic pathways, supporting a presumptive photoheterotrophic lifestyle. 
In epipelagic SAR324 ecotypes, we observed the presence of two types of proton-pumping rhodopsins, as well as genomic, transcriptomic, and ecological evidence for active photoheterotrophy, based on xanthorhodopsin-like light-harvesting proteins. CONCLUSIONS Combining pangenomic and both metagenomic and metatranscriptomic profiling revealed a striking divergence in the vertical distribution, genomic composition, metabolic potential, and predicted lifestyle strategies of geographically co-located members of the SAR324 bacterial clade. The results highlight the utility of metapangenomic approaches employed across environmental gradients, to decipher the properties and variation in function and ecological traits of specific phylogenetic clades within complex microbiomes.
Affiliation(s)
- Dominique Boeuf
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Manoa, Honolulu, HI 96822 USA
| | - John M. Eppley
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Manoa, Honolulu, HI 96822 USA
| | - Daniel R. Mende
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Manoa, Honolulu, HI 96822 USA
| | | | - Tanja Woyke
- DOE Joint Genome Institute, Berkeley, CA 94720 USA
| | - Edward F. DeLong
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Manoa, Honolulu, HI 96822 USA
|
34
|
Hempel E, Westbury MV, Grau JH, Trinks A, Paijmans JLA, Kliver S, Barlow A, Mayer F, Müller J, Chen L, Koepfli KP, Hofreiter M, Bibi F. Diversity and Paleodemography of the Addax (Addax nasomaculatus), a Saharan Antelope on the Verge of Extinction. Genes (Basel) 2021; 12:genes12081236. [PMID: 34440410 PMCID: PMC8394336 DOI: 10.3390/genes12081236] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/17/2021] [Revised: 08/05/2021] [Accepted: 08/08/2021] [Indexed: 12/18/2022]
Abstract
Since the 19th century, the addax (Addax nasomaculatus) has lost approximately 99% of its former range. Along with its close relatives, the blue antelope (Hippotragus leucophaeus) and the scimitar-horned oryx (Oryx dammah), the addax may be the third large African mammal species to go extinct in the wild in recent times. Despite this, the evolutionary history of this critically endangered species remains virtually unknown. To gain insight into the population history of the addax, we used hybridization capture to generate ten complete mitochondrial genomes from historical samples and assembled a nuclear genome. We found that both mitochondrial and nuclear diversity are low compared to other African bovids. Analysis of mitochondrial genomes revealed a most recent common ancestor ~32 kya (95% CI 11–58 kya) and weak phylogeographic structure, indicating that the addax likely existed as a highly mobile, panmictic population across its Sahelo–Saharan range in the past. PSMC analysis revealed a continuous decline in effective population size since ~2 Ma, with short intermediate increases at ~500 and ~44 kya. Our results suggest that the addax went through a major bottleneck in the Late Pleistocene, remaining at low population size prior to the human disturbances of the last few centuries.
Affiliation(s)
- Elisabeth Hempel
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- Museum für Naturkunde, Berlin, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany; (F.M.); (J.M.); (F.B.)
| | - Michael V. Westbury
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark;
| | - José H. Grau
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- Museum für Naturkunde, Berlin, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany; (F.M.); (J.M.); (F.B.)
| | - Alexandra Trinks
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- Institute of Pathology, Charité–Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany;
| | - Johanna L. A. Paijmans
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK;
| | - Sergei Kliver
- Institute of Molecular and Cellular Biology SB RAS, 8/2 Acad. Lavrentiev Ave, 630090 Novosibirsk, Russia;
| | - Axel Barlow
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
- School of Science and Technology, Nottingham Trent University, Clifton Lane, Nottingham NG11 8NS, UK;
| | - Frieder Mayer
- Museum für Naturkunde, Berlin, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany; (F.M.); (J.M.); (F.B.)
| | - Johannes Müller
- Museum für Naturkunde, Berlin, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany; (F.M.); (J.M.); (F.B.)
| | - Lei Chen
- School of Ecology and Environment, Northwestern Polytechnical University, Xi’an 710072, China;
| | - Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA;
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Front Royal, VA 22630, USA
- Computer Technologies Laboratory, ITMO University, 197101 Saint Petersburg, Russia
| | - Michael Hofreiter
- Evolutionary Adaptive Genomics, Institute of Biochemistry and Biology, Faculty of Science, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany; (J.H.G.); (M.H.)
| | - Faysal Bibi
- Museum für Naturkunde, Berlin, Leibniz Institute for Evolution and Biodiversity Science, Invalidenstraße 43, 10115 Berlin, Germany; (F.M.); (J.M.); (F.B.)
| |
|
35
|
Zhang X, Ping P, Hutvagner G, Blumenstein M, Li J. Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach. Nucleic Acids Res 2021; 49:e106. [PMID: 34291293 PMCID: PMC8631080 DOI: 10.1093/nar/gkab610] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/21/2020] [Revised: 07/01/2021] [Accepted: 07/06/2021] [Indexed: 12/21/2022]
Abstract
Raw sequencing reads of miRNAs contain machine-made substitution errors, or even insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise rectification of these errors is critically important because isoform variation analysis at single-base resolution, such as novel isomiR discovery, understanding of editing events, differential expression analysis, or tissue-specific isoform identification, is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data because the length and per-read coverage of miRNAs differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k − 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective for the correction of indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving data quality from one error-containing read in every 50 reads to one in every 1300 reads. Studies on experimental miRNA sequencing datasets show that the errors are often rectified at the 5′ ends and the seed regions of the reads, and that there are remarkable changes after the correction in miRNA isoform abundance, volume of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities.
Affiliation(s)
- Xuan Zhang
- Data Science Institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
| | - Pengyao Ping
- Data Science Institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
| | - Gyorgy Hutvagner
- School of Biomedical Engineering, Faculty of Engineering and IT, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
| | - Michael Blumenstein
- Faculty of Engineering and IT, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
| | - Jinyan Li
- To whom correspondence should be addressed. Tel: +61 295149264; Fax: +61 295149264;
| |
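The k-mer lattice idea in the Zhang et al. abstract above can be illustrated with a minimal Python sketch. It only counts substrings of three adjacent lengths and links a k-mer to the (k − 1)-mers and (k + 1)-mers that differ from it by one deleted or inserted base. The function names `count_mers` and `lattice_neighbors` are hypothetical, and this is not the authors' implementation, which the abstract does not describe in detail.

```python
from collections import Counter


def count_mers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def lattice_neighbors(kmer, shorter, longer):
    """Link a k-mer to observed (k-1)-mers and (k+1)-mers.

    Assumption for illustration: two nodes are linked when one can be
    obtained from the other by deleting a single base.
    """
    k = len(kmer)
    # (k-1)-mers reachable by deleting one base from the k-mer
    candidates = {kmer[:i] + kmer[i + 1:] for i in range(k)}
    down = {m for m in candidates if m in shorter}
    # (k+1)-mers that collapse to the k-mer after deleting one base
    up = {
        m for m in longer
        if any(m[:i] + m[i + 1:] == kmer for i in range(k + 1))
    }
    return down, up


reads = ["ACGT"]
down, up = lattice_neighbors("ACG", count_mers(reads, 2), count_mers(reads, 4))
print(sorted(down), sorted(up))
```

In a real corrector, a rare k-mer linked to a much more frequent neighbor one level up or down would be a candidate indel error; the thresholds for that decision are not given in the abstract.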
|
36
|
Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, Ens J, Gundlach H, Boston LB, Tulpová Z, Holden S, Hernández-Pinzón I, Scholz U, Mayer KFX, Spannagl M, Pozniak CJ, Sharpe AG, Šimková H, Moscou MJ, Grimwood J, Schmutz J, Stein N. Long-read sequence assembly: a technical evaluation in barley. THE PLANT CELL 2021; 33:1888-1906. [PMID: 33710295 PMCID: PMC8290290 DOI: 10.1093/plcell/koab077] [Citation(s) in RCA: 192] [Impact Index Per Article: 48.0] [Received: 10/26/2020] [Accepted: 02/28/2021] [Indexed: 05/19/2023]
Abstract
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
Affiliation(s)
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland 06466, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig 04103, Germany
| | - Thomas Wicker
- Department of Plant and Microbial Biology, University of Zürich, Zürich 8008, Switzerland
| | - Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806
| | | | - Thomas Lux
- PGSB–Plant Genome and Systems Biology, Helmholtz Center Munich–German Research Center for Environmental Health, Neuherberg 85764, Germany
| | - Chu Shin Koh
- Global Institute for Food Security, University of Saskatchewan, Saskatoon SK S7N 4L8, Canada
| | - Jennifer Ens
- Department of Plant Sciences, Crop Development Centre, University of Saskatchewan, Saskatoon SK S7N 5A8, Canada
| | - Heidrun Gundlach
- PGSB–Plant Genome and Systems Biology, Helmholtz Center Munich–German Research Center for Environmental Health, Neuherberg 85764, Germany
| | - Lori B Boston
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806
| | - Zuzana Tulpová
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc 78371, Czech Republic
| | - Samuel Holden
- The Sainsbury Laboratory, University of East Anglia, Norwich NR4 7UH, UK
| | | | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland 06466, Germany
| | - Klaus F X Mayer
- PGSB–Plant Genome and Systems Biology, Helmholtz Center Munich–German Research Center for Environmental Health, Neuherberg 85764, Germany
| | - Manuel Spannagl
- PGSB–Plant Genome and Systems Biology, Helmholtz Center Munich–German Research Center for Environmental Health, Neuherberg 85764, Germany
| | - Curtis J Pozniak
- Department of Plant Sciences, Crop Development Centre, University of Saskatchewan, Saskatoon SK S7N 5A8, Canada
| | - Andrew G Sharpe
- Global Institute for Food Security, University of Saskatchewan, Saskatoon SK S7N 4L8, Canada
| | - Hana Šimková
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc 78371, Czech Republic
| | - Matthew J Moscou
- The Sainsbury Laboratory, University of East Anglia, Norwich NR4 7UH, UK
| | - Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806
| | - Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland 06466, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen 37073, Germany
| |
|
37
|
Xu W, Tucker JR, Bekele WA, You FM, Fu YB, Khanal R, Yao Z, Singh J, Boyle B, Beattie AD, Belzile F, Mascher M, Tinker NA, Badea A. Genome Assembly of the Canadian two-row Malting Barley cultivar AAC Synergy. G3-GENES GENOMES GENETICS 2021; 11:6128399. [PMID: 33856017 PMCID: PMC8049406 DOI: 10.1093/g3journal/jkab031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2021] [Accepted: 01/22/2021] [Indexed: 12/20/2022]
Abstract
Barley (Hordeum vulgare L.) is one of the most important global crops. The six-row barley cultivar Morex reference genome has been used by the barley research community worldwide. However, this reference genome can have limitations when used for genomic and genetic diversity analysis studies, gene discovery, and marker development when working in two-row germplasm that is more common to Canadian barley. Here we assembled, for the first time, the genome sequence of a Canadian two-row malting barley, cultivar AAC Synergy. We applied deep Illumina paired-end reads, long mate-pair reads, PacBio sequences, 10X chromium linked read libraries, and chromosome conformation capture sequencing (Hi-C) to generate a contiguous assembly. The genome assembled from super-scaffolds had a size of 4.85 Gb, N50 of 2.32 Mb, and an estimated 93.9% of complete genes from a plant database (BUSCO, benchmarking universal single-copy orthologous genes). After removal of small scaffolds (< 300 Kb), the assembly was arranged into pseudomolecules of 4.14 Gb in size with seven chromosomes plus unanchored scaffolds. The completeness and annotation of the assembly were assessed by comparing it with the updated version of six-row Morex and recently released two-row Golden Promise genome assemblies.
Affiliation(s)
- Wayne Xu
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100 Morden, MB R6M 1Y5, Canada
| | - James R Tucker
- Brandon Research and Development Centre, Agriculture and Agri-Food Canada, 2701 Grand Valley Road, Brandon, MB R7A 5Y3, Canada
| | - Wubishet A Bekele
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
| | - Frank M You
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
| | - Yong-Bi Fu
- Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, SK S7N 0X2, Canada
| | - Raja Khanal
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
| | - Zhen Yao
- Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100 Morden, MB R6M 1Y5, Canada
| | - Jaswinder Singh
- Plant Science Department, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Quebec, QC H9X 3V9, Canada
| | - Brian Boyle
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC G1V 0A6, Canada
| | - Aaron D Beattie
- Crop Development Centre, University of Saskatchewan, Saskatoon, SK S7N 5A8, Canada
| | - François Belzile
- Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC G1V 0A6, Canada; Département de phytologie, Université Laval, Québec, QC G1V 0A6, Canada
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, 06466 Seeland, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Nicholas A Tinker
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
| | - Ana Badea
- Brandon Research and Development Centre, Agriculture and Agri-Food Canada, 2701 Grand Valley Road, Brandon, MB R7A 5Y3, Canada
| |
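The assembly abstracts above report contiguity as N50 (for example, 2.32 Mb for AAC Synergy). As a reminder of what that statistic measures, here is a minimal Python sketch using the common definition: the largest length L such that sequences of length at least L together cover at least half of the total assembly. The function name `n50` and the toy lengths are illustrative only and are not taken from either paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example: total length is 20, and 6 + 5 = 11 covers at least half of it.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 depends on the total length, removing small scaffolds (as in the < 300 Kb filter mentioned above) changes the value, so N50 figures are only comparable when the same filtering has been applied.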
|
38
|
Abstract
Rock varnish is a prominent feature of desert landscapes and the canvas for many prehistoric petroglyphs. How it forms—and, in particular, the basis for its extremely high manganese content—has been an enduring mystery. The work presented here establishes a biological mechanism for this manganese enrichment, underpinned by an apparent antioxidant strategy that enables microbes to survive in the harsh environments where varnish forms. The understanding that varnish is the residue of life using manganese to thrive in the desert illustrates that, even in extremely stark environments, the imprint of life is omnipresent on the landscape. Desert varnish is a dark rock coating that forms in arid environments worldwide. It is highly and selectively enriched in manganese, the mechanism for which has been a long-standing geological mystery. We collected varnish samples from diverse sites across the western United States, examined them in petrographic thin section using microscale chemical imaging techniques, and investigated the associated microbial communities using 16S amplicon and shotgun metagenomic DNA sequencing. Our analyses described a material governed by sunlight, water, and manganese redox cycling that hosts an unusually aerobic microbial ecosystem characterized by a remarkable abundance of photosynthetic Cyanobacteria in the genus Chroococcidiopsis as the major autotrophic constituent. We then showed that diverse Cyanobacteria, including the relevant Chroococcidiopsis taxon, accumulate extraordinary amounts of intracellular manganese—over two orders of magnitude higher manganese content than other cells. The speciation of this manganese determined by advanced paramagnetic resonance techniques suggested that the Cyanobacteria use it as a catalytic antioxidant—a valuable adaptation for coping with the substantial oxidative stress present in this environment. 
Taken together, these results indicated that the manganese enrichment in varnish is related to its specific uptake and use by likely founding members of varnish microbial communities.
|
39
|
Kallenborn F, Hildebrandt A, Schmidt B. CARE: context-aware sequencing read error correction. Bioinformatics 2021; 37:889-895. [PMID: 32818262 DOI: 10.1093/bioinformatics/btaa738] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/08/2020] [Revised: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. RESULTS: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections, which enables fast computation of high-quality multiple alignments. Sequencing errors are corrected by detailed inspection of the corresponding alignments. Our performance evaluation shows that CARE generates significantly fewer false-positive corrections than state-of-the-art tools (Musket, SGA, BFC, Lighter, Bcool, Karect) while maintaining a competitive number of true positives. When used prior to assembly it can achieve superior de novo assembly results for a number of real datasets. CARE is also the first multiple sequence alignment-based error corrector that is able to process a human genome Illumina NGS dataset in only 4 h on a single workstation using GPU acceleration. AVAILABILITY AND IMPLEMENTATION: CARE is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CARE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Felix Kallenborn
- Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany
| | - Andreas Hildebrandt
- Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany
| | - Bertil Schmidt
- Department of Computer Science, Johannes Gutenberg University, Mainz 55122, Germany
| |
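The CARE abstract above relies on minhashing to find similar reads before aligning them. The following Python sketch shows the general MinHash technique only: each read is reduced to a short signature of minimum k-mer hash values, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets. It is not CARE's implementation (which is C++/CUDA); the function names, the k-mer size, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash; the seed selects one hash function."""
    digest = hashlib.blake2b(
        kmer.encode("ascii"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(read, k=8, num_hashes=32):
    """Keep the minimum hash value per hash function over all k-mers."""
    mers = kmers(read, k)
    if not mers:
        raise ValueError("read is shorter than k")
    return [min(_hash(m, seed) for m in mers) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


read = "ACGTTGCATGCCGATAGCTAGCTAGGATCC"
print(estimated_similarity(minhash_signature(read), minhash_signature(read)))
```

In a similarity search, reads whose signatures share entries are placed in the same hash-table buckets, so candidate overlaps can be retrieved without comparing every pair of reads.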
|
40
|
Panibe JP, Wang L, Li J, Li MY, Lee YC, Wang CS, Ku MSB, Lu MYJ, Li WH. Chromosomal-level genome assembly of the semi-dwarf rice Taichung Native 1, an initiator of Green Revolution. Genomics 2021; 113:2656-2674. [PMID: 34111524 DOI: 10.1016/j.ygeno.2021.06.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/08/2021] [Revised: 05/25/2021] [Accepted: 06/04/2021] [Indexed: 10/21/2022]
Abstract
Here we report the 409.5 Mb chromosome-level assembly of the first bred semi-dwarf rice, the Taichung Native 1 (TN1), which served as the template for the development of the Green Revolution (GR) cultivar IR8 "miracle rice". We sequenced the TN1 genome utilizing multiple platforms and produced PacBio long reads, Illumina paired-end reads, Illumina mate-pair reads and 10x Genomics linked reads. We used a hybrid approach to assemble the 226× coverage of sequences by a combination of de novo and reference-guided approaches. The assembled TN1 genome has an N50 scaffold size of 33.1 Mb with the longest measuring 45.5 Mb. We annotated 37,526 genes, in which 24,102 (64.23%) were assigned Blast2GO annotations. The genome has 4672 or 95.4% complete BUSCOs and a repeat content of 51.52%. We developed our own method of creating a GR pangenome using the orthologous relationships of the proteins of TN1, IR8, MH63 and IR64, identifying 16,999 core orthologue groups of Green Revolution. From the pangenome, we identified a set of shared and unique gene ontology terms for the accessory clusters, characterizing TN1, IR8, MH63 and IR64. This TN1 genome assembly and GR pangenome will be a resource for new genomic discoveries about Green Revolution, and for improving the disease and insect resistances and the yield of rice.
Affiliation(s)
- Jerome P Panibe
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu 300, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei 115, Taiwan; Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Long Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023 Nanjing, China
| | - Jengyi Li
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Meng-Yun Li
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Yi-Chen Lee
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Chang-Sheng Wang
- Department of Agronomy, National Chung-Hsing University, Taichung 40227, Taiwan
| | - Maurice S B Ku
- Department of Bioagricultural Sciences, National Chiayi University, Chiayi 60004, Taiwan; School of Biological Sciences, Washington State University, Pullman 99164, WA, USA
| | - Mei-Yeh Jade Lu
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
| | - Wen-Hsiung Li
- Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu 300, Taiwan; Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan; Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA.
| |
|
41
|
Zhang X, Liu Y, Yu Z, Blumenstein M, Hutvagner G, Li J. Instance-based error correction for short reads of disease-associated genes. BMC Bioinformatics 2021; 22:142. [PMID: 34078284 PMCID: PMC8170817 DOI: 10.1186/s12859-021-04058-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Genomic reads from sequencing platforms contain random errors. Global correction algorithms have been developed, aiming to rectify all possible errors in the reads using generic genome-wide patterns. However, non-uniform sequencing depths prevent the global approach from removing errors effectively. As some genes may be under-corrected or over-corrected by the global approach, we conduct instance-based error correction for short reads of disease-associated genes or pathways. The paramount requirement is to ensure that the relevant reads, rather than the whole genome, are error-free, which provides significant benefits for single-nucleotide polymorphism (SNP) or variant calling studies on the specific genes. RESULTS: To rectify possible errors in the short reads of disease-associated genes, our novel idea is to exploit local sequence features and statistics directly related to these genes. Extensive experiments are conducted in comparison with state-of-the-art methods on both simulated and real datasets of lung cancer associated genes (including single-end and paired-end reads). The results demonstrate the superiority of our method, with the best performance on precision, recall and gain rate, as well as on sequence assembly results (e.g., N50, contig length and contig quality). CONCLUSION: The instance-based strategy makes it possible to explore fine-grained patterns focusing on specific genes, providing high-precision error correction and convincing gene sequence assembly. SNP case studies show that errors occurring at some traditional SNP areas can be accurately corrected, providing high precision and sensitivity for investigations of disease-causing point mutations.
Affiliation(s)
- Xuan Zhang
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
| | - Yuansheng Liu
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
| | - Zuguo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
| | - Michael Blumenstein
- Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
| | - Gyorgy Hutvagner
- Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
| | - Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia.
| |
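The Zhang et al. abstract above evaluates error correction by precision, recall and gain. The abstract does not define these terms, so the Python sketch below assumes the definitions commonly used in read error-correction benchmarks: with TP as erroneous bases correctly fixed, FP as correct bases wrongly changed, and FN as erroneous bases left unfixed, precision = TP / (TP + FP), recall = TP / (TP + FN), and gain = (TP − FP) / (TP + FN). The function name `correction_metrics` and the counts in the example are hypothetical.

```python
def correction_metrics(tp, fp, fn):
    """Return (precision, recall, gain) for an error-correction run.

    Assumed definitions (not stated in the abstract):
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      gain      = (TP - FP) / (TP + FN)
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("metrics are undefined without corrections or errors")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    gain = (tp - fp) / (tp + fn)
    return precision, recall, gain


# Made-up counts: 90 errors fixed, 10 new errors introduced, 10 errors missed.
print(correction_metrics(90, 10, 10))
```

Under these definitions gain can be negative: a tool that introduces more errors than it removes leaves the data worse than it found them, which precision and recall alone do not show.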
|
42
|
Skarzyńska A, Pawełkowicz M, Pląder W. Influence of transgenesis on genome variability in cucumber lines with a thaumatin II gene. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS: AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2021; 27:985-996. [PMID: 34092948 PMCID: PMC8139995 DOI: 10.1007/s12298-021-00990-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2020] [Revised: 03/25/2021] [Accepted: 04/04/2021] [Indexed: 06/01/2023]
Abstract
The development of new plant varieties by genetic modification aims at improving their features or introducing new qualities. However, concerns about the unintended effects of transgenes and the negative environmental impact of genetically modified plants are an obstacle to the use of these plants as crops. To assess the impact of transgenesis on plant genomes, we analyzed three transgenic cucumber lines carrying an introduced thaumatin II gene. After genome sequencing, we analyzed the transgene insertion site and performed variant prediction. We obtained a similar number of variants for all analyzed lines (an average of 4307 polymorphisms), with high abundance in one region of chromosome 4. According to SnpEff analysis, the presence of genomic variants generally does not influence genome functionality, as less than 2% of polymorphisms have high impact. Moreover, the analysis indicates that these changes were more likely induced by in vitro culture than by the transgenesis itself. The insertion site analysis shows that the region of transgene integration could cause changes in gene expression through gene disruption or loss of promoter region continuity. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-021-00990-8.
Affiliation(s)
- Agnieszka Skarzyńska
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
| | - Magdalena Pawełkowicz
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
| | - Wojciech Pląder
- Department of Plant Genetics, Breeding and Biotechnology, Institute of Biology, Warsaw University of Life Sciences, Warsaw, Poland
| |
|
43
|
Albanese D, Coleine C, Rota-Stabelli O, Onofri S, Tringe SG, Stajich JE, Selbmann L, Donati C. Pre-Cambrian roots of novel Antarctic cryptoendolithic bacterial lineages. MICROBIOME 2021; 9:63. [PMID: 33741058 PMCID: PMC7980648 DOI: 10.1186/s40168-021-01021-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/09/2020] [Accepted: 02/02/2021] [Indexed: 05/25/2023]
Abstract
BACKGROUND Cryptoendolithic communities are microbial ecosystems dwelling inside porous rocks that are able to persist at the edge of the biological potential for life in the ice-free areas of the Antarctic desert. These regions include the McMurdo Dry Valleys, often accounted as the closest terrestrial counterpart of the Martian environment and thought to be devoid of life until the discovery of these cryptic life-forms. Despite their interest as a model for the early colonization by living organisms of terrestrial ecosystems and for adaptation to extreme conditions of stress, little is known about the evolution, diversity, and genetic makeup of bacterial species that reside in these environments. Using the Illumina Novaseq platform, we generated the first metagenomes from rocks collected in Continental Antarctica over a distance of about 350 km along an altitudinal transect from 834 up to 3100 m above sea level (a.s.l.). RESULTS A total of 497 draft bacterial genome sequences were assembled and clustered into 269 candidate species that lack a representative genome in public databases. Actinobacteria represent the most abundant phylum, followed by Chloroflexi and Proteobacteria. The "Candidatus Jiangella antarctica" has been recorded across all samples, suggesting a high adaptation and specialization of this species to the harshest Antarctic desert environment. The majority of these new species belong to monophyletic bacterial clades that diverged from related taxa in a range from 1.2 billion to 410 Ma and are functionally distinct from known related taxa. CONCLUSIONS Our findings significantly increase the repertoire of genomic data for several taxa and, to date, represent the first example of bacterial genomes recovered from endolithic communities. Their ancient origin seems to not be related to the geological history of the continent, rather they may represent evolutionary remnants of pristine clades that evolved across the Tonian glaciation. 
These unique genomic resources will underpin future studies on the structure, evolution, and function of these ecosystems at the edge of life.
Affiliation(s)
- Davide Albanese
- Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38098 San Michele all’Adige, Italy
| | - Claudia Coleine
- Department of Ecological and Biological Sciences, University of Tuscia, Largo dell’Università, 01100 Viterbo, Italy
| | - Omar Rota-Stabelli
- Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38098 San Michele all’Adige, Italy
| | - Silvano Onofri
- Department of Ecological and Biological Sciences, University of Tuscia, Largo dell’Università, 01100 Viterbo, Italy
| | - Susannah G. Tringe
- Department of Energy Joint Genome Institute, One Cyclotron Road, Berkeley, CA 94720 USA
| | - Jason E. Stajich
- Department of Microbiology and Plant Pathology and Institute of Integrative Genome Biology, University of California, Watkins Drive 3401, Riverside, Riverside, CA 92507 USA
| | - Laura Selbmann
- Department of Ecological and Biological Sciences, University of Tuscia, Largo dell’Università, 01100 Viterbo, Italy
- Mycological Section, Italian Antarctic National Museum (MNA), Via al Porto Antico, 16128 Genoa, Italy
| | - Claudio Donati
- Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38098 San Michele all’Adige, Italy
| |
|
44
|
Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
45
|
Mushinski RM, Payne ZC, Raff JD, Craig ME, Pusede SE, Rusch DB, White JR, Phillips RP. Nitrogen cycling microbiomes are structured by plant mycorrhizal associations with consequences for nitrogen oxide fluxes in forests. Global Change Biology 2020; 27:1068-1082. [PMID: 33319480 PMCID: PMC7898693 DOI: 10.1111/gcb.15439] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/17/2020] [Revised: 10/25/2020] [Accepted: 10/26/2020] [Indexed: 05/02/2023]
Abstract
Volatile nitrogen oxides (N2O, NO, NO2, HONO, …) can negatively impact climate, air quality, and human health. Using soils collected from temperate forests across the eastern United States, we show that microbial communities involved in nitrogen (N) cycling are structured, in large part, by the composition of overstory trees, leading to predictable N-cycling syndromes, with consequences for emissions of volatile nitrogen oxides to air. Trees associating with arbuscular mycorrhizal (AM) fungi promote soil microbial communities with higher N-cycle potential and activity, relative to microbial communities in soils dominated by trees associating with ectomycorrhizal (ECM) fungi. Metagenomic analysis and gene expression studies reveal 5 and 3.5 times greater estimated N-cycle gene and transcript copy numbers, respectively, in AM relative to ECM soil. Furthermore, we observe a 60% linear decrease in volatile reactive nitrogen gas flux (NOy ≡ NO, NO2, HONO) as ECM tree abundance increases. Compared to oxic conditions, the gas flux potential of N2O and NO increases significantly under anoxic conditions for AM soil (30- and 120-fold increase), but not for ECM soil, likely owing to small concentrations of available substrate (NO3−) in ECM soil. Linear mixed effects modeling shows that ECM tree abundance, microbial process rates, and geographic location are primarily responsible for variation in peak potential NOy flux. Given that nearly all tree species associate with either AM or ECM fungi, our results indicate that tree species shifts associated with global change may have predictable consequences for soil N cycling.
Affiliation(s)
- Ryan M. Mushinski
- School of Life Sciences, University of Warwick, Coventry, UK
- O'Neill School of Public and Environmental Affairs, Indiana University, Bloomington, IN, USA
| | - Zachary C. Payne
- O'Neill School of Public and Environmental Affairs, Indiana University, Bloomington, IN, USA
- Department of Chemistry, Indiana University, Bloomington, IN, USA
| | - Jonathan D. Raff
- O'Neill School of Public and Environmental Affairs, Indiana University, Bloomington, IN, USA
- Department of Chemistry, Indiana University, Bloomington, IN, USA
| | - Matthew E. Craig
- Department of Biology, Indiana University, Bloomington, IN, USA
- Environmental Sciences Division and Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Sally E. Pusede
- Department of Environmental Sciences, University of Virginia, Charlottesville, VA, USA
| | - Douglas B. Rusch
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN, USA
| | - Jeffrey R. White
- O'Neill School of Public and Environmental Affairs, Indiana University, Bloomington, IN, USA
- Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN, USA
| | | |
|
46
|
Huang L, Ma Y, Jiang J, Li T, Yang W, Zhang L, Wu L, Feng L, Xi Z, Xu X, Liu J, Hu Q. A chromosome-scale reference genome of Lobularia maritima, an ornamental plant with high stress tolerance. Horticulture Research 2020; 7:197. [PMID: 33328471 PMCID: PMC7705659 DOI: 10.1038/s41438-020-00422-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/09/2020] [Revised: 09/21/2020] [Accepted: 09/30/2020] [Indexed: 06/12/2023]
Abstract
Lobularia maritima (L.) Desv. is an ornamental plant cultivated across the world. It belongs to the family Brassicaceae and can tolerate dry, poor and contaminated habitats. Here, we present a chromosome-scale, high-quality genome assembly of L. maritima based on an integrated approach combining Illumina short reads and Hi-C chromosome conformation data. The genome was assembled into 12 pseudochromosomes with a total length of 197.70 Mb, and it includes 25,813 protein-coding genes. Approximately 41.94% of the genome consists of repetitive sequences, with abundant long terminal repeat transposable elements. Comparative genomic analysis confirmed that L. maritima underwent a species-specific whole-genome duplication (WGD) event ~22.99 million years ago. We identified ~1900 species-specific genes, 25 expanded gene families, and 50 positively selected genes in L. maritima. Functional annotations of these genes indicated that they are mainly related to stress tolerance. These results provide new insights into the stress tolerance of L. maritima, and this genomic resource will be valuable for further genetic improvement of this important ornamental plant.
Affiliation(s)
- Li Huang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Yazhen Ma
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Jiebei Jiang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Ting Li
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Wenjie Yang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Lei Zhang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Lei Wu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Landi Feng
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Zhenxiang Xi
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Xiaoting Xu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| | - Jianquan Liu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
- State Key Laboratory of Grassland Agro-Ecosystem, Institute of Innovation Ecology, Lanzhou University, Lanzhou, China
| | - Quanjun Hu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065, Chengdu, China
| |
|
47
|
Mei R, Nobu MK, Narihiro T, Liu WT. Metagenomic and Metatranscriptomic Analyses Revealed Uncultured Bacteroidales Populations as the Dominant Proteolytic Amino Acid Degraders in Anaerobic Digesters. Front Microbiol 2020; 11:593006. [PMID: 33193263 PMCID: PMC7661554 DOI: 10.3389/fmicb.2020.593006] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/09/2020] [Accepted: 10/13/2020] [Indexed: 01/22/2023]
Abstract
Current understanding of amino acid (AA) degraders in anaerobic digesters is mainly based on cultured species, whereas microorganisms that play important roles in a complex microbial community remain poorly characterized. This study investigated short-term enrichments degrading single AAs using metagenomics and metatranscriptomics. Metagenomic analysis revealed that populations related to cultured AA degraders accounted for <2.5% of the sequences. In contrast, metagenome-assembled bins related to uncultured Bacteroidales collectively accounted for >35% of the sequences. Phylogenetic analyses suggested that these Bacteroidales populations represented a yet-to-be-characterized family lineage, i.e., Bacteroidetes vadinHA17. The bins possessed the genetic capacity for protein degradation, including surface adhesion (3–7 genes), secreted peptidases (52–77 genes), and polypeptide-specific transporters (2–5 genes). Furthermore, metatranscriptomics revealed that these Bacteroidales populations expressed the complete metabolic pathways for degrading 16 to 17 types of AAs in enrichments fed with the respective substrates. These characteristics were distinct from those of cultured AA degraders, including Acidaminobacter and Peptoclostridium, suggesting that the uncultured Bacteroidales were the major protein-hydrolyzing and AA-degrading populations. These uncultured Bacteroidales were further found to be dominant and active in full-scale anaerobic digesters, indicating their important ecological roles in their native habitats. “Candidatus Aminobacteroidaceae” was proposed to represent the previously uncharted family Bacteroidetes vadinHA17.
Affiliation(s)
- Ran Mei
- Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| | - Masaru K Nobu
- Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
| | - Takashi Narihiro
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
| | - Wen-Tso Liu
- Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
| |
|
48
|
Metagenomes and Metatranscriptomes of a Glucose-Amended Agricultural Soil. Microbiol Resour Announc 2020; 9:9/44/e00895-20. [PMID: 33122409 PMCID: PMC7595945 DOI: 10.1128/mra.00895-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Abstract
The addition of glucose to soil has long been used to study the metabolic activity of microbes in soil; however, the response of the microbial ecophysiology remains poorly characterized. To address this, we sequenced the metagenomes and metatranscriptomes of glucose-amended soil microbial communities in a laboratory incubation.
|
49
|
Lopez Sanchez MIG, Ziemann M, Bachem A, Makam R, Crowston JG, Pinkert CA, McKenzie M, Bedoui S, Trounce IA. Nuclear response to divergent mitochondrial DNA genotypes modulates the interferon immune response. PLoS One 2020; 15:e0239804. [PMID: 33031404 PMCID: PMC7544115 DOI: 10.1371/journal.pone.0239804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2020] [Accepted: 09/14/2020] [Indexed: 11/23/2022]
Abstract
Mitochondrial OXPHOS generates most of the energy required for cellular function. OXPHOS biogenesis requires the coordinated expression of the nuclear and mitochondrial genomes. This represents a unique challenge that highlights the importance of nuclear-mitochondrial genetic communication to cellular function. Here we investigated the transcriptomic and functional consequences of nuclear-mitochondrial genetic divergence in vitro and in vivo. We utilized xenomitochondrial cybrid cell lines containing nuclear DNA from the common laboratory mouse Mus musculus domesticus and mitochondrial DNA (mtDNA) from Mus musculus domesticus, or exogenous mtDNA from progressively divergent mouse species Mus spretus, Mus terricolor, Mus caroli and Mus pahari. These cybrids model a wide range of nuclear-mitochondrial genetic divergence that cannot be achieved with other research models. Furthermore, we used a xenomitochondrial mouse model generated in our laboratory that harbors wild-type, C57BL/6J Mus musculus domesticus nuclear DNA and homoplasmic mtDNA from Mus terricolor. RNA sequencing analysis of xenomitochondrial cybrids revealed an activation of interferon signaling pathways even in the absence of OXPHOS dysfunction or immune challenge. In contrast, xenomitochondrial mice displayed lower baseline interferon gene expression and an impairment in the interferon-dependent innate immune response upon immune challenge with herpes simplex virus, which resulted in decreased viral control. Our work demonstrates that nuclear-mitochondrial genetic divergence caused by the introduction of exogenous mtDNA can modulate the interferon immune response both in vitro and in vivo, even when OXPHOS function is not compromised. This work may lead to future insights into the role of mitochondrial genetic variation and the immune function in humans, as patients affected by mitochondrial disease are known to be more susceptible to immune challenges.
Affiliation(s)
- M. Isabel G. Lopez Sanchez
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Ophthalmology, Department of Surgery, University of Melbourne, Melbourne, Victoria, Australia
| | - Mark Ziemann
- Department of Diabetes, Monash University Central Clinical School, The Alfred Medical Research and Education Precinct, Melbourne, Victoria, Australia
- School of Life and Environmental Sciences, Deakin University, Victoria, Australia
| | - Annabell Bachem
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Rahul Makam
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
| | - Jonathan G. Crowston
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Ophthalmology, Department of Surgery, University of Melbourne, Melbourne, Victoria, Australia
| | - Carl A. Pinkert
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, Alabama, United States of America
| | - Matthew McKenzie
- School of Life and Environmental Sciences, Deakin University, Victoria, Australia
- Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Melbourne, Victoria, Australia
- Department of Molecular and Translational Science, Monash University, Melbourne, Victoria, Australia
| | - Sammy Bedoui
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
| | - Ian A. Trounce
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Ophthalmology, Department of Surgery, University of Melbourne, Melbourne, Victoria, Australia
| |
|
50
|
Yang Q, Bi H, Yang W, Li T, Jiang J, Zhang L, Liu J, Hu Q. The Genome Sequence of Alpine Megacarpaea delavayi Identifies Species-Specific Whole-Genome Duplication. Front Genet 2020; 11:812. [PMID: 32849811 PMCID: PMC7416671 DOI: 10.3389/fgene.2020.00812] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/30/2020] [Accepted: 07/06/2020] [Indexed: 11/18/2022]
Abstract
Megacarpaea delavayi (Brassicaceae), a plant found in the high mountains of southwest China at altitudes of 3000–4800 m, is used as a vegetable or medicine. Here, we report a draft genome for this species. The assembled genome of M. delavayi is 883 Mb, and 61.59% of the genome is composed of repeat sequences. Annotation of the genome identified a total of 41,114 protein-coding genes. We found that M. delavayi experienced an independent whole-genome duplication (WGD), paralleling the independent WGDs in Iberis, Biscutella, and Anastatica in the early Miocene. Phylogenetic analyses based on single-copy genes confirmed the position of the genus Megacarpaea within the expanded lineage II of the family and resolved its basal divergence to a subclade consisting of Anastatica, Iberis, and Biscutella. Species-specific and fast-evolving genes in M. delavayi are mainly involved in “DNA repair” and “response to UV-B radiation.” These genetic changes may together help this species survive in high-altitude environments. The reference genome reported here provides a valuable resource for studying the adaptation of this and other alpine plants to high-altitude habitats.
Affiliation(s)
- Qiao Yang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Hao Bi
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Wenjie Yang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Ting Li
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Jiebei Jiang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Lei Zhang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| | - Jianquan Liu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
- State Key Laboratory of Grassland Agro-Ecosystem, Institute of Innovation Ecology, Lanzhou University, Lanzhou, China
| | - Quanjun Hu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, State Key Laboratory of Hydraulics and Mountain River Engineering, College of Life Sciences, Sichuan University, Chengdu, China
| |
|