1
Jiang Z, Peng Z, Wei Z, Sun J, Luo Y, Bie L, Zhang G, Wang Y. A deep learning-based method enables the automatic and accurate assembly of chromosome-level genomes. Nucleic Acids Res 2024; 52:e92. [PMID: 39287126 PMCID: PMC11514472 DOI: 10.1093/nar/gkae789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 08/25/2024] [Accepted: 08/30/2024] [Indexed: 09/19/2024] Open
Abstract
The application of high-throughput chromosome conformation capture (Hi-C) technology enables the construction of chromosome-level assemblies. However, the correction of errors and the anchoring of sequences to chromosomes in the assembly remain significant challenges. In this study, we developed a deep learning-based method, AutoHiC, to address the challenges in chromosome-level genome assembly by enhancing contiguity and accuracy. Conventional Hi-C-aided scaffolding often requires manual refinement, but AutoHiC instead utilizes Hi-C data for automated workflows and iterative error correction. When trained on data from 300+ species, AutoHiC demonstrated a robust average error detection accuracy exceeding 90%. The benchmarking results confirmed its significant impact on genome contiguity and error correction. The innovative approach and comprehensive results of AutoHiC constitute a breakthrough in automated error detection, promising more accurate genome assemblies for advancing genomics research.
Affiliation(s)
- Zijie Jiang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Zhixiang Peng
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Zhaoyuan Wei
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Jiahe Sun
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Yongjiang Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Lingzi Bie
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Guoqing Zhang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
- Yi Wang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
2
Harper S, Counihan KL, Kanrar S, Paoli GC, Tilman S, Gehring AG. Investigating the Quantification Capabilities of a Nanopore-Based Sequencing Platform for Food Safety Application via External Standards of Lambda DNA and Lambda Spiked Beef. Foods 2024; 13:3304. [PMID: 39456366 PMCID: PMC11507243 DOI: 10.3390/foods13203304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 10/10/2024] [Accepted: 10/11/2024] [Indexed: 10/28/2024] Open
Abstract
Six hundred million cases of disease and roughly 420,000 deaths occur globally each year due to foodborne pathogens. Current methods to screen and identify pathogens in swine, poultry, and cattle products include immuno-based techniques (e.g., immunoassay integrated biosensors), molecular methods (e.g., DNA hybridization and PCR assays), and traditional culturing. These methods are often used in tandem to screen, quantify, and characterize samples, prolonging real-time comprehensive analysis. Next-generation sequencing (NGS) is a relatively new technology that combines DNA-sequencing chemistry and bioinformatics to generate and analyze large amounts of short- or long-read DNA sequences and whole genomes. The goal of this project was to evaluate the quantitative capabilities of the real-time NGS Oxford Nanopore Technologies' MinION sequencer through a shotgun-based sequencing approach. This investigation explored the correlation between known amounts of the analyte (lambda DNA as a pathogenic bacterial surrogate) with data output, in both the presence and absence of a background matrix (Bos taurus DNA). A positive linear correlation was observed between the concentration of analyte and the amount of data produced, number of bases sequenced, and number of reads generated in both the presence and absence of a background matrix. In the presence of bovine DNA, the sequenced data were successfully mapped to the NCBI lambda reference genome. Furthermore, the workflow from pre-extracted DNA to target identification took less than 3 h, demonstrating the potential of long-read sequencing in food safety as a rapid method for screening, identification, and quantification.
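The positive linear correlation described in this abstract can be illustrated with an ordinary least-squares fit. The sketch below is not the authors' analysis: the function name `fit_line` and the input values are hypothetical, chosen only to show how a slope, an intercept and a Pearson correlation coefficient would be obtained from analyte amounts and read counts.

```python
def fit_line(x, y):
    """Least-squares line through (x, y); returns slope, intercept, Pearson r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical values for illustration only (not data from the study).
lambda_ng = [1, 2, 4, 8]          # amount of lambda DNA loaded
reads = [110, 205, 410, 790]      # reads assigned to lambda

slope, intercept, r = fit_line(lambda_ng, reads)
print(f"slope={slope:.1f} reads/ng, intercept={intercept:.1f}, r={r:.4f}")
```

A slope greater than zero together with r close to 1 is what a "positive linear correlation" means in practice; the same fit could be repeated for total bases or data volume as the response variable.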
Affiliation(s)
- Andrew G. Gehring
- United States Department of Agriculture, Agricultural Research Service, Eastern Regional Research Center, Wyndmoor, PA 19038, USA; (S.H.); (K.L.C.); (S.K.); (G.C.P.); (S.T.)
3
Jia H, Tan S, Cai Y, Guo Y, Shen J, Zhang Y, Ma H, Zhang Q, Chen J, Qiao G, Ruan J, Zhang YE. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat Commun 2024; 15:5644. [PMID: 38969648 PMCID: PMC11226609 DOI: 10.1038/s41467-024-49992-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 06/20/2024] [Indexed: 07/07/2024] Open
Abstract
Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.
Affiliation(s)
- Hangxing Jia
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Shengjun Tan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Yingao Cai
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yanyan Guo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jieyu Shen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yaqiong Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Huijing Ma
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Qingzhu Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinfeng Chen
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Gexia Qiao
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Jue Ruan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- Yong E Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
4
Zhang Z, Xiao J, Wang H, Yang C, Huang Y, Yue Z, Chen Y, Han L, Yin K, Lyu A, Fang X, Zhang L. Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity. Nat Commun 2024; 15:4631. [PMID: 38821971 PMCID: PMC11143213 DOI: 10.1038/s41467-024-49060-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 05/17/2024] [Indexed: 06/02/2024] Open
Abstract
Although long-read sequencing enables the generation of complete genomes for unculturable microbes, its high cost limits the widespread adoption of long-read sequencing in large-scale metagenomic studies. An alternative method is to assemble short-reads with long-range connectivity, which can be a cost-effective way to generate high-quality microbial genomes. Here, we develop Pangaea, a bioinformatic approach designed to enhance metagenome assembly using short-reads with long-range connectivity. Pangaea leverages connectivity derived from physical barcodes of linked-reads or virtual barcodes by aligning short-reads to long-reads. Pangaea utilizes a deep learning-based read binning algorithm to assemble co-barcoded reads exhibiting similar sequence contexts and abundances, thereby improving the assembly of high- and medium-abundance microbial genomes. Pangaea also leverages a multi-thresholding algorithm strategy to refine assembly for low-abundance microbes. We benchmark Pangaea on linked-reads and a combination of short- and long-reads from simulation data, mock communities and human gut metagenomes. Pangaea achieves significantly higher contig continuity as well as more near-complete metagenome-assembled genomes (NCMAGs) than the existing assemblers. Pangaea also generates three complete and circular NCMAGs on the human gut microbiomes.
Grants
- This research was partially supported by the Young Collaborative Research Grant (C2004-23Y, L.Z.), HMRF (11221026, L.Z.), the open project of BGI-Shenzhen, Shenzhen 518000, China (BGIRSZ20220012, L.Z.), the Hong Kong Research Grant Council Early Career Scheme (HKBU 22201419, L.Z.), HKBU Start-up Grant Tier 2 (RC-SGT2/19-20/SCI/007, L.Z.), HKBU IRCMS (No. IRCMS/19-20/D02, L.Z.).
- This research was partially supported by the open project of BGI-Shenzhen, Shenzhen 518000, China (BGIRSZ20220014, KJ.Y.).
- This study was partially supported by the Science Technology and Innovation Committee of Shenzhen Municipality, China (SGDX20190919142801722, XD.F.).
Affiliation(s)
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Jin Xiao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Hongbo Wang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Zhen Yue
- BGI Research, Sanya, 572025, China
- Yang Chen
- State Key Laboratory of Dampness Syndrome of Chinese Medicine, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
- Lijuan Han
- Department of Scientific Research, Kangmeihuada GeneTech Co., Ltd (KMHD), Shenzhen, China
- Kejing Yin
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Xiaodong Fang
- BGI Research, Shenzhen, 518083, China
- BGI Research, Sanya, 572025, China
- Department of Scientific Research, Kangmeihuada GeneTech Co., Ltd (KMHD), Shenzhen, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China.
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China.
5
Yu W, Luo H, Yang J, Zhang S, Jiang H, Zhao X, Hui X, Sun D, Li L, Wei XQ, Lonardi S, Pan W. Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes. Genome Res 2024; 34:326-340. [PMID: 38428994 PMCID: PMC10984382 DOI: 10.1101/gr.278232.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 01/23/2024] [Indexed: 03/03/2024]
Abstract
Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there are no comprehensive studies on the evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using quality assessment tool (QUAST) and benchmarking universal single-copy ortholog (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes with HiFi data. We performed a comprehensive benchmarking study of commonly used assemblers on complex eukaryotic genomes and metagenomes. Our study will help the research community to choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.
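The error rate and the "quality value" criterion mentioned in this abstract are commonly related through the Phred scale, Q = -10 · log10(error rate). The sketch below assumes that standard definition; the function name `phred_qv` is hypothetical, and the abstract does not state how the benchmark itself computed its quality values.

```python
import math


def phred_qv(error_rate):
    """Convert a per-base error rate (0 < rate <= 1) to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# A 0.01% error rate, the bound quoted for HiFi reads, expressed as a fraction.
print(round(phred_qv(0.0001), 1))
```

Under this scaling, each additional 10 quality points corresponds to a tenfold reduction in the error rate.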
Affiliation(s)
- Wenjuan Yu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Haohui Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jinbao Yang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Shengchen Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Heling Jiang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xianjia Zhao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- School of Agricultural Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China
- Xingqi Hui
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- School of Agricultural Sciences, Zhengzhou University, Zhengzhou, Henan 450001, China
- Da Sun
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Liang Li
- Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350002, China
- Xiu-Qing Wei
- Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350002, China
- Stefano Lonardi
- Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
6
Pozo G, Albuja-Quintana M, Larreátegui L, Gutiérrez B, Fuentes N, Alfonso-Cortés F, Torres MDL. First whole-genome sequence and assembly of the Ecuadorian brown-headed spider monkey (Ateles fusciceps fusciceps), a critically endangered species, using Oxford Nanopore Technologies. G3 (Bethesda, Md.) 2024; 14:jkae014. [PMID: 38244218 PMCID: PMC10917520 DOI: 10.1093/g3journal/jkae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 12/11/2023] [Accepted: 01/05/2024] [Indexed: 01/22/2024]
Abstract
The Ecuadorian brown-headed spider monkey (Ateles fusciceps fusciceps) is currently considered one of the most endangered primates in the world and is classified as critically endangered by the International Union for Conservation of Nature (IUCN). It faces multiple threats, the most significant one being habitat loss due to deforestation in western Ecuador. Genomic tools are key for the management of endangered species, but this requires a reference genome, which until now was unavailable for A. f. fusciceps. The present study reports the first whole-genome sequence and assembly of A. f. fusciceps generated using Oxford Nanopore long reads. DNA was extracted from a subadult male, and libraries were prepared for sequencing following the Ligation Sequencing Kit SQK-LSK112 workflow. Sequencing was performed using a MinION Mk1C sequencer. The sequencing reads were processed to generate a genome assembly. Two different assemblers were used to obtain draft genomes using raw reads, of which the Flye assembly was found to be superior. The final assembly has a total length of 2.63 Gb and contains 3,861 contigs, with an N50 of 7,560,531 bp. The assembly was analyzed for annotation completeness based on primate ortholog prediction using a high-resolution database, and was found to be 84.3% complete, with a low number of duplicated genes indicating a precise assembly. The annotation of the assembly predicted 31,417 protein-coding genes, comparable with other mammal assemblies. A reference genome for this critically endangered species will allow researchers to gain insight into the genetics of its populations and thus aid conservation and management efforts of this vulnerable species.
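The N50 statistic quoted in this abstract is, by the usual definition, the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total assembly length. A minimal sketch under that definition follows; the function name `n50` and the toy contig lengths are hypothetical and unrelated to the assembly reported above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig.
        if running >= half_total:
            return length


# Toy lengths for illustration only (total 300, half 150).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In practice the lengths would be read from the assembly FASTA; a higher N50 for the same total length indicates a more contiguous assembly.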
Affiliation(s)
- Gabriela Pozo
- Laboratorio de Biotecnología Vegetal, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Instituto Nacional de Biodiversidad (INABIO), Quito 170135, Ecuador
- Martina Albuja-Quintana
- Laboratorio de Biotecnología Vegetal, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Lizbeth Larreátegui
- Laboratorio de Biotecnología Vegetal, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Bernardo Gutiérrez
- Laboratorio de Biotecnología Vegetal, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Department of Biology, University of Oxford, Oxford OX1 3SZ, UK
- Nathalia Fuentes
- Proyecto Washu/Fundación Naturaleza y Arte, Quito 170521, Ecuador
- Maria de Lourdes Torres
- Laboratorio de Biotecnología Vegetal, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador
- Instituto Nacional de Biodiversidad (INABIO), Quito 170135, Ecuador
7
Paré L, Bideau L, Baduel L, Dalle C, Benchouaia M, Schneider SQ, Laplane L, Clément Y, Vervoort M, Gazave E. Transcriptomic landscape of posterior regeneration in the annelid Platynereis dumerilii. BMC Genomics 2023; 24:583. [PMID: 37784028 PMCID: PMC10546743 DOI: 10.1186/s12864-023-09602-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/18/2023] [Indexed: 10/04/2023] Open
Abstract
BACKGROUND Restorative regeneration, the capacity to reform a lost body part following amputation or injury, is an important and still poorly understood process in animals. Annelids, or segmented worms, show amazing regenerative capabilities, and as such are a crucial group to investigate. Elucidating the molecular mechanisms that underpin regeneration in this major group remains a key goal. Among annelids, the nereidid Platynereis dumerilii (re)emerged recently as a front-line regeneration model. Following amputation of its posterior part, Platynereis worms can regenerate both differentiated tissues of their terminal part as well as a growth zone that contains putative stem cells. While this regeneration process follows specific and reproducible stages that have been well characterized, the transcriptomic landscape of these stages remains to be uncovered.
RESULTS We generated a high-quality de novo Reference transcriptome for the annelid Platynereis dumerilii. We produced and analyzed three RNA-sequencing datasets, encompassing five stages of posterior regeneration, along with blastema stages and non-amputated tissues as controls. We included two of these regeneration RNA-seq datasets, as well as embryonic and tissue-specific datasets from the literature, to produce a Reference transcriptome. We used this Reference transcriptome to perform in-depth analyses of RNA-seq data during the course of regeneration to reveal the important dynamics of the gene expression process, with thousands of genes differentially expressed between stages, as well as unique and specific gene expression at each regeneration stage. The study of these genes highlighted the importance of the nervous system at both early and late stages of regeneration, as well as the enrichment of RNA-binding proteins (RBPs) during almost the entire regeneration process.
CONCLUSIONS In this study, we provided a high-quality de novo Reference transcriptome for the annelid Platynereis that is useful for investigating various developmental processes, including regeneration. Our extensive stage-specific transcriptional analysis during the course of posterior regeneration sheds light upon major molecular mechanisms and pathways, and will foster many specific studies in the future.
Affiliation(s)
- Louis Paré
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Loïc Bideau
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Loeiza Baduel
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Caroline Dalle
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Médine Benchouaia
- Département de biologie, GenomiqueENS, Institut de Biologie de l'ENS (IBENS), École normale supérieure, CNRS, INSERM, Université PSL, Paris, 75005, France
- Stephan Q Schneider
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei, 11529, Taiwan
- Lucie Laplane
- Université Paris I Panthéon-Sorbonne, CNRS UMR 8590 Institut d'Histoire et de Philosophie des Sciences et des Techniques (IHPST), Paris, France
- Gustave Roussy, UMR 1287, Villejuif, France
- Yves Clément
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Michel Vervoort
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France
- Eve Gazave
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France.
8
Leung W, Torosin N, Cao W, Reed LK, Arrigo C, Elgin SCR, Ellison CE. Long-read genome assemblies for the study of chromosome expansion: Drosophila kikkawai, Drosophila takahashii, Drosophila bipectinata, and Drosophila ananassae. G3 (Bethesda, Md.) 2023; 13:jkad191. [PMID: 37611223 PMCID: PMC10542312 DOI: 10.1093/g3journal/jkad191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/01/2023] [Accepted: 08/04/2023] [Indexed: 08/25/2023]
Abstract
Flow cytometry estimates of genome sizes among species of Drosophila show a 3-fold variation, ranging from ∼127 Mb in Drosophila mercatorum to ∼400 Mb in Drosophila cyrtoloma. However, the assembled portion of the Muller F element (orthologous to the fourth chromosome in Drosophila melanogaster) shows a nearly 14-fold variation in size, ranging from ∼1.3 Mb to >18 Mb. Here, we present chromosome-level long-read genome assemblies for 4 Drosophila species with expanded F elements ranging in size from 2.3 to 20.5 Mb. Each Muller element is present as a single scaffold in each assembly. These assemblies will enable new insights into the evolutionary causes and consequences of chromosome size expansion.
Affiliation(s)
- Wilson Leung
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Nicole Torosin
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Weihuan Cao
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Laura K Reed
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
- Cindy Arrigo
- Department of Biology, New Jersey City University, Jersey City, NJ 07305, USA
- Sarah C R Elgin
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Christopher E Ellison
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
9
Mugge RL, Moseley RD, Hamdan LJ. Substrate Specificity of Biofilms Proximate to Historic Shipwrecks. Microorganisms 2023; 11:2416. [PMID: 37894074 PMCID: PMC10608953 DOI: 10.3390/microorganisms11102416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 09/13/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023] Open
Abstract
The number of built structures on the seabed, such as shipwrecks, energy platforms, and pipelines, is increasing in coastal and offshore regions. These structures, typically composed of steel or wood, are substrates for microbial attachment and biofilm formation. The success of biofilm growth depends on substrate characteristics and local environmental conditions, though it is unclear which feature is dominant in shaping biofilm microbiomes. The goal of this study was to understand the substrate- and site-specific impacts of built structures on short-term biofilm composition and functional potential. Seafloor experiments were conducted wherein steel and wood surfaces were deployed for four months at distances extending up to 115 m away from three historic (>50 years old) shipwrecks in the Gulf of Mexico. DNA from biofilms on the steel and wood was extracted, and metagenomes were sequenced on an Illumina NextSeq. A bioinformatics analysis revealed that the taxonomic composition was significantly different between substrates and sites, with substrate being the primary determining factor. Regardless of site, the steel biofilms had a higher abundance of genes related to biofilm formation, and sulfur, iron, and nitrogen cycling, while the wood biofilms showed a higher abundance of manganese cycling and methanol oxidation genes. This study demonstrates how substrate composition shapes biofilm microbiomes and suggests that marine biofilms may contribute to nutrient cycling at depth. Analyzing the marine biofilm microbiome provides insight into the ecological impact of anthropogenic structures on the seabed.
Affiliation(s)
- Rachel L. Mugge
- U.S. Naval Research Laboratory, Ocean Sciences Division, Stennis Space Center, MS 39529, USA
- Rachel D. Moseley
- School of Ocean Science and Engineering, University of Southern Mississippi, Ocean Springs, MS 39564, USA
- Leila J. Hamdan
- School of Ocean Science and Engineering, University of Southern Mississippi, Ocean Springs, MS 39564, USA
10
Diesel J, Molano G, Montecinos GJ, DeWeese K, Calhoun S, Kuo A, Lipzen A, Salamov A, Grigoriev IV, Reed DC, Miller RJ, Nuzhdin SV, Alberto F. A scaffolded and annotated reference genome of giant kelp (Macrocystis pyrifera). BMC Genomics 2023; 24:543. [PMID: 37704968 PMCID: PMC10498591 DOI: 10.1186/s12864-023-09658-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/09/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023] Open
Abstract
Macrocystis pyrifera (giant kelp) is a brown macroalga of great ecological importance as a primary producer and structure-forming foundational species that provides habitat for hundreds of species. It has many commercial uses (e.g. source of alginate, fertilizer, cosmetics, feedstock). One of the limitations to exploiting giant kelp's economic potential and assisting in giant kelp conservation efforts is a lack of genomic tools like a high-quality, contiguous reference genome with accurate gene annotations. Reference genomes attempt to capture the complete genomic sequence of an individual or species, and importantly provide a universal structure for comparison across a multitude of genetic experiments, both within and between species. We assembled the giant kelp genome of a haploid female gametophyte de novo using PacBio reads, then ordered contigs into chromosome-level scaffolds using Hi-C. We found the giant kelp genome to be 537 Mb, with a total of 35 scaffolds and 188 contigs. The assembly N50 is 13,669,674 bp with a GC content of 50.37%. We assessed the genome completeness using BUSCO, and found giant kelp contained 94% of the BUSCO genes from the stramenopile clade. Annotation of the giant kelp genome revealed 25,919 genes. Additionally, we present genetic variation data based on 48 diploid giant kelp sporophytes from three different Southern California populations that confirm the population structure found in other studies of these populations. This work resulted in a high-quality giant kelp genome that greatly increases the genetic knowledge of this ecologically and economically vital species.
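The abstract above reports an assembly N50 and a GC content. As a minimal sketch of how these two summary statistics are commonly defined (not the authors' pipeline; the function names `n50` and `gc_content` and the toy inputs are illustrative assumptions), one could compute them as follows:

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L of the sequence at which, walking from longest to
    shortest, the running total first reaches half of the assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


def gc_content(seq):
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    # Hypothetical toy values, not data from the giant kelp assembly.
    print(n50([2, 3, 4, 5, 6]))
    print(round(gc_content("ATGCGC"), 2))
```

In practice the lengths would be read from the assembly FASTA; the toy lists here only illustrate the definitions.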
Affiliation(s)
- Jose Diesel
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Gary Molano
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Gabriel J Montecinos
- Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
| | - Kelly DeWeese
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Sara Calhoun
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Alan Kuo
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Anna Lipzen
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Asaf Salamov
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Daniel C Reed
- Marine Science Institute, University of California at Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Robert J Miller
- Marine Science Institute, University of California at Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Sergey V Nuzhdin
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Filipe Alberto
- Department of Biological Sciences, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.
| |
|
11
|
Rose R, Nolan DJ, Ashcraft D, Feehan AK, Velez-Climent L, Huston C, Lain B, Rosenthal S, Miele L, Fogel GB, Pankey G, Garcia-Diaz J, Lamers SL. Comparing antimicrobial resistant genes and phenotypes across multiple sequencing platforms and assays for Enterobacterales clinical isolates. BMC Microbiol 2023; 23:225. [PMID: 37596530 PMCID: PMC10436404 DOI: 10.1186/s12866-023-02975-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023] Open
Abstract
INTRODUCTION Whole genome sequencing (WGS) of bacterial isolates can be used to identify antimicrobial resistance (AMR) genes. Previous studies have shown that genotype-based AMR has variable accuracy for predicting carbapenem resistance in carbapenem-resistant Enterobacterales (CRE); however, the majority of these studies used short-read platforms (e.g. Illumina) to generate sequence data. In this study, our objective was to determine whether Oxford Nanopore Technologies (ONT) long-read WGS would improve detection of carbapenem AMR genes with respect to short-read only WGS for nine clinical CRE samples. We measured the minimum inhibitory concentration (MIC) using two phenotype assays (MicroScan and ETEST) for six antibiotics, including two carbapenems (meropenem and ertapenem) and four non-carbapenems (gentamicin, ciprofloxacin, cefepime, and trimethoprim/sulfamethoxazole). We generated short-read data using the Illumina NextSeq and long-read data using the ONT MinION. Four assembly methods were compared: ONT-only assembly; ONT-only assembly plus short-read polish; ONT + short-read hybrid assembly plus short-read polish; short-read only assembly. RESULTS Consistent with previous studies, our results suggest that the hybrid assembly produced the highest quality results as measured by gene completeness and contig circularization. However, ONT-only methods had minimal impact on the detection of AMR genes and plasmids compared to short-read methods, although, notably, gene copy number differed between methods. All four assembly methods showed identical presence/absence of the blaKPC-2 carbapenemase gene for all samples. The two phenotype assays showed 100% concordant results for the non-carbapenems, but only 65% concordance for the two carbapenems. The presence/absence of AMR genes was 100% concordant with AMR phenotypes for all four non-carbapenem drugs, but sensitivity was only 22%-50% for the carbapenems.
CONCLUSIONS Overall, these findings suggest that the lack of complete correspondence between CRE AMR genotype and phenotype for carbapenems, while concerning, is independent of sequencing platform/assembly method.
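The concordance and sensitivity figures quoted in this abstract follow standard definitions. The sketch below shows one way to compute them; it is an illustration only, and the function names and the example calls are hypothetical, not the study's data or code.

```python
def concordance(calls_a, calls_b):
    """Fraction of samples on which two assays give the same call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)


def sensitivity(gene_present, phenotype_resistant):
    """Fraction of phenotypically resistant samples that carry the AMR gene.

    sensitivity = true positives / (true positives + false negatives)
    """
    if len(gene_present) != len(phenotype_resistant):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(gene_present, phenotype_resistant))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_neg = sum(1 for g, p in pairs if not g and p)
    if true_pos + false_neg == 0:
        raise ValueError("no resistant samples; sensitivity is undefined")
    return true_pos / (true_pos + false_neg)


if __name__ == "__main__":
    # Hypothetical calls for four isolates (R = resistant, S = susceptible).
    assay_1 = ["R", "S", "R", "R"]
    assay_2 = ["R", "R", "R", "S"]
    print(concordance(assay_1, assay_2))

    gene = [True, False, False, True]
    resistant = [True, True, True, False]
    print(sensitivity(gene, resistant))
```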
Affiliation(s)
- Rebecca Rose
- BioInfoExperts LLC, 718 Bayou Lane, Thibodaux, LA, 70301, USA.
- FoxSeq, LLC, Thibodaux, LA, USA.
| | - David J Nolan
- BioInfoExperts LLC, 718 Bayou Lane, Thibodaux, LA, 70301, USA
| | - Deborah Ashcraft
- Infectious Disease Translational Research, Ochsner Clinic Foundation, New Orleans, LA, USA
| | - Amy K Feehan
- Infectious Disease Clinical Research, Ochsner Clinic Foundation, New Orleans, LA, USA
| | | | | | - Benjamin Lain
- BioInfoExperts LLC, 718 Bayou Lane, Thibodaux, LA, 70301, USA
| | - Simon Rosenthal
- BioInfoExperts LLC, 718 Bayou Lane, Thibodaux, LA, 70301, USA
| | - Lucio Miele
- Translational Science and Genetics at Louisiana State University Health Science Center, New Orleans, LA, USA
| | | | - George Pankey
- Infectious Disease Translational Research, Ochsner Clinic Foundation, New Orleans, LA, USA
| | - Julia Garcia-Diaz
- Infectious Disease Clinical Research, Ochsner Clinic Foundation, New Orleans, LA, USA
| | - Susanna L Lamers
- BioInfoExperts LLC, 718 Bayou Lane, Thibodaux, LA, 70301, USA
- FoxSeq, LLC, Thibodaux, LA, USA
| |
|
12
|
Gmiter D, Pacak I, Nawrot S, Czerwonka G, Kaca W. Genomes comparison of two Proteus mirabilis clones showing varied swarming ability. Mol Biol Rep 2023; 50:5817-5826. [PMID: 37219671 PMCID: PMC10290045 DOI: 10.1007/s11033-023-08518-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 05/10/2023] [Indexed: 05/24/2023]
Abstract
BACKGROUND Proteus mirabilis is a Gram-negative bacterium most noted for its involvement in catheter-associated urinary tract infections. It is also known for its multicellular migration over solid surfaces, referred to as 'swarming motility'. Here we analyzed the genomic sequences of two P. mirabilis isolates, designated K38 and K39, which exhibit varied swarming ability. METHODS AND RESULTS The isolates' genomes were sequenced using an Illumina NextSeq sequencer, yielding genomes of about 3.94 Mbp with a GC content of 38.6%. The genomes were subjected to in silico comparative investigation. We found that, despite a difference in swarming motility, the isolates showed high genomic relatedness (up to 100% ANI similarity), suggesting that one of the isolates probably originated from the other. CONCLUSIONS The genomic sequences will allow us to investigate the mechanism driving this intriguing phenotypic heterogeneity between closely related P. mirabilis isolates. Phenotypic heterogeneity is an adaptive strategy of bacterial cells to several environmental pressures. It is also an important factor related to their pathogenesis. Therefore, the availability of these genomic sequences will facilitate studies that focus on the host-pathogen interactions during catheter-associated urinary tract infections.
Affiliation(s)
- Dawid Gmiter
- Department of Microbiology, Institute of Biology, Faculty of Natural Sciences, Jan Kochanowski University in Kielce, Kielce, Poland.
| | - Ilona Pacak
- Department of Microbiology, Institute of Biology, Faculty of Natural Sciences, Jan Kochanowski University in Kielce, Kielce, Poland
| | - Sylwia Nawrot
- Department of Microbiology, Institute of Biology, Faculty of Natural Sciences, Jan Kochanowski University in Kielce, Kielce, Poland
| | - Grzegorz Czerwonka
- Department of Microbiology, Institute of Biology, Faculty of Natural Sciences, Jan Kochanowski University in Kielce, Kielce, Poland
| | - Wieslaw Kaca
- Department of Microbiology, Institute of Biology, Faculty of Natural Sciences, Jan Kochanowski University in Kielce, Kielce, Poland
| |
|
13
|
Leung W, Torosin N, Cao W, Reed LK, Arrigo C, Elgin SCR, Ellison CE. Long-read genome assemblies for the study of chromosome expansion: Drosophila kikkawai, Drosophila takahashii, Drosophila bipectinata, and Drosophila ananassae. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.22.541758. [PMID: 37292993 PMCID: PMC10245892 DOI: 10.1101/2023.05.22.541758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Flow cytometry estimates of genome sizes among species of Drosophila show a 3-fold variation, ranging from ∼127 Mb in Drosophila mercatorum to ∼400 Mb in Drosophila cyrtoloma. However, the assembled portion of the Muller F Element (orthologous to the fourth chromosome in Drosophila melanogaster) shows a nearly 14-fold variation in size, ranging from ∼1.3 Mb to >18 Mb. Here, we present chromosome-level long-read genome assemblies for four Drosophila species with expanded F Elements ranging in size from 2.3 Mb to 20.5 Mb. Each Muller Element is present as a single scaffold in each assembly. These assemblies will enable new insights into the evolutionary causes and consequences of chromosome size expansion.
Affiliation(s)
- Wilson Leung
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Nicole Torosin
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
| | - Weihuan Cao
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
| | - Laura K Reed
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, 35487, USA
| | - Cindy Arrigo
- Department of Biology, New Jersey City University, Jersey City, NJ 07305, USA
| | - Sarah C R Elgin
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Christopher E Ellison
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
| |
|
14
|
De La Cerda GY, Landis JB, Eifler E, Hernandez AI, Li F, Zhang J, Tribble CM, Karimi N, Chan P, Givnish T, Strickler SR, Specht CD. Balancing read length and sequencing depth: Optimizing Nanopore long-read sequencing for monocots with an emphasis on the Liliales. APPLICATIONS IN PLANT SCIENCES 2023; 11:e11524. [PMID: 37342170 PMCID: PMC10278932 DOI: 10.1002/aps3.11524] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/30/2022] [Revised: 01/20/2023] [Accepted: 01/30/2023] [Indexed: 06/22/2023]
Abstract
PREMISE We present approaches used to generate long-read Nanopore sequencing reads for the Liliales and demonstrate how modifications to standard protocols directly impact read length and total output. The goal is to help those interested in generating long-read sequencing data determine which steps may be necessary for optimizing output and results. METHODS Four species of Calochortus (Liliaceae) were sequenced. Modifications made to sodium dodecyl sulfate (SDS) extractions and cleanup protocols included grinding with a mortar and pestle, using cut or wide-bore tips, chloroform cleaning, bead cleaning, eliminating short fragments, and using highly purified DNA. RESULTS Steps taken to maximize read length can decrease overall output. Notably, the number of pores in a flow cell is correlated with the overall output, yet we did not see an association between the pore number and the read length or the number of reads produced. DISCUSSION Many factors contribute to the overall success of a Nanopore sequencing run. We showed the direct impact that several modifications to the DNA extraction and cleaning steps have on the total sequencing output, read size, and number of reads generated. We show a tradeoff between read length and the number of reads and, to a lesser extent, the total sequencing output, all of which are important factors for successful de novo genome assembly.
Affiliation(s)
- Gisel Y. De La Cerda
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
| | - Jacob B. Landis
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Evan Eifler
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Adriana I. Hernandez
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
| | - Fay‐Wei Li
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Jing Zhang
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Carrie M. Tribble
- School of Life Sciences, University of Hawaiʻi, Mānoa, Honolulu, Hawaiʻi 96822, USA
| | - Nisa Karimi
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Patricia Chan
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Thomas Givnish
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Susan R. Strickler
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
- Present address: Plant Science and Conservation, Chicago Botanic Garden, Glencoe, Illinois 60022, USA
- Present address: Plant Biology and Conservation Program, Northwestern University, Evanston, Illinois 60208, USA
| | - Chelsea D. Specht
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
| |
|
15
|
Ramesh B, Small CM, Healey H, Johnson B, Barker E, Currey M, Bassham S, Myers M, Cresko WA, Jones AG. Improvements to the Gulf pipefish Syngnathus scovelli genome. GIGABYTE 2023; 2023:gigabyte76. [PMID: 36969711 PMCID: PMC10038202 DOI: 10.46471/gigabyte.76] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/04/2022] [Accepted: 02/03/2023] [Indexed: 02/22/2023] Open
Abstract
The Gulf pipefish Syngnathus scovelli has emerged as an important species for studying sexual selection, development, and physiology. Comparative evolutionary genomics research involving fishes from Syngnathidae depends on having a high-quality genome assembly and annotation. However, the first S. scovelli genome, assembled using short-read sequences and a smaller RNA-sequence dataset, has limited contiguity and a relatively poor annotation. Here, using PacBio long-read high-fidelity sequences and a proximity ligation library, we generate an improved assembly to obtain 22 chromosome-level scaffolds. Compared to the first assembly, the gaps in the improved assembly are smaller, the N75 is larger, and our genome is ~95% BUSCO complete. Using a large body of RNA-Seq reads from different tissue types and NCBI's Eukaryotic Annotation Pipeline, we discovered 28,162 genes, of which 8,061 are non-coding genes. Our new genome assembly and annotation are tagged as a RefSeq genome by NCBI and provide enhanced resources for research work involving S. scovelli.
Affiliation(s)
- Balan Ramesh
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| | - Clay M. Small
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Presidential Initiative in Data Science, University of Oregon, Eugene, OR 97403, USA
| | - Hope Healey
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
| | - Bernadette Johnson
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| | - Elyse Barker
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| | - Mark Currey
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
| | - Susan Bassham
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
| | - Megean Myers
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| | - William A. Cresko
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR 97403, USA
- Presidential Initiative in Data Science, University of Oregon, Eugene, OR 97403, USA
| | - Adam Gregory Jones
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844, USA
| |
|
16
|
Insight into the Organization of the B10v3 Cucumber Genome by Integration of Biological and Bioinformatic Data. Int J Mol Sci 2023; 24:ijms24044011. [PMID: 36835427 PMCID: PMC9961470 DOI: 10.3390/ijms24044011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 02/19/2023] Open
Abstract
The availability of a well-organized and annotated reference genome is essential for genome research and for the analysis of re-sequencing data. The B10v3 cucumber (Cucumis sativus L.) reference genome has been sequenced and assembled into 8035 contigs, a small fraction of which have been mapped to individual chromosomes. Currently, bioinformatics methods based on comparative homology have made it possible to re-order the sequenced contigs by mapping them to the reference genomes. The B10v3 genome (North-European, Borszczagowski line) was rearranged against the genomes of cucumber 9930 ('Chinese Long' line) and Gy14 (North American line). Furthermore, a better insight into the organization of the B10v3 genome was obtained by integrating the data available in the literature on the assignment of contigs to chromosomes in the B10v3 genome with the results of the bioinformatic analysis. The combination of information on the markers used in the assembly of the B10v3 genome and the results of FISH and DArT-seq experiments confirmed the reliability of the in silico assignment. Approximately 98% of the protein-coding genes within the chromosomes were assigned and a significant proportion of the repetitive fragments in the sequenced B10v3 genome were identified using the RagTag programme. In addition, BLAST analyses provided comparative information between the B10v3 genome and the 9930 and Gy14 data sets. This revealed both similarities and differences in the functional proteins encoded by the coding sequence regions of the genomes. This study contributes to better knowledge and understanding of cucumber genome line B10v3.
|
17
|
Alonge M, Lebeigle L, Kirsche M, Jenike K, Ou S, Aganezov S, Wang X, Lippman ZB, Schatz MC, Soyk S. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol 2022; 23:258. [PMID: 36522651 PMCID: PMC9753292 DOI: 10.1186/s13059-022-02823-7] [Citation(s) in RCA: 284] [Impact Index Per Article: 94.7] [Received: 12/08/2021] [Accepted: 11/28/2022] [Indexed: 12/23/2022] Open
Abstract
Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.
Affiliation(s)
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Ludivine Lebeigle
- Center for Integrative Genomics, University of Lausanne, CH-1015, Lausanne, Switzerland
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Katie Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Shujun Ou
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Xingang Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.
- Department of Biology, Johns Hopkins University, Baltimore, MD, 21218, USA.
| | - Sebastian Soyk
- Center for Integrative Genomics, University of Lausanne, CH-1015, Lausanne, Switzerland.
| |
|
18
|
Kukkar D, Sharma PK, Kim KH. Recent advances in metagenomic analysis of different ecological niches for enhanced biodegradation of recalcitrant lignocellulosic biomass. ENVIRONMENTAL RESEARCH 2022; 215:114369. [PMID: 36165858 DOI: 10.1016/j.envres.2022.114369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Lignocellulose wastes stemming from agricultural residues can offer an excellent opportunity as alternative energy solutions in addition to fossil fuels. Besides, the unrestrained burning of agricultural residues can lead to the destruction of the soil microflora and associated soil sterilization. However, the difficulties associated with the biodegradation of lignocellulose biomasses remain a formidable challenge for their sustainable management. In this respect, metagenomics can be an effective option to resolve this dilemma because it combines next-generation sequencing technology and bioinformatics tools to harness novel microbial consortia from diverse environments (e.g., soil, alpine forests, and hypersaline/acidic/hot sulfur springs). In light of the challenges associated with the bulk-scale biodegradation of lignocellulose-rich agricultural residues, this review is organized to help delineate the fundamental aspects of metagenomics towards the assessment of the microbial consortia and novel molecules (such as biocatalysts) which are otherwise unidentifiable by conventional laboratory culturing techniques. The discussion is extended further to highlight the recent advancements (e.g., from 2011 to 2022) in metagenomic approaches for the isolation and purification of lignocellulolytic microbes from different ecosystems along with the technical challenges and prospects associated with their wide implementation and scale-up. This review should thus be one of the first comprehensive reports on the metagenomics-based analysis of different environmental samples for the isolation and purification of lignocellulose-degrading enzymes.
Affiliation(s)
- Deepak Kukkar
- Department of Biotechnology, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India; University Centre for Research and Development, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India.
| | | | - Ki-Hyun Kim
- Department of Civil and Environmental Engineering, Hanyang University, Seongdong-gu, Wangsimni-ro, Seoul - 04763, South Korea.
| |
|
19
|
Harris RA, Raveendran M, Lyfoung DT, Sedlazeck FJ, Mahmoud M, Prall TM, Karl JA, Doddapaneni H, Meng Q, Han Y, Muzny D, Wiseman RW, O'Connor DH, Rogers J. Construction of a new chromosome-scale, long-read reference genome assembly for the Syrian hamster, Mesocricetus auratus. Gigascience 2022; 11:giac039. [PMID: 35640223 PMCID: PMC9155146 DOI: 10.1093/gigascience/giac039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/05/2021] [Revised: 11/03/2021] [Accepted: 03/29/2022] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was generated in 2013 using whole-genome shotgun sequencing with short-read sequence data. Current, more advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and greater continuity. FINDINGS Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gb, similar to the 2.50-Gb length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes are annotated in BCM_Maur_2.0 compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also reduces the unresolved regions as measured by nucleotide ambiguities: ∼17.11% of bases in MesAur1.0 were unresolved, compared to 3.00% in BCM_Maur_2.0. CONCLUSIONS Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.
Affiliation(s)
- R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dustin T Lyfoung
- Wisconsin National Primate Research Center, University of Wisconsin, 1220 Capitol Court, Madison, WI 53711, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Trent M Prall
- Department of Pathology and Laboratory Medicine, University of Wisconsin, 3170 UW Medical Foundation Centennial Building (MFCB), 1685 Highland Avenue, Madison, WI 53711, USA
| | - Julie A Karl
- Department of Pathology and Laboratory Medicine, University of Wisconsin, 3170 UW Medical Foundation Centennial Building (MFCB), 1685 Highland Avenue, Madison, WI 53711, USA
| | - Harshavardhan Doddapaneni
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Qingchang Meng
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yi Han
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Donna Muzny
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Roger W Wiseman
- Wisconsin National Primate Research Center, University of Wisconsin, 1220 Capitol Court, Madison, WI 53711, USA
- Department of Pathology and Laboratory Medicine, University of Wisconsin, 3170 UW Medical Foundation Centennial Building (MFCB), 1685 Highland Avenue, Madison, WI 53711, USA
| | - David H O'Connor
- Wisconsin National Primate Research Center, University of Wisconsin, 1220 Capitol Court, Madison, WI 53711, USA
- Department of Pathology and Laboratory Medicine, University of Wisconsin, 3170 UW Medical Foundation Centennial Building (MFCB), 1685 Highland Avenue, Madison, WI 53711, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
20
|
Wittmeyer KT, Oppenheim SJ, Hopper KR. Assemblies of the genomes of parasitic wasps using meta-assembly and scaffolding with genetic linkage. G3 (BETHESDA, MD.) 2021; 12:6423991. [PMID: 34751385 PMCID: PMC8727961 DOI: 10.1093/g3journal/jkab386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/12/2021] [Accepted: 10/25/2021] [Indexed: 01/09/2023]
Abstract
Safe, effective biological-control introductions against invasive pests depend on narrowly host-specific natural enemies with the ability to adapt to a changing environment. As part of a project on the genetic architectures of these traits, we assembled and annotated the genomes of two aphid parasitoids, Aphelinus atriplicis and Aphelinus certus. We report here several assemblies of A. atriplicis made with Illumina and PacBio data, which we combined into a meta-assembly. We scaffolded the meta-assembly with markers from a genetic map of hybrids between A. atriplicis and A. certus. We used this genetic-linkage scaffolded (GLS) assembly of A. atriplicis to scaffold a de novo assembly of A. certus. The de novo assemblies of A. atriplicis differed in contiguity, and the meta-assembly of these assemblies was more contiguous than the best de novo assembly. Scaffolding with genetic-linkage data allowed chromosomal-level assembly of the A. atriplicis genome, and scaffolding a de novo assembly of A. certus with this GLS assembly greatly increased the contiguity of the A. certus assembly to the point where it was also at the chromosomal level. However, completeness of the A. atriplicis assembly, as measured by percent complete, single-copy BUSCO hymenopteran genes, varied little among de novo assemblies and was not increased by meta-assembly or genetic scaffolding. Furthermore, the greater contiguity of the meta-assembly and GLS assembly had little or no effect on the numbers of genes identified or on the proportions with homologs or functional annotations. Increased contiguity of the A. certus assembly provided modest improvement in assembly completeness, as measured by percent complete, single-copy BUSCO hymenopteran genes. The total genic sequence increased, and while the number of genes declined, gene length increased, which together suggest greater accuracy of gene models. More contiguous assemblies provide uses other than gene annotation, for example, identifying the genes associated with quantitative trait loci and understanding chromosomal rearrangements associated with speciation.
Affiliation(s)
- Kameron T Wittmeyer
- USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713, USA
| | | | - Keith R Hopper
- USDA-ARS, Beneficial Insect Introductions Research Unit, Newark, DE 19713, USA. Corresponding author: USDA-ARS, Beneficial Insect Introductions Research Unit, 501 South Chapel Street, Newark, DE 19713, USA.
| |
|
21
|
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022] Open
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with advances in sequencing technology and bioinformatic tools. In most projects, bulk RNA-Seq data is used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information, including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent bioinformatic algorithms have made it possible to retrieve this information from bulk RNA-Seq data, thus broadening its scope. This review surveys the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
| | - Isha Monga
- Columbia University, New York City, NY, USA
| | | | - Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | - Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
| | | | - Marie Ranson
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
| | - Bruce Ashford
- University of Wollongong, Wollongong, Australia; Illawarra Health and Medical Research Institute, Wollongong, Australia
| |
|
22
|
Schrinner S, Goel M, Wulfert M, Spohr P, Schneeberger K, Klau GW. Using the longest run subsequence problem within homology-based scaffolding. Algorithms Mol Biol 2021; 16:11. [PMID: 34183036 PMCID: PMC8240273 DOI: 10.1186/s13015-021-00191-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Accepted: 06/05/2021] [Indexed: 12/02/2022] Open
Abstract
Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing an issue that arises in homology-based scaffolding, that is, when linking and ordering contigs to obtain larger pseudo-chromosomes by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. All data used in the experiments as well as our source code are freely available. We demonstrate its usefulness within an existing larger scaffolding approach by solving realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time.
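The abstract above does not spell out the LRS definition. As a rough illustration, the sketch below assumes the usual formulation: given a string, find a longest subsequence in which every symbol occurs in at most one run (one block of consecutive equal symbols). In the scaffolding setting, each symbol would stand for the contig that a bin aligns to best; that mapping, and the function names `longest_run_subsequence` and `has_single_runs`, are illustrative assumptions. This is a plain exponential-time search for tiny inputs, not the reduction rules or the algorithms of the paper.

```python
from functools import lru_cache


def has_single_runs(t: str) -> bool:
    """Return True if every symbol of t occurs in exactly one contiguous run."""
    seen = set()
    prev = None
    for ch in t:
        if ch != prev:
            if ch in seen:
                return False  # symbol reappears after its run has ended
            seen.add(ch)
            prev = ch
    return True


def longest_run_subsequence(s: str) -> str:
    """Exact LRS for small inputs (assumed definition, see lead-in).

    Runtime is exponential in the alphabet size, so this is only a toy.
    """
    symbols = sorted(set(s))
    index = {c: i for i, c in enumerate(symbols)}
    n = len(s)

    @lru_cache(maxsize=None)
    def best(pos: int, current: int, closed: int) -> str:
        # pos: next position to decide on
        # current: index of the symbol whose run is open (-1 if none)
        # closed: bitmask of symbols whose run has already ended
        if pos == n:
            return ""
        c = index[s[pos]]
        skip = best(pos + 1, current, closed)
        take = None
        if c == current:
            # extend the open run
            take = s[pos] + best(pos + 1, current, closed)
        elif not (closed >> c) & 1:
            # start a new run and close the previous one
            new_closed = (closed | (1 << current)) if current >= 0 else closed
            take = s[pos] + best(pos + 1, c, new_closed)
        if take is not None and len(take) > len(skip):
            return take
        return skip

    return best(0, -1, 0)


if __name__ == "__main__":
    result = longest_run_subsequence("abcabc")
    print(len(result), has_single_runs(result))
```

Because the problem is NP-hard, any exact method scales poorly in the worst case; the bitmask over closed symbols makes that cost explicit here.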
|
23
|
Larson PA, Bartlett ML, Garcia K, Chitty J, Balkema-Buschmann A, Towner J, Kugelman J, Palacios G, Sanchez-Lockhart M. Genomic features of humoral immunity support tolerance model in Egyptian rousette bats. Cell Rep 2021; 35:109140. [PMID: 34010652 DOI: 10.1016/j.celrep.2021.109140] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/30/2020] [Revised: 10/08/2020] [Accepted: 04/26/2021] [Indexed: 01/05/2023] Open
Abstract
Bats asymptomatically harbor many viruses that can cause severe human diseases. The Egyptian rousette bat (ERB) is the only known reservoir for Marburgviruses and Sosuga virus, making it an exceptional animal model to study antiviral mechanisms in an asymptomatic host. With this goal in mind, we constructed and annotated the immunoglobulin heavy chain locus, finding an expansion of immunoglobulin variable genes associated with protective human antibodies to different viruses. We also annotated two functional and distinct immunoglobulin epsilon genes and four distinctive functional immunoglobulin gamma genes. We described the Fc receptor repertoire in ERBs, including features that may affect activation potential, and discovered the lack of evolutionarily conserved short pentraxins. These findings reinforce the hypothesis that a differential threshold of regulation and/or absence of key immune mediators may promote tolerance and decrease inflammation in ERBs.
Affiliation(s)
- Peter A Larson
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Maggie L Bartlett
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA; Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Karla Garcia
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA; Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Joseph Chitty
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | | | - Jonathan Towner
- Viral Special Pathogens Branch, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
| | - Jeffrey Kugelman
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA
| | - Gustavo Palacios
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA.
| | - Mariano Sanchez-Lockhart
- Center for Genome Sciences, US Army Medical Research Institute of Infectious Diseases, Frederick, MD 21702, USA; Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA.
| |
|
24
|
Xie L, Wong L. PDR: a new genome assembly evaluation metric based on genetics concerns. Bioinformatics 2021; 37:289-295. [PMID: 32761066 DOI: 10.1093/bioinformatics/btaa704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2019] [Revised: 06/30/2020] [Accepted: 07/30/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Existing genome assembly evaluation metrics provide only limited insight on specific aspects of genome assembly quality, and sometimes even disagree with each other. For better integrative comparison between assemblies, we propose, here, a new genome assembly evaluation metric, Pairwise Distance Reconstruction (PDR). It derives from a common concern in genetic studies, and takes completeness, contiguity, and correctness into consideration. We also propose an approximation implementation to accelerate PDR computation. RESULTS Our results on publicly available datasets affirm PDR's ability to integratively assess the quality of a genome assembly. In fact, this is guaranteed by its definition. The results also indicated the error introduced by approximation is extremely small and thus negligible. AVAILABILITY AND IMPLEMENTATION https://github.com/XLuyu/PDR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luyu Xie
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
| | - Limsoon Wong
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
| |
|
25
|
Marla SS, Mishra P, Maurya R, Singh M, Wankhede DP, Kumar A, Yadav MC, Subbarao N, Singh SK, Kumar R. Refinement of Draft Genome Assemblies of Pigeonpea (Cajanus cajan). Front Genet 2020; 11:607432. [PMID: 33384719 PMCID: PMC7770131 DOI: 10.3389/fgene.2020.607432] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/17/2020] [Accepted: 11/23/2020] [Indexed: 11/13/2022] Open
Abstract
Genome assembly of short reads from large plant genomes remains a challenge in computational biology despite major developments in next generation sequencing. Of late, several draft assemblies have been reported in sequenced plant genomes. The reported draft genome assemblies of Cajanus cajan have different levels of genome completeness, a large number of repeats, gaps, and segmental duplications. Draft assemblies with portions of genome missing are shorter than the referenced original genome. These assemblies come with low map accuracy, affecting further functional annotation and the prediction of gene components as desired by crop researchers. Genome coverage, i.e., the number of sequenced raw reads mapped onto a certain location of the genome, is an important quality indicator of completeness and assembly quality in draft assemblies. The present work aimed to improve the coverage in reported de novo sequenced draft genomes (GCA_000340665.1 and GCA_000230855.2) of pigeonpea, a legume widely cultivated in India. The two recently sequenced assemblies, A1 and A2, comprised 72% and 75% of the estimated coverage of the genome, respectively. We employed an assembly reconciliation approach to compare the draft assemblies and merge them, filling the gaps by employing an algorithm size sorting mate-pair library to generate a high quality and near complete assembly with enhanced contiguity. The majority of gaps present within scaffolds were filled with right-sized mate-pair reads. The improved assembly had fewer gaps than the reported draft assemblies, resulting in an improved genome coverage of 82.4%. Map accuracy of the improved assembly was evaluated using various quality metrics and for the presence of specific trait-related functional genes. The paired-end and mate-pair local libraries employed helped us to reduce gaps, repeats, and other sequence errors, resulting in lengthier scaffolds compared to the two draft assemblies.
We reported the prediction of putative host resistance genes against Fusarium wilt disease by their performance and evaluated them both in wet laboratory and field phenotypic conditions.
Affiliation(s)
- Soma S. Marla
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Pallavi Mishra
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Ranjeet Maurya
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Mohar Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | | | - Anil Kumar
- Directorate of Education, Rani Lakshmi Bai Central Agricultural University, Jhansi, India
| | - Mahesh C. Yadav
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - N. Subbarao
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Sanjeev K. Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Rajesh Kumar
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| |
|
26
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
| | - J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
| | - Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
| | - Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
| |
|
27
|
Naranpanawa DNU, Chandrasekara CHWMRB, Bandaranayake PCG, Bandaranayake AU. Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists. Sci Rep 2020; 10:18236. [PMID: 33106560 PMCID: PMC7588437 DOI: 10.1038/s41598-020-75270-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/08/2019] [Accepted: 09/21/2020] [Indexed: 02/07/2023] Open
Abstract
Recent advances in next-generation sequencing technologies have made it possible to generate a considerable amount of sequencing data at a relatively low cost. This has revolutionized genomics and transcriptomics studies. However, handling such data with available bioinformatics platforms now poses challenges, both in assembly and in the downstream analysis performed to infer correct biological meaning. Though there are a handful of commercial software tools for some of the procedures, their cost is prohibitive for most research laboratories. While individual open-source or free software tools are available for most of the bioinformatics applications, those components usually operate standalone and are not combined into a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures starting from raw sequence data too complicated and time-consuming, given the associated learning curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeats (SSR) primer design solely based on tools that are available online for free use. For validation of the developed workflow, we used Illumina HiSeq reads of different tissue samples of Santalum album (sandalwood), generated from a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
Affiliation(s)
- D N U Naranpanawa
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- Postgraduate Institute of Science, University of Peradeniya, Peradeniya, 20400, Sri Lanka
| | - C H W M R B Chandrasekara
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
| | - P C G Bandaranayake
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
| | - A U Bandaranayake
- Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Peradeniya, 20400, Sri Lanka.
| |
|
28
|
Yu T, Mu Z, Fang Z, Liu X, Gao X, Liu J. TransBorrow: genome-guided transcriptome assembly by borrowing assemblies from different assemblers. Genome Res 2020; 30:1181-1190. [PMID: 32817072 PMCID: PMC7462071 DOI: 10.1101/gr.257766.119] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/30/2019] [Accepted: 06/18/2020] [Indexed: 12/12/2022]
Abstract
RNA-seq technology is widely used in various transcriptomic studies and provides great opportunities to reveal the complex structures of transcriptomes. To effectively analyze RNA-seq data, we introduce a novel transcriptome assembler, TransBorrow, which borrows the assemblies from different assemblers to search for reliable subsequences by building a colored graph from those borrowed assemblies. Then, by seeding reliable subsequences, a newly designed path extension strategy accurately searches for a transcript-representing path cover over each splicing graph. TransBorrow was tested on both simulated and real data sets and showed great superiority over all the compared leading assemblers.
Affiliation(s)
- Ting Yu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| | - Zengchao Mu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| | - Zhaoyuan Fang
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
| | - Xiaoping Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Juntao Liu
- School of Mathematics and Statistics, Shandong University (Weihai), Weihai 264209, China
| |
|
29
|
Jung H, Jeon MS, Hodgett M, Waterhouse P, Eyun SI. Comparative Evaluation of Genome Assemblers from Long-Read Sequencing for Plants and Crops. J Agric Food Chem 2020; 68:7670-7677. [PMID: 32530283 DOI: 10.1021/acs.jafc.0c01647] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 05/23/2023]
Abstract
The availability of recent state-of-the-art long-read sequencing technologies has significantly increased the ease and speed of producing high-quality plant genome assemblies. A wide variety of genome-related software tools are now available and they are typically benchmarked using microbial or model eukaryotic genomes such as Arabidopsis and rice. However, many plant species have much larger and more complex genomes than these, and the choice of tools, parameters, and/or strategies that can be used is not always obvious. Thus, we have compared the metrics of assemblies generated by various pipelines to discuss how assembly quality can be affected by two different assembly strategies. First, we focused on optimizing read preprocessing and assembler variables using eight different de novo assemblers on five different Pacific Biosciences long-read datasets of diploid and tetraploid species. Then, we examined a single scaffolding tool (quickmerge) that has been employed for the postprocessing step. We then merged the outputs from multiple assemblies to produce a higher quality consensus assembly. Then, we benchmarked the assemblies for completeness and accuracy (assembly metrics and BUSCO), computer memory, and CPU times. Two lightweight assemblers, Miniasm/Minimap/Racon and WTDBG, were deemed good for novice users because they involved smaller required learning curves and light computational resources. However, two heavyweight tools, CANU and Flye, should be the first choice when the goal is to achieve accurate and complete assemblies. Our results will provide valuable guidance in future plant genome projects and beyond.
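The abstract above refers to "assembly metrics" without naming them. N50 is one contiguity statistic commonly reported in such benchmarks, so the sketch below computes it from a list of contig lengths, using the common definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions, not values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths: total 290, half 145, and 80 + 70 = 150.
    print(n50([80, 70, 50, 40, 30, 20]))
```

N50 rewards contiguity only, so it says nothing about correctness or completeness; that is presumably why benchmarks like this one pair assembly metrics with BUSCO.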
Affiliation(s)
- Hyungtaek Jung
- Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul 06974, Korea
| | - Matthew Hodgett
- Information Technology Services, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Peter Waterhouse
- Centre for Agriculture and Biocommodities, Queensland University of Technology, Brisbane, Queensland 4001, Australia
| | - Seong-Il Eyun
- Department of Life Science, Chung-Ang University, Seoul 06974, Korea
| |
|
30
|
instaGRAAL: chromosome-level quality scaffolding of genomes using a proximity ligation-based scaffolder. Genome Biol 2020; 21:148. [PMID: 32552806 PMCID: PMC7386250 DOI: 10.1186/s13059-020-02041-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/31/2019] [Accepted: 05/11/2020] [Indexed: 02/06/2023] Open
Abstract
Hi-C exploits contact frequencies between pairs of loci to bridge and order contigs during genome assembly, resulting in chromosome-level assemblies. Because few robust programs are available for this type of data, we developed instaGRAAL, a complete overhaul of the GRAAL program, which has adapted the latter to allow efficient assembly of large genomes. instaGRAAL features a number of improvements over GRAAL, including a modular correction approach that optionally integrates independent data. We validate the program using data for two brown algae, and human, to generate near-complete assemblies with minimal human intervention.
|
31
|
Gruenstaeudl M, Jenke N. PACVr: plastome assembly coverage visualization in R. BMC Bioinformatics 2020; 21:207. [PMID: 32448146 PMCID: PMC7245912 DOI: 10.1186/s12859-020-3475-0] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 08/06/2019] [Accepted: 03/31/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Plastid genomes typically display a circular, quadripartite structure with two inverted repeat regions, which challenges automatic assembly procedures. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on genome structure and evolution. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome is a critical step, which allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Despite the interplay between genome structure and assembly quality, no contemporary, user-friendly software tool can visualize the coverage depth of a plastid genome assembly while taking its quadripartite genome structure into account. A software tool is needed that fills this void. RESULTS We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as the individual plastome genes. By using a variable window approach, the tool allows visualizations on different calculation scales. It also confirms sequence equality of, as well as visualizes gene synteny between, the inverted repeat regions of the input genome. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked from a Unix shell, facilitating its use in automated quality control. We illustrate the application of PACVr on four empirical datasets and compare visualizations generated by PACVr with those of alternative software tools. 
CONCLUSIONS PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) gene synteny across the inverted repeat regions. It contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences. The software, example datasets, technical documentation, and a tutorial are available with the package at https://cran.r-project.org/package=PACVr.
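The idea of flagging regions whose coverage depth falls above or below user-defined thresholds can be sketched in a few lines. The example below is not PACVr (an R package) and does not reproduce its interface or its variable-window approach. It assumes per-base depth is already available as a list of numbers and uses fixed-size, non-overlapping windows; the names `window_means` and `flag_windows` and the thresholds are hypothetical.

```python
def window_means(depth, window):
    """Mean coverage depth in consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window size must be positive")
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]  # last window may be shorter
        means.append(sum(chunk) / len(chunk))
    return means


def flag_windows(depth, window, low, high):
    """Return (start, end, label) for windows with mean depth outside [low, high].

    Coordinates are 0-based and half-open, i.e. the window covers depth[start:end].
    """
    flagged = []
    for k, mean in enumerate(window_means(depth, window)):
        start = k * window
        end = min(start + window, len(depth))
        if mean < low:
            flagged.append((start, end, "low"))
        elif mean > high:
            flagged.append((start, end, "high"))
    return flagged


if __name__ == "__main__":
    # Hypothetical per-base depths: a normal stretch, a dip, then a spike.
    depth = [10] * 4 + [1] * 4 + [50] * 2
    print(flag_windows(depth, window=4, low=5, high=30))
```

Smaller windows localize dips more precisely but are noisier, which is one reason a tool might let the user vary the window size.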
Affiliation(s)
- Michael Gruenstaeudl
- Institut für Biologie, Systematische Botanik und Pflanzengeographie, Freie Universität Berlin, Berlin, 14195 Germany
| | - Nils Jenke
- Institut für Bioinformatik, Freie Universität Berlin, Berlin, 14195 Germany
| |
|
32
|
Moran RL, Catchen JM, Fuller RC. Genomic Resources for Darters (Percidae: Etheostominae) Provide Insight into Postzygotic Barriers Implicated in Speciation. Mol Biol Evol 2020; 37:711-729. [PMID: 31688927 PMCID: PMC7038671 DOI: 10.1093/molbev/msz260] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/23/2022] Open
Abstract
Comparative genomic approaches are increasingly being used to study the evolution of reproductive barriers in nonmodel species. Although numerous studies have examined prezygotic isolation in darters (Percidae), investigations into postzygotic barriers have remained rare due to long generation times and a lack of genomic resources. Orangethroat and rainbow darters naturally hybridize and provide a remarkable example of male-driven speciation via character displacement. Backcross hybrids suffer from high mortality, which appears to promote behavioral isolation in sympatry. To investigate the genomic architecture of postzygotic isolation, we used Illumina and PacBio sequencing to generate a chromosome-level, annotated assembly of the orangethroat darter genome and high-density linkage maps for orangethroat and rainbow darters. We also analyzed genome-wide RADseq data from wild-caught adults of both species and laboratory-generated backcrosses to identify genomic regions associated with hybrid incompatibilities. Several putative chromosomal translocations and inversions were observed between orangethroat and rainbow darters, suggesting structural rearrangements may underlie postzygotic isolation. We also found evidence of selection against recombinant haplotypes and transmission ratio distortion in backcross hybrid genomes, providing further insight into the genomic architecture of genetic incompatibilities. Notably, regions with high levels of genetic divergence between species were enriched for genes associated with developmental and meiotic processes, providing strong candidates for postzygotic isolating barriers. These findings mark significant contributions to our understanding of the genetic basis of reproductive isolation between species undergoing character displacement. Furthermore, the genomic resources presented here will be instrumental for studying speciation in darters, the most diverse vertebrate group in North America.
Affiliation(s)
- Rachel L Moran
- Program in Ecology, Evolution, and Conservation Biology, Department of Animal Biology, University of Illinois at Urbana-Champaign, Champaign, IL
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN
| | - Julian M Catchen
- Program in Ecology, Evolution, and Conservation Biology, Department of Animal Biology, University of Illinois at Urbana-Champaign, Champaign, IL
| | - Rebecca C Fuller
- Program in Ecology, Evolution, and Conservation Biology, Department of Animal Biology, University of Illinois at Urbana-Champaign, Champaign, IL
| |
|
33
|
Molina-Mora JA, Campos-Sánchez R, Rodríguez C, Shi L, García F. High quality 3C de novo assembly and annotation of a multidrug resistant ST-111 Pseudomonas aeruginosa genome: Benchmark of hybrid and non-hybrid assemblers. Sci Rep 2020; 10:1392. [PMID: 31996747 PMCID: PMC6989561 DOI: 10.1038/s41598-020-58319-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/14/2019] [Accepted: 01/06/2020] [Indexed: 12/14/2022] Open
Abstract
Genotyping methods and genome sequencing are indispensable to reveal the genomic structure of bacterial species displaying a high level of genome plasticity. However, genome reconstruction, or assembly, is not straightforward due to data complexity, including repeats and the mobile and accessory genetic elements of bacterial genomes. Moreover, since the solution to this problem is strongly influenced by sequencing technology, bioinformatics pipelines, and selection criteria to assess assemblers, there is no systematic way to select a priori the optimal assembler and parameter settings. To assemble the genome of Pseudomonas aeruginosa strain AG1 (PaeAG1), short-read (Illumina) and long-read (Oxford Nanopore) sequencing data were used in 13 different non-hybrid and hybrid approaches. PaeAG1 is a multiresistant high-risk sequence type 111 (ST-111) clone that was isolated from a Costa Rican hospital, and it was the first reported isolate of P. aeruginosa carrying both blaVIM-2 and blaIMP-18 genes encoding metallo-β-lactamase (MBL) enzymes. To assess the assemblies, multiple metrics regarding contiguity, correctness and completeness (the 3C criterion, as we define it here) were used to benchmark the 13 approaches and select a definitive assembly. In addition, annotation was done to identify genes (coding and RNA regions) and to describe the genomic content of PaeAG1. Whereas long-read and hybrid approaches showed better performance in terms of contiguity, higher correctness and completeness metrics were obtained for short-read-only and hybrid approaches. A manually curated and polished hybrid assembly gave rise to a single circular sequence with 100% of core genes and known regions identified, >98% of reads mapped back, no gaps, and uniform coverage. The strategy followed to obtain this high-quality 3C assembly is detailed in the manuscript, and we provide readers with an all-in-one script to replicate our results or to apply it to other troublesome cases. The final 3C assembly revealed that the PaeAG1 genome has 7,190,208 bp, a 65.7% GC content and 6,709 genes (6,620 coding sequences), many of which are included in multiple mobile genomic elements, such as 57 genomic islands, six prophages, and two complete integrons with blaVIM-2 and blaIMP-18 MBL genes. Up to 250 and 60 of the predicted genes are anticipated to play a role in virulence (adherence, quorum sensing and secretion) and antibiotic resistance (β-lactamases, efflux pumps, etc.), respectively. Altogether, the assembly and annotation of the PaeAG1 genome provide new perspectives to continue studying the genomic diversity and gene content of this important human pathogen.
Affiliation(s)
- José Arturo Molina-Mora
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica.
| | - Rebeca Campos-Sánchez
- Centro de Investigación en Biología Celular y Molecular, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
| | - César Rodríguez
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
| | - Leming Shi
- Human Phenome Institute of Fudan University, Shanghai, China
| | - Fernando García
- Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
| |
|
34
|
Marla SS, Mishra P, Maurya R, Singh M, Wankhede DP, Kumar A, Yadav MC, Subbarao N, Singh SK, Kumar R. Refinement of Draft Genome Assemblies of Pigeonpea (Cajanus cajan). Front Genet 2020. [PMID: 33384719 DOI: 10.1101/2020.08.10.243949] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/28/2023] Open
Abstract
Genome assembly of short reads from large plant genomes remains a challenge in computational biology despite major developments in next-generation sequencing. Of late, several draft assemblies have been reported for sequenced plant genomes. The reported draft genome assemblies of Cajanus cajan have different levels of genome completeness and a large number of repeats, gaps, and segmental duplications. Draft assemblies with portions of the genome missing are shorter than the referenced original genome. These assemblies come with low map accuracy, affecting further functional annotation and the prediction of gene components as desired by crop researchers. Genome coverage, i.e., the number of sequenced raw reads mapped onto a certain location of the genome, is an important indicator of completeness and assembly quality in draft assemblies. The present work aimed to improve the coverage in reported de novo sequenced draft genomes (GCA_000340665.1 and GCA_000230855.2) of pigeonpea, a legume widely cultivated in India. The two recently sequenced assemblies, A1 and A2, comprised 72% and 75% of the estimated coverage of the genome, respectively. We employed an assembly reconciliation approach to compare the draft assemblies and merge them, filling the gaps by employing an algorithm size sorting mate-pair library to generate a high quality and near complete assembly with enhanced contiguity. The majority of gaps present within scaffolds were filled with right-sized mate-pair reads. The improved assembly reduced the number of gaps relative to those reported in the draft assemblies, resulting in an improved genome coverage of 82.4%. Map accuracy of the improved assembly was evaluated using various quality metrics and for the presence of specific trait-related functional genes. The paired-end and mate-pair local libraries employed helped us to reduce gaps, repeats, and other sequence errors, resulting in longer scaffolds compared to the two draft assemblies. We reported the prediction of putative host resistance genes against Fusarium wilt disease by their performance and evaluated them both in wet laboratory and field phenotypic conditions.
Affiliation(s)
- Soma S Marla
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Pallavi Mishra
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Ranjeet Maurya
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Mohar Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | | | - Anil Kumar
- Directorate of Education, Rani Lakshmi Bai Central Agricultural University, Jhansi, India
| | - Mahesh C Yadav
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - N Subbarao
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Sanjeev K Singh
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Rajesh Kumar
- Indian Council for Agricultural Research (ICAR)-National Bureau of Plant Genetic Resources, New Delhi, India
| |
|
35
|
Pan W, Wanamaker SI, Ah-Fong AMV, Judelson HS, Lonardi S. Novo&Stitch: accurate reconciliation of genome assemblies via optical maps. Bioinformatics 2019; 34:i43-i51. [PMID: 29949964 PMCID: PMC6022655 DOI: 10.1093/bioinformatics/bty255] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022] Open
Abstract
Motivation
De novo genome assembly is a challenging computational problem due to the high repetitive content of eukaryotic genomes and the imperfections of sequencing technologies (i.e. sequencing errors, uneven sequencing coverage and chimeric reads). Several assembly tools are currently available, each of which has strengths and weaknesses in dealing with the trade-off between maximizing contiguity and minimizing assembly errors (e.g. mis-joins). To obtain the best possible assembly, it is common practice to generate multiple assemblies from several assemblers and/or parameter settings and try to identify the highest quality assembly. Unfortunately, often there is no assembly that both maximizes contiguity and minimizes assembly errors, so one has to compromise one for the other.
Results
The concept of assembly reconciliation has been proposed as a way to obtain a higher quality assembly by merging or reconciling all the available assemblies. While several reconciliation methods have been introduced in the literature, we have shown in one of our recent papers that none of them can consistently produce assemblies that are better than the assemblies provided in input. Here we introduce Novo&Stitch, a novel method that takes advantage of optical maps to accurately carry out assembly reconciliation (assuming that the assembled contigs are sufficiently long to be reliably aligned to the optical maps, e.g. 50 Kbp or longer). Experimental results demonstrate that Novo&Stitch can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness.
Availability and implementation
Novo&Stitch can be obtained from https://github.com/ucrbioinfo/Novo_Stitch.
Affiliation(s)
- Weihua Pan
- Department of Computer Science and Engineering, UC Riverside, CA, USA
| | | | | | - Howard S Judelson
- Department of Plant Pathology and Microbiology, UC Riverside, CA, USA
| | - Stefano Lonardi
- Department of Computer Science and Engineering, UC Riverside, CA, USA
| |
|
36
|
Jung H, Winefield C, Bombarely A, Prentis P, Waterhouse P. Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes. Trends Plant Sci 2019; 24:700-724. [PMID: 31208890 DOI: 10.1016/j.tplants.2019.05.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 01/06/2019] [Revised: 05/01/2019] [Accepted: 05/10/2019] [Indexed: 05/16/2023]
Abstract
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project.
Affiliation(s)
- Hyungtaek Jung
- Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia.
| | - Christopher Winefield
- Department of Wine, Food, and Molecular Biosciences, Lincoln University, 7647 Christchurch, New Zealand
| | - Aureliano Bombarely
- Department of Bioscience, University of Milan, Milan 20133, Italy; School of Plants and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| | - Peter Prentis
- School of Earth, Environmental, and Biological Sciences, Queensland University of Technology, Brisbane, QLD, 4001, Australia
| | - Peter Waterhouse
- Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia; School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia.
| |
|
37
|
Marijon P, Chikhi R, Varré JS. Graph analysis of fragmented long-read bacterial genome assemblies. Bioinformatics 2019; 35:4239-4246. [DOI: 10.1093/bioinformatics/btz219] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/09/2018] [Revised: 02/19/2019] [Accepted: 03/26/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation
Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.
Results
We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
Availability and implementation
https://gitlab.inria.fr/pmarijon/knot
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pierre Marijon
- Inria, Université de Lille, CNRS, Centrale Lille, UMR 9189 – CRIStAL, Lille F-59000, France
| | - Rayan Chikhi
- Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France
| | - Jean-Stéphane Varré
- Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 – CRIStAL, Lille F-59000, France
| |
|
38
|
Polanco C, Sáenz de Miera LE, González AI, García P, Fratini R, Vaquero F, Vences FJ, Pérez de la Vega M. Construction of a high-density interspecific (Lens culinaris x L. odemensis) genetic map based on functional markers for mapping morphological and agronomical traits, and QTLs affecting resistance to Ascochyta in lentil. PLoS One 2019; 14:e0214409. [PMID: 30917174 PMCID: PMC6436743 DOI: 10.1371/journal.pone.0214409] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 11/16/2018] [Accepted: 03/12/2019] [Indexed: 01/13/2023] Open
Abstract
The use of high-throughput sequencing approaches allows for the generation and characterization of reference transcriptome datasets that support gene-based marker discovery, which in turn can be used to build genetic maps, among other purposes. We have obtained a transcriptome assembly including 49,453 genes for the lentil (Lens culinaris Medik.) cultivar Alpo using RNAseq methodology. This transcriptome was used as a reference to obtain 6,306 quality polymorphic markers (SNPs and short indels) by analyzing genotype data from a RIL population at the F7 generation derived from the interspecific cross between L. culinaris cv. Alpo and L. odemensis accession ILWL235. L. odemensis is a wild species included in the secondary gene pool and can be used as a source for gene introgression in lentil breeding programs. Marker data were used to construct the first interspecific genetic map between these two species. This linkage map has been used to precisely identify regions of the CDC-Redberry lentil draft genome in which the candidate genes for some qualitative traits (seed coat spotting pattern, flower color, and stem pigmentation) could be located. The genome regions corresponding to a significant single quantitative trait locus (QTL) controlling "time to flowering" located on chromosome 6 and three QTLs regulating seed size and positioned on chromosomes 1 and 5 (two QTLs) were also identified. Significant QTLs for Ascochyta blight resistance in lentil were mapped to chromosome 6, in or close to the genome region where QTLs for Ascochyta blight resistance have previously been reported.
Affiliation(s)
- Carlos Polanco
- Área de Genética, Departamento de Biología Molecular, Universidad de León, León, Spain
| | | | - Ana Isabel González
- Área de Genética, Departamento de Biología Molecular, Universidad de León, León, Spain
| | - Pedro García
- Área de Genética, Departamento de Biología Molecular, Universidad de León, León, Spain
| | - Richard Fratini
- Área de Genética, Departamento de Biología Molecular, Universidad de León, León, Spain
| | - Francisca Vaquero
- Área de Genética, Departamento de Biología Molecular, Universidad de León, León, Spain
| | | | | |
|
39
|
A continuous genome assembly of the corkwing wrasse (Symphodus melops). Genomics 2018; 110:399-403. [DOI: 10.1016/j.ygeno.2018.04.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/26/2018] [Revised: 04/06/2018] [Accepted: 04/10/2018] [Indexed: 12/20/2022]
|
40
|
Baptista RP, Reis-Cunha JL, DeBarry JD, Chiari E, Kissinger JC, Bartholomeu DC, Macedo AM. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231. Microb Genom 2018; 4. [PMID: 29442617 PMCID: PMC5989580 DOI: 10.1099/mgen.0.000156] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022] Open
Abstract
Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.
Affiliation(s)
- Rodrigo P Baptista
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, USA
| | - Joao Luis Reis-Cunha
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Jeremy D DeBarry
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
| | - Egler Chiari
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Jessica C Kissinger
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, USA
- Department of Genetics, University of Georgia, Athens, USA
| | - Daniella C Bartholomeu
- Departamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Andrea M Macedo
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| |
|