1
Do V, Nguyen S, Le D, Nguyen T, Nguyen C, Ho T, Vo N, Nguyen T, Nguyen H, Cao M. Pasa: leveraging population pangenome graph to scaffold prokaryote genome assemblies. Nucleic Acids Res 2024; 52:e15. [PMID: 38084888 PMCID: PMC10853769 DOI: 10.1093/nar/gkad1170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/07/2023] [Accepted: 11/22/2023] [Indexed: 02/10/2024]
Abstract
Whole genome sequencing has increasingly become the essential method for studying the genetic mechanisms of antimicrobial resistance and for surveillance of drug-resistant bacterial pathogens. The majority of bacterial genomes sequenced to date have been sequenced with Illumina sequencing technology, owing to its high throughput, excellent sequence accuracy, and low cost. However, because of the short-read nature of the technology, these assemblies are fragmented into large numbers of contigs, which hinders recovery of the full information in the genome. We develop Pasa, a graph-based algorithm that utilizes the pangenome graph and the assembly graph information to improve scaffolding quality. By leveraging the population information of the bacterial species, Pasa is able to utilize the linkage information of the gene families of the species to resolve the contig graph of the assembly. We show that our method outperforms the current state of the art in terms of accuracy and, at the same time, is computationally efficient enough to be applied to a large number of existing draft assemblies.
Affiliation(s)
- Van Hoan Do
  - Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
- Duc Quang Le
  - Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
- Tam Thi Nguyen
  - Oxford University Clinical Research Unit, Hanoi, Vietnam
- Canh Hao Nguyen
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Tho Huu Ho
  - Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
  - Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
- Nam S Vo
  - Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
2
Chen P, Lian JY, Wu B, Cao HL, Li ZH, Wang ZF. Draft genome of Castanopsis chinensis, a dominant species safeguarding biodiversity in subtropical broadleaved evergreen forests. BMC Genom Data 2023; 24:78. [PMID: 38097945 PMCID: PMC10722680 DOI: 10.1186/s12863-023-01183-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/02/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023]
Abstract
OBJECTIVES Castanopsis is the third largest genus in the Fagaceae family and is essentially tropical or subtropical in origin. The species in this genus are mainly canopy-dominant trees and, as key components of evergreen broadleaved forests, play a crucial role in the maintenance of local biodiversity. Castanopsis chinensis, distributed from South China to Vietnam, is a representative species. It currently suffers from heavy disturbance by human activity and climate change. Here, we present its assembled genome to facilitate its preliminary conservation and breeding at the genome level. DATA DESCRIPTION The C. chinensis genome was assembled and annotated using Nanopore and MGI whole-genome sequencing and RNA-seq reads from leaf tissues. The assembly was 888,699,661 bp in length, consisting of 133 contigs with a contig N50 of 23,395,510 bp. A completeness assessment of the assembly with Benchmarking Universal Single-Copy Orthologs (BUSCO) indicated a score of 98.3%. Repetitive elements comprised 471,006,885 bp, accounting for 55.9% of the assembled sequences. A total of 51,406 genes that coded for 54,310 proteins were predicted. Multiple databases were used to functionally annotate the protein sequences.
Affiliation(s)
- Pan Chen
  - Guangdong Forestry Survey and Planning Institute, Guangzhou, 510520, China
- Ju-Yu Lian
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
- Bin Wu
  - Guangdong Forestry Survey and Planning Institute, Guangzhou, 510520, China
- Hong-Lin Cao
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
- Zhi-Hong Li
  - Guangdong Forestry Survey and Planning Institute, Guangzhou, 510520, China
- Zheng-Feng Wang
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
  - South China National Botanical Garden, Guangzhou, 510650, China
3
Romanenko SA, Kliver SF, Serdyukova NA, Perelman PL, Trifonov VA, Seluanov A, Gorbunova V, Azpurua J, Pereira JC, Ferguson-Smith MA, Graphodatsky AS. Integration of fluorescence in situ hybridization and chromosome-length genome assemblies revealed synteny map for guinea pig, naked mole-rat, and human. Sci Rep 2023; 13:21055. [PMID: 38030702 PMCID: PMC10687270 DOI: 10.1038/s41598-023-46595-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023]
Abstract
Descriptions of karyotypes of many animal species are currently available. In addition, there has been a significant increase in the number of sequenced genomes and an ever-improving quality of genome assembly. To close the gap between genomic and cytogenetic data we applied fluorescent in situ hybridization (FISH) and Hi-C technology to make the first full chromosome-level genome comparison of the guinea pig (Cavia porcellus), naked mole-rat (Heterocephalus glaber), and human. Comparative chromosome maps obtained by FISH with chromosome-specific probes link genomic scaffolds to individual chromosomes and orient them relative to centromeres and heterochromatic blocks. Hi-C assembly made it possible to close all gaps on the comparative maps and to reveal additional rearrangements that distinguish the karyotypes of the three species. As a result, we integrated the bioinformatic and cytogenetic data and adjusted the previous comparative maps and genome assemblies of the guinea pig, naked mole-rat, and human. Syntenic associations in the two hystricomorphs indicate features of their putative ancestral karyotype. We postulate that the two approaches applied in this study complement one another and provide complete information about the organization of these genomes at the chromosome level.
Affiliation(s)
- Svetlana A Romanenko
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Sergei F Kliver
  - Center for Evolutionary Hologenomics, The Globe Institute, The University of Copenhagen, Copenhagen, Denmark
- Natalia A Serdyukova
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Polina L Perelman
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Vladimir A Trifonov
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- Andrei Seluanov
  - Department of Biology, University of Rochester, Rochester, NY, USA
- Vera Gorbunova
  - Department of Biology, University of Rochester, Rochester, NY, USA
- Jorge Azpurua
  - Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC, USA
- Jorge C Pereira
  - Animal and Veterinary Research Centre, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Malcolm A Ferguson-Smith
  - Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Alexander S Graphodatsky
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
4
Bringloe TT, Parent GJ. Contrasting new and available reference genomes to highlight uncertainties in assemblies and areas for future improvement: an example with monodontid species. BMC Genomics 2023; 24:693. [PMID: 37985969 PMCID: PMC10659057 DOI: 10.1186/s12864-023-09779-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/31/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Reference genomes provide a foundational framework for evolutionary investigations, ecological analysis, and conservation science, yet uncertainties in the assembly of reference genomes are difficult to assess, and by extension rarely quantified. Reference genomes for monodontid cetaceans span a wide spectrum of data types and analytical approaches, providing the context to derive broader insights related to discrepancies and regions of uncertainty in reference genome assembly. We generated three beluga (Delphinapterus leucas) and one narwhal (Monodon monoceros) reference genomes and contrasted these with published chromosomal scale assemblies for each species to quantify discrepancies associated with genome assemblies. RESULTS The new reference genomes achieved chromosomal scale assembly using a combination of PacBio long reads, Illumina short reads, and Hi-C scaffolding data. For beluga, we identified discrepancies in the order and orientation of contigs in 2.2-3.7% of the total genome depending on the pairwise comparison of references. In addition, unsupported higher order scaffolding was identified in published reference genomes. In contrast, we estimated 8.2% of the compared narwhal genomes featured discrepancies, with inversions being notably abundant (5.3%). Discrepancies were linked to repetitive elements in both species. CONCLUSIONS We provide several new reference genomes for beluga (Delphinapterus leucas), while highlighting potential avenues for improvements. In particular, additional layers of data providing information on ultra-long genomic distances are needed to resolve persistent errors in reference genome construction. The comparative analyses of monodontid reference genomes suggested that the three new reference genomes for beluga are more accurate compared to the currently published reference genome, but that the new narwhal genome is less accurate than one published. We also present a conceptual summary for improving the accuracy of reference genomes with relevance to end-user needs and how they relate to levels of assembly quality and uncertainty.
Affiliation(s)
- Trevor T Bringloe
  - Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada
- Geneviève J Parent
  - Laboratory of Genomics, Maurice Lamontagne Institute, Fisheries and Oceans Canada, Mont-Joli, QC, Canada
5
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023]
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
  - Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Fritz J Sedlazeck
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
- Medhat Mahmoud
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
6
Wang J, Veldsman WP, Fang X, Huang Y, Xie X, Lyu A, Zhang L. Benchmarking multi-platform sequencing technologies for human genome assembly. Brief Bioinform 2023; 24:bbad300. [PMID: 37594299 DOI: 10.1093/bib/bbad300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023]
Abstract
Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies, which include contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task even though several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performances in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long-reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long-reads, but they may require further polishing to improve quality. We recommend using short-reads rather than long-reads themselves to improve the base accuracy of contigs from Oxford Nanopore long-reads. Hi-C is the best choice for chromosome-level scaffolding because it can capture the longest-range DNA connectedness compared to 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly. For diploid assembly, hifiasm is the best tool for human diploid genome assembly using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
Affiliation(s)
- Jingjing Wang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Werner Pieter Veldsman
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Aiping Lyu
  - School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
  - Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
7
Poisson W, Bastien A, Gilbert I, Carrier A, Prunier J, Robert C. Cytogenetic screening of a Canadian swine breeding nucleus using a newly developed karyotyping method named oligo-banding. Genet Sel Evol 2023; 55:47. [PMID: 37430194 DOI: 10.1186/s12711-023-00819-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 06/23/2023] [Indexed: 07/12/2023]
Abstract
BACKGROUND The frequency of chromosomal rearrangements in Canadian breeding boars has been estimated at 0.91 to 1.64%. These abnormalities are widely recognized as a potential cause of subfertility in livestock production. Since artificial insemination is practiced in almost all intensive pig production systems, the use of elite boars carrying cytogenetic defects that have an impact on fertility can lead to major economic losses. To avoid keeping subfertile boars in artificial insemination centres and spreading chromosomal defects within populations, cytogenetic screening of boars is crucial. Different techniques are used for this purpose, but several issues are frequently encountered, i.e. environmental factors can influence the quality of results, the lack of genomic information outputted by these techniques, and the need for prior cytogenetic skills. The aim of this study was to develop a new pig karyotyping method based on fluorescent banding patterns. RESULTS The use of 207,847 specific oligonucleotides generated 96 fluorescent bands that are distributed across the 18 autosomes and the sex chromosomes. Tested alongside conventional G-banding, this oligo-banding method allowed us to identify four chromosomal translocations and a rare unbalanced chromosomal rearrangement that was not detected by conventional banding. In addition, this method allowed us to investigate chromosomal imbalance in spermatozoa. CONCLUSIONS The use of oligo-banding was found to be appropriate for detecting chromosomal aberrations in a Canadian pig nucleus and its convenient design and use make it an interesting tool for livestock karyotyping and cytogenetic studies.
Affiliation(s)
- William Poisson
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de recherche en reproduction, développement et santé intergénérationnelle, Québec, QC, Canada
- Alexandre Bastien
  - Plateforme d'imagerie et microscopie, Institut de biologie intégrative et des systèmes, Université Laval, Québec, QC, Canada
- Isabelle Gilbert
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de recherche en reproduction, développement et santé intergénérationnelle, Québec, QC, Canada
- Alexandra Carrier
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de recherche en reproduction, développement et santé intergénérationnelle, Québec, QC, Canada
- Julien Prunier
  - Département de médecine moléculaire, Faculté de médecine, Université Laval, Québec, QC, Canada
- Claude Robert
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de recherche en reproduction, développement et santé intergénérationnelle, Québec, QC, Canada
8
Luo J, Guan T, Chen G, Yu Z, Zhai H, Yan C, Luo H. SLHSD: hybrid scaffolding method based on short and long reads. Brief Bioinform 2023; 24:7152317. [PMID: 37141142 DOI: 10.1093/bib/bbad169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 01/08/2023] [Accepted: 04/12/2023] [Indexed: 05/05/2023]
Abstract
In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.
Affiliation(s)
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Ting Guan
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Guolin Chen
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Zhonghua Yu
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Haixia Zhai
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
9
Cai P, Liu S, Zhang D, Xing H, Han M, Liu D, Gong L, Hu QN. SynBioTools: a one-stop facility for searching and selecting synthetic biology tools. BMC Bioinformatics 2023; 24:152. [PMID: 37069545 PMCID: PMC10111727 DOI: 10.1186/s12859-023-05281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/11/2023] [Indexed: 04/19/2023]
Abstract
BACKGROUND The rapid development of synthetic biology relies heavily on the use of databases and computational tools, which are also developing rapidly. While many tool registries have been created to facilitate tool retrieval, sharing, and reuse, no relatively comprehensive tool registry or catalog addresses all aspects of synthetic biology. RESULTS We constructed SynBioTools, a comprehensive collection of synthetic biology databases, computational tools, and experimental methods, as a one-stop facility for searching and selecting synthetic biology tools. SynBioTools includes databases, computational tools, and methods extracted from reviews via SCIentific Table Extraction, a scientific table-extraction tool that we built. Approximately 57% of the resources that we located and included in SynBioTools are not mentioned in bio.tools, the dominant tool registry. To improve users' understanding of the tools and to enable them to make better choices, the tools are grouped into nine modules (each with subdivisions) based on their potential biosynthetic applications. Detailed comparisons of similar tools in every classification are included. The URLs, descriptions, source references, and the number of citations of the tools are also integrated into the system. CONCLUSIONS SynBioTools is freely available at https://synbiotools.lifesynther.com/. It provides end-users and developers with a useful resource of categorized synthetic biology databases, tools, and methods to facilitate tool retrieval and selection.
Affiliation(s)
- Pengli Cai
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
  - Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Huadong Xing
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mengying Han
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dongliang Liu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Linlin Gong
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Qian-Nan Hu
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
10
Poisson W, Prunier J, Carrier A, Gilbert I, Mastromonaco G, Albert V, Taillon J, Bourret V, Droit A, Côté SD, Robert C. Chromosome-level assembly of the Rangifer tarandus genome and validation of cervid and bovid evolution insights. BMC Genomics 2023; 24:142. [PMID: 36959567 PMCID: PMC10037892 DOI: 10.1186/s12864-023-09189-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/05/2022] [Accepted: 02/14/2023] [Indexed: 03/25/2023]
Abstract
BACKGROUND Genome assembly into chromosomes facilitates several analyses including cytogenetics, genomics and phylogenetics. Despite rapid development in bioinformatics, however, assembly beyond scaffolds remains challenging, especially in species without closely related well-assembled and available reference genomes. So far, four draft genomes of Rangifer tarandus (caribou or reindeer, a circumpolar distributed cervid species) have been published, but none with chromosome-level assembly. This emblematic northern species is of high interest in ecological studies and conservation since most populations are declining. RESULTS We have designed specific probes based on Oligopaint FISH technology to upgrade the latest published reindeer and caribou chromosome-level genomes. Using this oligonucleotide-based method, we found six mis-assembled scaffolds and physically mapped 68 of the largest scaffolds representing 78% of the most recent R. tarandus genome assembly. Combining physical mapping and comparative genomics, it was possible to document chromosomal evolution among Cervidae and closely related bovids. CONCLUSIONS Our results provide validation for the current chromosome-level genome assembly as well as resources to use chromosome banding in studies of Rangifer tarandus.
Affiliation(s)
- William Poisson
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
  - Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
- Julien Prunier
  - Département de biochimie, microbiologie et bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, QC, Canada
- Alexandra Carrier
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
  - Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
- Isabelle Gilbert
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
  - Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
- Vicky Albert
  - Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
- Joëlle Taillon
  - Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
- Vincent Bourret
  - Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
- Arnaud Droit
  - Département de médecine moléculaire, Faculté de médecine, Université Laval, Québec, QC, Canada
- Steeve D Côté
  - Caribou Ungava, Département de biologie and Centre d'études nordiques, Faculté des sciences et de génie, Université Laval, Québec, QC, Canada
- Claude Robert
  - Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
  - Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
  - Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
11
Andrade P, Lyra ML, Zina J, Bastos DFO, Brunetti AE, Baêta D, Afonso S, Brunes TO, Taucce PPG, Carneiro M, Haddad CFB, Sequeira F. Draft genome and multi-tissue transcriptome assemblies of the Neotropical leaf-frog Phyllomedusa bahiana. G3 (Bethesda, Md.) 2022; 12:jkac270. [PMID: 36205610 PMCID: PMC9713437 DOI: 10.1093/g3journal/jkac270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 09/07/2022] [Indexed: 12/05/2022]
Abstract
Amphibians are increasingly threatened worldwide, but the availability of genomic resources that could be crucial for implementing informed conservation practices lags well behind that for other vertebrate groups. Here, we describe draft de novo genome, mitogenome, and transcriptome assemblies for the Neotropical leaf-frog Phyllomedusa bahiana native to the Brazilian Atlantic Forest and Caatinga. We used a combination of PacBio long reads and Illumina sequencing to produce a 4.74-Gbp contig-level genome assembly, which has a contiguity comparable to other recent nonchromosome level assemblies. The assembled mitogenome comprises 16,239 bp and the gene content and arrangement are similar to other Neobatrachia. RNA-sequencing from 8 tissues resulted in a highly complete (86.3%) reference transcriptome. We further use whole-genome resequencing data from P. bahiana and from its sister species Phyllomedusa burmeisteri, to demonstrate how our assembly can be used as a backbone for population genomics studies within the P. burmeisteri species group. Our assemblies thus represent important additions to the catalog of genomic resources available from amphibians.
Affiliation(s)
- Pedro Andrade
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão 4485-661, Portugal
- Mariana L Lyra
  - Departamento de Biodiversidade and Centro de Aquicultura, Instituto de Biociências, Universidade Estadual Paulista (UNESP), Rio Claro 13506-900, Brazil
- Juliana Zina
  - Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia, Jequié 45206-190, Brazil
- Deivson F O Bastos
  - Departamento de Ciências Biológicas, Universidade Estadual do Sudoeste da Bahia, Jequié 45206-190, Brazil
- Andrés E Brunetti
  - Laboratory of Evolutionary Genetics, Institute of Subtropical Biology, National University of Misiones (UNaM-CONICET) Posadas N3300LQH, Misiones, Argentina
- Délio Baêta
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão 4485-661, Portugal
  - Departamento de Biodiversidade and Centro de Aquicultura, Instituto de Biociências, Universidade Estadual Paulista (UNESP), Rio Claro 13506-900, Brazil
- Sandra Afonso
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão 4485-661, Portugal
- Tuliana O Brunes
  - Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, São Paulo 05508-090, Brazil
- Pedro P G Taucce
  - Departamento de Biodiversidade and Centro de Aquicultura, Instituto de Biociências, Universidade Estadual Paulista (UNESP), Rio Claro 13506-900, Brazil
- Miguel Carneiro
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão 4485-661, Portugal
- Célio F B Haddad
  - Departamento de Biodiversidade and Centro de Aquicultura, Instituto de Biociências, Universidade Estadual Paulista (UNESP), Rio Claro 13506-900, Brazil
- Fernando Sequeira
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão 4485-661, Portugal
12
Peel E, Silver L, Brandies P, Zhu Y, Cheng Y, Hogg CJ, Belov K. Best genome sequencing strategies for annotation of complex immune gene families in wildlife. Gigascience 2022; 11:6780307. [PMID: 36310247 PMCID: PMC9618407 DOI: 10.1093/gigascience/giac100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/24/2022] [Revised: 08/10/2022] [Accepted: 09/29/2022] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND: The biodiversity crisis and the increasing impact of wildlife disease on animal and human health provide impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations to investigate disease in wildlife. Given this, we have increasingly been asked what minimum level of genome quality is required to annotate immune genes effectively for studies of immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation. RESULTS: Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCOv5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation. CONCLUSIONS: Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.
Affiliation(s)
- Emma Peel
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Luke Silver
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Parice Brandies
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Ying Zhu
  - Sichuan Provincial Academy of Natural Resource Sciences, Chengdu, Sichuan 610000, China
- Yuanyuan Cheng
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Carolyn J Hogg
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Katherine Belov
  - Correspondence address. Katherine Belov, School of Life and Environmental Sciences, Rm 206, RMC Gunn Building (B19), The University of Sydney, Sydney, NSW 2006, Australia. E-mail:
13
Competitive Exclusion Bacterial Culture Derived from the Gut Microbiome of Nile Tilapia (Oreochromis niloticus) as a Resource to Efficiently Recover Probiotic Strains: Taxonomic, Genomic, and Functional Proof of Concept. Microorganisms 2022; 10:microorganisms10071376. [PMID: 35889095 PMCID: PMC9321352 DOI: 10.3390/microorganisms10071376] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/09/2022] [Revised: 07/02/2022] [Accepted: 07/04/2022] [Indexed: 01/27/2023] Open
Abstract
This study aims to mine a previously developed continuous-flow competitive exclusion culture (CFCEC) originating from the Tilapia gut microbiome as a rational and efficient source for recovering autochthonous probiotic strains. Three isolated strains were tested for their adaptability to host gastrointestinal conditions, their antibacterial activities against aquaculture bacterial pathogens, and their antibiotic susceptibility patterns. Their genomes were fully sequenced, assembled, and annotated; relevant functions, such as those related to the identified probiotic activities, were inferred; and phylogenomic analyses compared the strains with their closest reported relatives. The strains are possible candidates for novel genus/species taxa within Lactococcus spp. and Priestia spp. (previously known as Bacillus spp.). These results were consistent with reports on strains within these phyla exhibiting probiotic features, and the strains we found expand their known diversity. Furthermore, their pangenomes showed that these bacteria indeed carry a set of so far uncharacterized genes that may play a role in antagonism to competing strains or in specific symbiotic adaptations to the fish host. In conclusion, CFCEC proved effective for enriching strains with probiotic potential and subsequently isolating them in pure culture.
14
Liu SC, Ju YR, Lu CL. Multi-CSAR: a web server for scaffolding contigs using multiple reference genomes. Nucleic Acids Res 2022; 50:W500-W509. [PMID: 35524553 PMCID: PMC9252826 DOI: 10.1093/nar/gkac301] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/22/2022] [Revised: 04/09/2022] [Accepted: 04/15/2022] [Indexed: 11/12/2022] Open
Abstract
Multi-CSAR is a web server that can efficiently and more accurately order and orient the contigs in the assembly of a target genome into larger scaffolds based on multiple reference genomes. Given a target genome and multiple reference genomes, Multi-CSAR first identifies sequence markers shared between the target genome and each reference genome, then uses these sequence markers to compute a scaffold for the target genome based on each single reference genome, and finally combines all the single reference-derived scaffolds into a multiple reference-derived scaffold. To run Multi-CSAR, users need to upload a target genome to be scaffolded and one or more reference genomes in multi-FASTA format. Users can also choose to use the ‘weighting scheme of reference genomes’ so that Multi-CSAR automatically calculates different weights for the reference genomes, and can choose either ‘NUCmer on nucleotides’ or ‘PROmer on translated amino acids’ for Multi-CSAR to identify sequence markers. On the output page, Multi-CSAR displays its multiple reference-derived scaffold in two graphical representations (a Circos plot and a dotplot), so that users can visually validate the correctness of the scaffolded contigs, and in a tabular representation for validating the scaffold in further detail. Multi-CSAR is available online at http://genome.cs.nthu.edu.tw/Multi-CSAR/.
Affiliation(s)
- Shu-Cheng Liu
  - Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
- Yan-Ru Ju
  - Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
- Chin Lung Lu
  - Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
15
Computational biotechnology guides elucidation of the biosynthesis of the plant anticancer drug camptothecin. Comput Struct Biotechnol J 2021; 19:3659-3663. [PMID: 34257844 PMCID: PMC8254074 DOI: 10.1016/j.csbj.2021.06.028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2021] [Revised: 06/18/2021] [Accepted: 06/18/2021] [Indexed: 11/09/2022] Open
Abstract
Camptothecin is a clinically important monoterpene indole alkaloid (MIA) used for treating various cancers. Currently, the production of this biopharmaceutical hinges on its extraction from camptothecin-producing plants, leading to high market prices and supply bottlenecks. While synthetic biology combined with metabolic approaches could represent an attractive alternative approach to manufacturing, it first requires complete elucidation of the biosynthetic pathway, which is still largely lacking in species that naturally accumulate camptothecin. This knowledge gap can be attributed to the lack of high-quality genomic resources for medicinal plant species. In this context, Yamazaki and colleagues produced the first described and experimentally validated chromosome-level plant genome assembly of Ophiorrhiza pumila, a prominent source plant of camptothecin for the pharmaceutical industry. More specifically, they developed a method incorporating Illumina reads, PacBio single-molecule reads, optical mapping and Hi-C sequencing, followed by experimental validation of contig orientation within scaffolds using fluorescence in situ hybridization (FISH) analysis. This strategy resulted in the most contiguous and complete de novo plant reference genome described to date, and it can streamline the sequencing of new plant genomes. Further mining approaches, including integrative omics analysis, phylogenetics, gene cluster evaluation and comparative genomics, were successfully used to unravel the evolutionary origins of MIA metabolism and revealed a shortlist of high-confidence MIA biosynthetic genes for functional validation.