1
|
Yu HJ, Byun YH, Park CK. Techniques for assessing telomere length: A methodological review. Comput Struct Biotechnol J 2024; 23:1489-1498. [PMID: 38633384 PMCID: PMC11021795 DOI: 10.1016/j.csbj.2024.04.011] [Received: 10/23/2023] [Revised: 04/04/2024] [Accepted: 04/05/2024] [Indexed: 04/19/2024] Open
Abstract
Telomeres are located at the ends of chromosomes and have specific sequences with a distinctive structure that safeguards genes. They possess capping structures that protect chromosome ends from fusion events and ensure chromosome stability. Telomeres shorten with each cycle of cell division. When their length reaches a certain threshold, genomic instability can result, which has been implicated in various diseases, including cancer and neurodegenerative disorders. The possibility of telomeres serving as a biomarker for aging and age-related disease is being explored, but their significance is still under study. This is because post-mitotic cells, which are mature cells that do not undergo mitosis, do not experience age-related telomere shortening. Instead, other causes, such as exposure to oxidative stress, can directly damage the telomeres, causing genomic instability. Nonetheless, a general agreement has been established that measuring telomere length offers valuable insights and forms a crucial foundation for analyzing gene expression and epigenetic data. Numerous approaches have been developed to accurately measure telomere lengths. In this review, we summarize various methods for assessing telomere length, along with their advantages and limitations.
Affiliation(s)
- Hyeon Jong Yu
- Department of Neurosurgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Yoon Hwan Byun
- Department of Neurosurgery, SMG-SNU Boramae Medical Center, Seoul, Republic of Korea
| | - Chul-Kee Park
- Department of Neurosurgery, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| |
|
2
|
Dobner J, Nguyen T, Pavez-Giani MG, Cyganek L, Distelmaier F, Krutmann J, Prigione A, Rossi A. mtDNA analysis using Mitopore. Mol Ther Methods Clin Dev 2024; 32:101231. [PMID: 38572068 PMCID: PMC10988129 DOI: 10.1016/j.omtm.2024.101231] [Received: 09/14/2023] [Accepted: 03/08/2024] [Indexed: 04/05/2024]
Abstract
Mitochondrial DNA (mtDNA) analysis is crucial for the diagnosis of mitochondrial disorders, forensic investigations, and basic research. Existing pipelines are complex, expensive, and require specialized personnel. In many cases, including the diagnosis of detrimental single nucleotide variants (SNVs), mtDNA analysis is still carried out using Sanger sequencing. Here, we developed a simple workflow and a publicly available webserver named Mitopore that allows the detection of mtDNA SNVs, indels, and haplogroups. To simplify mtDNA analysis, we tailored our workflow to process noisy long-read sequencing data, focusing on sequence alignment and parameter optimization. We implemented Mitopore with eliBQ (eliminate bad quality reads), an innovative quality enhancement that increases per-base quality by over 20% for low-quality data. The whole Mitopore workflow and webserver were validated using patient-derived and induced pluripotent stem cells harboring mtDNA mutations. Mitopore streamlines mtDNA analysis as an easy-to-use, fast, reliable, and cost-effective analysis method for both long- and short-read sequencing data. This significantly enhances the accessibility of mtDNA analysis and reduces the cost per sample, contributing to the progress of mtDNA-related research and diagnosis.
Affiliation(s)
- Jochen Dobner
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
| | - Thach Nguyen
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
| | - Mario Gustavo Pavez-Giani
- Clinic for Cardiology and Pneumology, University Medical Center Göttingen, 37075 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, 37075 Göttingen, Germany
| | - Lukas Cyganek
- Clinic for Cardiology and Pneumology, University Medical Center Göttingen, 37075 Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), Partner Site Göttingen, 37075 Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, 37075 Göttingen, Germany
| | - Felix Distelmaier
- Department of General Pediatrics, Neonatology and Pediatric Cardiology, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jean Krutmann
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
- Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Alessandro Prigione
- Department of General Pediatrics, Neonatology and Pediatric Cardiology, Medical Faculty, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Andrea Rossi
- Institut für Umweltmedizinische Forschung (IUF)-Leibniz Research Institute for Environmental Medicine, 40225 Düsseldorf, Germany
| |
|
3
|
LeMaster C, Schwendinger-Schreck C, Ge B, Cheung W, McLennan R, Johnston J, Pastinen T, Smail C. Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.15.24304216. [PMID: 38562793 PMCID: PMC10984062 DOI: 10.1101/2024.03.15.24304216] [Indexed: 04/04/2024]
Abstract
Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Efforts to integrate rSVs to date have focused on targeted approaches in known Mendelian rare disease genes. This strategy is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where an rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1 × 10⁻³). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
|
4
|
Wang J, Xu Y, Peng Y, Wang Y, Kang Z, Zhao J. A fully haplotype-resolved and nearly gap-free genome assembly of wheat stripe rust fungus. Sci Data 2024; 11:508. [PMID: 38755209 PMCID: PMC11099153 DOI: 10.1038/s41597-024-03361-6] [Received: 10/18/2023] [Accepted: 05/10/2024] [Indexed: 05/18/2024] Open
Abstract
Stripe rust fungus Puccinia striiformis f. sp. tritici (Pst) is a destructive pathogen of wheat worldwide. Pst has a macrocyclic-heteroecious lifecycle, in which one-celled urediniospores are dikaryotic, each nucleus containing one haploid genome. We successfully generated the first fully haplotype-resolved and nearly gap-free chromosome-scale genome assembly of Pst by combining PacBio HiFi sequencing and trio-binning strategy. The genome size of the two haploid assemblies was 75.59 Mb and 75.91 Mb with contig N50 of 4.17 Mb and 4.60 Mb, and both had 18 pseudochromosomes. The high consensus quality values of 55.57 and 59.02 for both haplotypes confirmed the correctness of the assembly. Of the total 18 chromosomes, 15 and 16 were gapless while there were only five and two gaps for the remaining chromosomes of the two haplotypes, respectively. In total, 15,046 and 15,050 protein-coding genes were predicted for the two haplotypes, and the complete BUSCO scores achieved 97.7% and 97.9%, respectively. The genome will lay the foundation for further research on genetic variations and the evolution of rust fungi.
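The contig N50 values quoted in this abstract follow a standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that calculation is given here for orientation; the function name `n50` and the example lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating total length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb; NOT taken from the Pst assembly.
example_contigs = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(example_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is presumably why the abstract reports consensus quality values and BUSCO scores alongside it.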
Affiliation(s)
- Jierong Wang
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, 712100, China
- College of Life Science, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Yiwen Xu
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Yuxi Peng
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Yiping Wang
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
| | - Zhensheng Kang
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, 712100, China.
| | - Jing Zhao
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China.
- State Key Laboratory of Crop Stress Biology for Arid Areas, Northwest A&F University, Yangling, Shaanxi, 712100, China.
| |
|
5
|
Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024; 23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open
Abstract
Third-generation sequencing (TGS) technologies have revolutionized genome science in the past decade. However, the long-read data produced by TGS platforms suffer from a much higher error rate than data from previous technologies, which complicates downstream analysis. Several error correction tools for long-read data have been developed; these tools can be categorized into hybrid and self-correction tools. So far, these two types of tools have been investigated separately, and their interplay remains understudied. Here, we integrate hybrid and self-correction methods for high-quality error correction. Our procedure leverages the inter-similarity between long-read data and high-accuracy information from short reads. We compare the performance of our method and state-of-the-art error correction tools on Escherichia coli and Arabidopsis thaliana datasets. The results show that the integration approach outperformed the existing error correction methods and holds promise for improving the quality of downstream analyses in genomic research.
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| | - Binshuang Zheng
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Rong Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Xiaocai Zhang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 138632, Singapore, Singapore
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| |
|
6
|
Wang YC, Mao Y, Fu HM, Wang J, Weng X, Liu ZH, Xu XW, Yan P, Fang F, Guo JS, Shen Y, Chen YP. New insights into functional divergence and adaptive evolution of uncultured bacteria in anammox community by complete genome-centric analysis. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 924:171530. [PMID: 38453092 DOI: 10.1016/j.scitotenv.2024.171530] [Received: 09/26/2023] [Revised: 11/13/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024]
Abstract
Anaerobic ammonium-oxidation (anammox) bacteria play a crucial role in global nitrogen cycling and wastewater nitrogen removal, but they share symbiotic relationships with various other microorganisms. Functional divergence and adaptive evolution of uncultured bacteria in the anammox community remain underexplored. Although shotgun metagenomics based on short reads has been widely used in anammox research, metagenome-assembled genomes (MAGs) are often discontinuous and highly contaminated, which limits in-depth analyses of anammox communities. Here, for the first time, we performed Pacific Biosciences high-fidelity (HiFi) long-read sequencing on the anammox granule sludge sample from a lab-scale bioreactor, and obtained 30 accurate and complete metagenome-assembled genomes (cMAGs). These cMAGs were obtained by selecting high-quality circular contigs from initial assemblies of long reads generated by HiFi sequencing, eliminating the need for Illumina short reads, binning, and reassembly. One new anammox species affiliated with Candidatus Jettenia and three species affiliated with novel families were found in this anammox community. cMAG-centric analysis revealed functional divergence in general and nitrogen metabolism among the anammox community members, and they might adopt a cross-feeding strategy in organic matter, cofactors, and vitamins. Furthermore, we identified 63 mobile genetic elements (MGEs) and 50 putative horizontal gene transfer (HGT) events within these cMAGs. The results suggest that HGT events and MGEs related to phage and integration or excision, particularly transposons containing tnpA in anammox bacteria, might play important roles in the adaptive evolution of this anammox community. The cMAGs generated in the present study could be used to establish a comprehensive database for anammox bacteria and associated microorganisms. These findings highlight the advantages of HiFi sequencing for the studies of complex mixed cultures and advance the understanding of anammox communities.
Affiliation(s)
- Yi-Cheng Wang
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Yanping Mao
- College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen 518071, Guangdong, China
| | - Hui-Min Fu
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China; National Research Base of Intelligent Manufacturing Service, Chongqing Technology and Business University, Chongqing 400067, China
| | - Jin Wang
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Xun Weng
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Zi-Hao Liu
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Xiao-Wei Xu
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Peng Yan
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Fang Fang
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Jin-Song Guo
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China
| | - Yu Shen
- National Research Base of Intelligent Manufacturing Service, Chongqing Technology and Business University, Chongqing 400067, China
| | - You-Peng Chen
- Key Laboratory of the Three Gorges Reservoir Region's Eco-Environments of MOE, Chongqing University, Chongqing 400045, China.
| |
|
7
|
Pacheco MA, Cepeda AS, Miller EA, Beckerman S, Oswald M, London E, Mateus-Pinilla NE, Escalante AA. A new long-read mitochondrial-genome protocol (PacBio HiFi) for haemosporidian parasites: a tool for population and biodiversity studies. Malar J 2024; 23:134. [PMID: 38704592 PMCID: PMC11069185 DOI: 10.1186/s12936-024-04961-8] [Received: 02/26/2024] [Accepted: 04/24/2024] [Indexed: 05/06/2024] Open
Abstract
BACKGROUND Studies on haemosporidian diversity, including origin of human malaria parasites, malaria's zoonotic dynamic, and regional biodiversity patterns, have used target gene approaches. However, current methods have a trade-off between scalability and data quality. Here, a long-read Next-Generation Sequencing protocol using PacBio HiFi is presented. The data processing is supported by a pipeline that uses machine-learning for analysing the reads. METHODS A set of primers was designed to target approximately 6 kb, almost the entire length of the haemosporidian mitochondrial genome. Amplicons from different samples were multiplexed in an SMRTbell® library preparation. A pipeline (HmtG-PacBio Pipeline) to process the reads is also provided; it integrates multiple sequence alignments, a machine-learning algorithm that uses modified variational autoencoders, and a clustering method to identify the mitochondrial haplotypes/species in a sample. Although 192 specimens could be studied simultaneously, a pilot experiment with 15 specimens is presented, including in silico experiments where multiple data combinations were tested. RESULTS The primers amplified various haemosporidian parasite genomes and yielded high-quality mt genome sequences. This new protocol allowed the detection and characterization of mixed infections and co-infections in the samples. The machine-learning approach converged into reproducible haplotypes with a low error rate, averaging 0.2% per read (minimum of 0.03% and maximum of 0.46%). The minimum recommended coverage per haplotype is 30X based on the detected error rates. The pipeline facilitates inspecting the data, including a local blast against a file of provided mitochondrial sequences that the researcher can customize. CONCLUSIONS This is not a diagnostic approach but a high-throughput method to study haemosporidian sequence assemblages and perform genotyping by targeting the mitochondrial genome. Accordingly, the methodology allowed for examining specimens with multiple infections and co-infections of different haemosporidian parasites. The pipeline enables data quality assessment and comparison of the haplotypes obtained to those from previous studies. Although a single locus approach, whole mitochondrial data provide high-quality information to characterize species pools of haemosporidian parasites.
Affiliation(s)
- M Andreína Pacheco
- Biology Department/Institute of Genomics and Evolutionary Medicine (iGEM), Temple University, (SERC - 645), 1925 N. 12 St, Philadelphia, PA, 19122-1801, USA.
| | - Axl S Cepeda
- Biology Department/Institute of Genomics and Evolutionary Medicine (iGEM), Temple University, (SERC - 645), 1925 N. 12 St, Philadelphia, PA, 19122-1801, USA
| | - Erica A Miller
- University of Pennsylvania, Wildlife Futures Program, Kennett Square, Philadelphia, PA, 19348, USA
| | | | | | - Evan London
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
| | - Nohra E Mateus-Pinilla
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
- Illinois Natural History Survey-Prairie Research Institute, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Natural Resources and Environmental Sciences, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Pathobiology, College of Veterinary Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61802, USA
| | - Ananias A Escalante
- Biology Department/Institute of Genomics and Evolutionary Medicine (iGEM), Temple University, (SERC - 645), 1925 N. 12 St, Philadelphia, PA, 19122-1801, USA.
| |
|
8
|
Schulz T, Medvedev P. ESKEMAP: exact sketch-based read mapping. Algorithms Mol Biol 2024; 19:19. [PMID: 38704605 PMCID: PMC11069465 DOI: 10.1186/s13015-024-00261-7] [Received: 11/01/2023] [Accepted: 03/19/2024] [Indexed: 05/06/2024] Open
Abstract
BACKGROUND Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a "similar sequence". Traditionally, "similar sequence" was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold. RESULTS In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in O(|t| + |p| + ℓ²) time and O(ℓ log ℓ) space, where |t| is the number of k-mers inside the sketch of the reference, |p| is the number of k-mers inside the read's sketch and ℓ is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm's performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.
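To make the quantities |t|, |p| and ℓ in this abstract concrete, the following Python sketch builds a sequence sketch and counts ℓ for a toy text and pattern. It assumes a plain window-minimizer sketch with lexicographic ordering; the abstract does not state which sketching scheme ESKEMAP uses, so the scheme, the function names `minimizer_sketch` and `shared_occurrences`, and the parameters `k` and `w` are all illustrative assumptions. The dynamic programming algorithm itself is not reproduced.

```python
def minimizer_sketch(seq, k, w):
    """Return the ordered (position, k-mer) minimizers of seq.

    Assumed scheme: in every window of w consecutive k-mers, keep the
    leftmost lexicographically smallest k-mer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    sketch = []
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost minimizer.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # Consecutive windows often share a minimizer; record it once.
        if not sketch or sketch[-1][0] != pos:
            sketch.append((pos, kmers[pos]))
    return sketch


def shared_occurrences(text_sketch, pattern_sketch):
    """Count text-sketch entries whose k-mer also appears in the pattern sketch."""
    pattern_kmers = {kmer for _, kmer in pattern_sketch}
    return sum(1 for _, kmer in text_sketch if kmer in pattern_kmers)


# Toy sequences, chosen only for illustration.
text_sketch = minimizer_sketch("ACGTACGT", k=3, w=2)
pattern_sketch = minimizer_sketch("ACGT", k=3, w=2)
print(len(text_sketch), len(pattern_sketch),
      shared_occurrences(text_sketch, pattern_sketch))
```

Under this reading, the three printed numbers play the roles of |t|, |p| and ℓ. In repetitive references such as the ampliconic regions of chromosome Y, ℓ can grow much faster than |p|, which is where the ℓ² term in the stated running time would matter.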
Affiliation(s)
- Tizian Schulz
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.
- Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany.
- Graduate School "Digital Infrastructure for the Life Sciences" (DILS), Bielefeld University, Bielefeld, Germany.
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA.
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, USA.
| |
|
9
|
Renoz F, Parisot N, Baa-Puyoulet P, Gerlin L, Fakhour S, Charles H, Hance T, Calevro F. PacBio Hi-Fi genome assembly of Sipha maydis, a model for the study of multipartite mutualism in insects. Sci Data 2024; 11:450. [PMID: 38704391 PMCID: PMC11069519 DOI: 10.1038/s41597-024-03297-x] [Received: 12/17/2023] [Accepted: 04/23/2024] [Indexed: 05/06/2024] Open
Abstract
Dependence on multiple nutritional endosymbionts has evolved repeatedly in insects feeding on unbalanced diets. However, reference genomes for species hosting multi-symbiotic nutritional systems are lacking, even though they are essential for deciphering the processes governing cooperative life between insects and anatomically integrated symbionts. The cereal aphid Sipha maydis is a promising model for addressing these issues, as it has evolved a nutritional dependence on two bacterial endosymbionts that complement each other. In this study, we used PacBio High fidelity (HiFi) long-read sequencing to generate a highly contiguous genome assembly of S. maydis with a length of 410 Mb, 3,570 contigs with a contig N50 length of 187 kb, and BUSCO completeness of 95.5%. We identified 117 Mb of repetitive sequences, accounting for 29% of the genome assembly, and predicted 24,453 protein-coding genes, of which 2,541 were predicted enzymes included in an integrated metabolic network with the two aphid-associated endosymbionts. These resources provide valuable genetic and metabolic information for understanding the evolution and functioning of multi-symbiotic systems in insects.
Affiliation(s)
- François Renoz
- Biodiversity Research Centre, Earth and Life Institute, UCLouvain, Louvain-la-Neuve, 1348, Belgium.
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR203, Villeurbanne, F-69621, France.
- Institute of Agrobiological Sciences, National Agriculture and Food Research Organization (NARO), Tsukuba, Ibaraki, 305-8634, Japan.
| | - Nicolas Parisot
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR203, Villeurbanne, F-69621, France.
| | | | - Léo Gerlin
- Univ Lyon, INRAE, INSA Lyon, BF2I, UMR203, Villeurbanne, F-69621, France
| | - Samir Fakhour
- Biodiversity Research Centre, Earth and Life Institute, UCLouvain, Louvain-la-Neuve, 1348, Belgium
- Department of Plant Protection, National Institute for Agricultural Research (INRA), Béni-Mellal, 23000, Morocco
| | - Hubert Charles
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR203, Villeurbanne, F-69621, France
| | - Thierry Hance
- Biodiversity Research Centre, Earth and Life Institute, UCLouvain, Louvain-la-Neuve, 1348, Belgium
| | - Federica Calevro
- Univ Lyon, INRAE, INSA Lyon, BF2I, UMR203, Villeurbanne, F-69621, France.
| |
|
10
|
Bose E, Xiong S, Jones AN. Probing RNA structure and dynamics using nanopore and next generation sequencing. J Biol Chem 2024; 300:107317. [PMID: 38677514 DOI: 10.1016/j.jbc.2024.107317] [Received: 07/03/2023] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024] Open
Abstract
It has become increasingly evident that the structures RNAs adopt are conformationally dynamic; the various structured states that RNAs sample govern their interactions with other nucleic acids, proteins, and ligands to regulate a myriad of biological processes. Although several biophysical approaches have been developed and used to study the dynamic landscape of structured RNAs, technical constraints have limited their application to all classes of RNA due to variable size and flexibility. Recent advances combining chemical probing experiments with next-generation and direct sequencing have emerged as an alternative approach to exploring the conformational dynamics of RNA. In this review, we provide a methodological overview of the sequencing-based techniques used to study RNA conformational dynamics. We discuss how different techniques have enabled us to better understand the propensity of RNAs from a variety of classes to sample multiple conformational states. Finally, we present examples of the ways these techniques have reshaped how we think about RNA structure.
Affiliation(s)
- Emma Bose
- Department of Chemistry, New York University, New York, New York, USA
| | - Shengwei Xiong
- Department of Chemistry, New York University, New York, New York, USA
| | - Alisha N Jones
- Department of Chemistry, New York University, New York, New York, USA.
| |
|
11
|
Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024; 96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]
Abstract
Single-cell technologies have transformed biomedical research over the last decade, opening up new possibilities for understanding cellular heterogeneity, both at the genomic and transcriptomic level. In addition, more recent developments of spatial transcriptomics technologies have made it possible to profile cells in their tissue context. In parallel, there have been substantial advances in sequencing technologies, and third-generation methods are able to produce reads that are tens of kilobases long, with error rates matching those of second-generation short reads. Long-read technologies make it possible to better map large genome rearrangements and quantify isoform-specific abundances. This further improves our ability to characterize functionally relevant heterogeneity. Here, we show how researchers have begun to combine single-cell, spatial transcriptomics, and long-read technologies, and how this is resulting in powerful new approaches to profiling both the genome and the transcriptome. We discuss the achievements so far, and we highlight remaining challenges and opportunities.
Affiliation(s)
- Chengwei Ulrika Yuan
- Department of Biochemistry, University of Cambridge, Cambridge, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
| | - Fu Xiang Quah
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| | - Martin Hemberg
- Gene Lay Institute, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
| |
|
12
|
Xie L, Gong X, Yang K, Huang Y, Zhang S, Shen L, Sun Y, Wu D, Ye C, Zhu QH, Fan L. Technology-enabled great leap in deciphering plant genomes. NATURE PLANTS 2024; 10:551-566. [PMID: 38509222 DOI: 10.1038/s41477-024-01655-6] [Received: 09/22/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Plant genomes provide essential and vital basic resources for studying many aspects of plant biology and applications (for example, breeding). From 2000 to 2020, 1,144 genomes of 782 plant species were sequenced. In the past three years (2021-2023), 2,373 genomes of 1,031 plant species, including 793 newly sequenced species, have been assembled, representing a great leap. The 2,373 newly assembled genomes, of which 63 are telomere-to-telomere assemblies and 921 have been generated in pan-genome projects, cover the major phylogenetic clades. Substantial advances in read length, throughput, accuracy and cost-effectiveness have notably simplified the achievement of high-quality assemblies. Moreover, the development of multiple software tools using different algorithms offers the opportunity to generate more complete and complex assemblies. A database named N3: plants, genomes, technologies has been developed to accommodate the metadata associated with the 3,517 genomes that have been sequenced from 1,575 plant species since 2000. We also provide an outlook for emerging opportunities in plant genome sequencing.
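The database totals quoted in this abstract follow from the per-period counts it reports. A small sketch of that cross-check, using only the figures in the abstract; the variable names are illustrative and not from the paper:

```python
# Counts quoted in the abstract above.
genomes_2000_2020 = 1144
species_2000_2020 = 782
genomes_2021_2023 = 2373
new_species_2021_2023 = 793  # species first sequenced in 2021-2023

# Totals that the abstract attributes to the N3 database.
total_genomes = genomes_2000_2020 + genomes_2021_2023
total_species = species_2000_2020 + new_species_2021_2023

print(total_genomes)  # 3517
print(total_species)  # 1575
```

Only newly sequenced species are added to the species total, because the 1,031 species assembled in 2021-2023 include some that had already been sequenced before 2021.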
Affiliation(s)
- Lingjuan Xie
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
  - Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Xiaojiao Gong
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Kun Yang
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Yujie Huang
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Shiyu Zhang
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Leti Shen
  - Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Yanqing Sun
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Dongya Wu
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Chuyu Ye
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Qian-Hao Zhu
  - CSIRO Agriculture and Food, Black Mountain Laboratories, Canberra, Australia
- Longjiang Fan
  - Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
  - Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
13
|
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024] Open
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
  - NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
- Patrick Driguez
  - Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
14
|
Filipović I, Marshall JM, Rašić G. Finding divergent sequences of homomorphic sex chromosomes via diploidized nanopore-based assembly from a single male. bioRxiv 2024:2024.02.29.582759. [PMID: 38464271 PMCID: PMC10925256 DOI: 10.1101/2024.02.29.582759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Although homomorphic sex chromosomes can have non-recombining regions with elevated sequence divergence between their complements, such divergence signals can be difficult to detect bioinformatically. If found in the genomes of, for example, insect pests, these sequences could be targeted by engineered genetic sexing and control systems. Here, we report an approach that leverages long-read nanopore sequencing of a single XY male to identify divergent regions of homomorphic sex chromosomes. Long-read data are used for de novo genome assembly that is diploidized in a way that maximizes sex-specific differences between its haploid complements. We show that the correct assembly phasing is supported by the mapping of nanopore reads from the male's haploid Y-bearing sperm cells. The approach revealed a highly divergent region (HDR) near the centromere of the homomorphic sex chromosome of Aedes aegypti, the most important arboviral vector, for which there is great interest in creating new genetic control tools. The HDR is located ~5 Mb downstream of the known male-determining locus on chromosome 1 and is significantly enriched for ovary-biased genes. While recombination in the HDR ceased relatively recently (~1.4 MYA), HDR gametologs have divergent exons and introns of protein-coding genes, and most lncRNA genes became X-specific. Megabases of previously invisible sex-linked sequences provide new putative targets for engineering genetic systems to control this deadly mosquito. Broadly, our approach expands the toolbox for studying the cryptic structure of sex chromosomes.
Affiliation(s)
- Igor Filipović
  - Mosquito Genomics, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston QLD 4006, Australia
  - The University of Queensland, School of Biological Sciences, St Lucia, QLD, Australia
- John M Marshall
  - Divisions of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley, CA, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Gordana Rašić
  - Mosquito Genomics, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston QLD 4006, Australia
15
|
Mo C, Wang H, Wei M, Zeng Q, Zhang X, Fei Z, Zhang Y, Kong Q. Complete genome assembly provides a high-quality skeleton for pan-NLRome construction in melon. Plant J 2024. [PMID: 38430487 DOI: 10.1111/tpj.16705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 02/16/2024] [Accepted: 02/22/2024] [Indexed: 03/03/2024]
Abstract
Melon (Cucumis melo L.), being under intensive domestication and selective breeding, displays an abundant phenotypic diversity. Wild germplasm with tolerance to stress represents an untapped genetic resource for discovery of disease-resistance genes. To comprehensively characterize resistance genes in melon, we generate a telomere-to-telomere (T2T) and gap-free genome of wild melon accession PI511890 (C. melo var. chito) with a total length of 375.0 Mb and a contig N50 of 31.24 Mb. The complete genome allows us to dissect genome architecture and identify resistance gene analogs. We construct a pan-NLRome using seven melon genomes, which include 208 variable and 18 core nucleotide-binding leucine-rich repeat receptors (NLRs). Multiple disease-related transcriptome analyses indicate that most up-regulated NLRs induced by pathogens are shell or cloud NLRs. The T2T gap-free assembly and the pan-NLRome not only serve as essential resources for genomic studies and molecular breeding of melon but also provide insights into the genome architecture and NLR diversity.
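The contig N50 quoted in this abstract is a length-weighted median: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation, assuming contig lengths are available as a plain list; the function name `contig_n50` and the toy lengths are illustrative and are not taken from the paper:

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length until
    # at least half of the total assembly size is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with hypothetical lengths (not the melon assembly).
toy_lengths = [40, 30, 20, 10]
print(contig_n50(toy_lengths))  # 30
```

In the toy example the two longest contigs (40 + 30 = 70) are the first to reach half of the 100 bp total, so the N50 is 30.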
Affiliation(s)
- Changjuan Mo
  - National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Haiyan Wang
  - National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Minghua Wei
  - National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Qingguo Zeng
  - National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Xuejun Zhang
  - Hami-melon Research Center, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, China
- Yongbing Zhang
  - Hami-melon Research Center, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, China
- Qiusheng Kong
  - National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
16
|
Carpinteyro-Ponce J, Machado CA. The Complex Landscape of Structural Divergence Between the Drosophila pseudoobscura and D. persimilis Genomes. Genome Biol Evol 2024; 16:evae047. [PMID: 38482945 PMCID: PMC10980976 DOI: 10.1093/gbe/evae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/07/2024] [Indexed: 04/01/2024] Open
Abstract
Structural genomic variants are key drivers of phenotypic evolution. They can span hundreds to millions of base pairs and can thus affect large numbers of genetic elements. Although structural variation is quite common within and between species, its characterization depends upon the quality of genome assemblies and the proportion of repetitive elements. Using new high-quality genome assemblies, we report a complex and previously hidden landscape of structural divergence between the genomes of Drosophila persimilis and D. pseudoobscura, two classic species in speciation research, and study the relationships among structural variants, transposable elements, and gene expression divergence. The new assemblies confirm the already known fixed inversion differences between these species. Consistent with previous studies showing higher levels of nucleotide divergence between fixed inversions relative to collinear regions of the genome, we also find a significant overrepresentation of INDELs inside the inversions. We find that transposable elements accumulate in regions with low levels of recombination, and spatial correlation analyses reveal a strong association between transposable elements and structural variants. We also report a strong association between differentially expressed (DE) genes and structural variants and an overrepresentation of DE genes inside the fixed chromosomal inversions that separate this species pair. Interestingly, species-specific structural variants are overrepresented in DE genes involved in neural development, spermatogenesis, and oocyte-to-embryo transition. Overall, our results highlight the association of transposable elements with structural variants and their importance in driving evolutionary divergence.
Affiliation(s)
- Carlos A Machado
  - Department of Biology, University of Maryland, College Park, MD, USA
17
|
Garg D, Patel N, Rawat A, Rosado AS. Cutting edge tools in the field of soil microbiology. Curr Res Microb Sci 2024; 6:100226. [PMID: 38425506 PMCID: PMC10904168 DOI: 10.1016/j.crmicr.2024.100226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024] Open
Abstract
The study of the whole of the genetic material contained within the microbial populations found in a certain environment is made possible by metagenomics. This technique enables a thorough knowledge of the variety, function, and interactions of microbial communities that are notoriously difficult to research. Due to the limitations of conventional techniques such as culturing and PCR-based methodologies, soil microbiology is a particularly challenging field. Metagenomics has emerged as an effective technique for overcoming these obstacles and shedding light on the dynamic nature of the microbial communities in soil. This review focuses on the principle of metagenomics techniques, their potential applications and limitations in soil microbial diversity analysis. The effectiveness of target-based metagenomics in determining the function of individual genes and microorganisms in soil ecosystems is also highlighted. Targeted metagenomics, including high-throughput sequencing and stable-isotope probing, is essential for studying microbial taxa and genes in complex ecosystems. Shotgun metagenomics may reveal the diversity of soil bacteria, composition, and function impacted by land use and soil management. Sanger, Next Generation Sequencing, Illumina, and Ion Torrent sequencing revolutionise soil microbiome research. Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio)'s third and fourth generation sequencing systems revolutionise long-read technology. GeoChip, clone libraries, metagenomics, and metabarcoding help comprehend soil microbial communities. The article indicates that metagenomics may improve environmental management and agriculture despite existing limitations. Metagenomics has revolutionised soil microbiology research by revealing the complete diversity, function, and interactions of microorganisms in soil. Metagenomics is anticipated to continue defining the future of soil microbiology research despite some limitations, such as the difficulty of locating the appropriate sequencing method for specific genes.
Affiliation(s)
- Diksha Garg
  - Department of Microbiology, Punjab Agricultural University, Ludhiana, Punjab, India
- Niketan Patel
  - Red Sea Research Center, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Makkah, 23955, Saudi Arabia
  - Computational Bioscience Research Center, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Makkah, 23955, Saudi Arabia
- Anamika Rawat
  - Center of Desert Agriculture, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Makkah, 23955, Saudi Arabia
- Alexandre Soares Rosado
  - Red Sea Research Center, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Makkah, 23955, Saudi Arabia
  - Computational Bioscience Research Center, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Makkah, 23955, Saudi Arabia
18
|
Packiaraj J, Thakur J. DNA satellite and chromatin organization at mouse centromeres and pericentromeres. Genome Biol 2024; 25:52. [PMID: 38378611 PMCID: PMC10880262 DOI: 10.1186/s13059-024-03184-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 02/12/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND: Centromeres are essential for faithful chromosome segregation during mitosis and meiosis. However, the organization of satellite DNA and chromatin at mouse centromeres and pericentromeres is poorly understood due to the challenges of assembling repetitive genomic regions.
RESULTS: Using recently available PacBio long-read sequencing data from the C57BL/6 strain, we find that contrary to the previous reports of their homogeneous nature, both centromeric minor satellites and pericentromeric major satellites exhibit a high degree of variation in sequence and organization within and between arrays. While most arrays are continuous, a significant fraction is interspersed with non-satellite sequences, including transposable elements. Using chromatin immunoprecipitation sequencing (ChIP-seq), we find that the occupancy of CENP-A and H3K9me3 chromatin at centromeric and pericentric regions, respectively, is associated with increased sequence enrichment and homogeneity at these regions. The transposable elements at centromeric regions are not part of functional centromeres as they lack significant CENP-A enrichment. Furthermore, both CENP-A and H3K9me3 nucleosomes occupy minor and major satellites spanning centromeric-pericentric junctions and a low yet significant amount of CENP-A spreads locally at centromere junctions on both pericentric and telocentric sides. Finally, while H3K9me3 nucleosomes display a well-phased organization on major satellite arrays, CENP-A nucleosomes on minor satellite arrays are poorly phased. Interestingly, the homogeneous class of major satellites also phase CENP-A and H3K27me3 nucleosomes, indicating that the nucleosome phasing is an inherent property of homogeneous major satellites.
CONCLUSIONS: Our findings reveal that mouse centromeres and pericentromeres display a high diversity in satellite sequence, organization, and chromatin structure.
Affiliation(s)
- Jenika Packiaraj
  - Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA
- Jitendra Thakur
  - Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA
19
|
Vancaester E, Blaxter ML. MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects. Wellcome Open Res 2024; 9:33. [PMID: 38617467 PMCID: PMC11016177 DOI: 10.12688/wellcomeopenres.20730.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/18/2023] [Indexed: 04/16/2024] Open
Abstract
Contamination of public databases by mislabelled sequences has been highlighted for many years and the avalanche of novel sequencing data now being deposited has the potential to make databases difficult to use effectively. It is therefore crucial that sequencing projects and database curators perform pre-submission checks to remove obvious contamination and avoid propagating erroneous taxonomic relationships. However, it is important also to recognise that biological contamination of a target sample with unexpected species' DNA can also lead to the discovery of fascinating biological phenomena through the identification of environmental organisms or endosymbionts. Here, we present a novel, integrated method for detection and generation of high-quality genomes of all non-target genomes co-sequenced in eukaryotic genome sequencing projects. After performing taxonomic profiling of an assembly from the raw data, and leveraging the identity of small rRNA sequences discovered therein as markers, a targeted classification approach retrieves and assembles high-quality genomes. The genomes of these cobionts are then not only removed from the target species' genome but also available for further interrogation. Source code is available from https://github.com/CobiontID/MarkerScan. MarkerScan is written in Python and is deployed as a Docker container.
Affiliation(s)
- Mark L. Blaxter
  - Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
20
|
Cook R, Brown N, Rihtman B, Michniewski S, Redgwell T, Clokie M, Stekel DJ, Chen Y, Scanlan DJ, Hobman JL, Nelson A, Jones MA, Smith D, Millard A. The long and short of it: benchmarking viromics using Illumina, Nanopore and PacBio sequencing technologies. Microb Genom 2024; 10:001198. [PMID: 38376377 PMCID: PMC10926689 DOI: 10.1099/mgen.0.001198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 01/25/2024] [Indexed: 02/21/2024] Open
Abstract
Viral metagenomics has fuelled a rapid change in our understanding of global viral diversity and ecology. Long-read sequencing and hybrid assembly approaches that combine long- and short-read technologies are now being widely implemented in bacterial genomics and metagenomics. However, the use of long-read sequencing to investigate viral communities is still in its infancy. While Nanopore and PacBio technologies have been applied to viral metagenomics, it is not known to what extent different technologies will impact the reconstruction of the viral community. Thus, we constructed a mock bacteriophage community of previously sequenced phage genomes and sequenced them using Illumina, Nanopore and PacBio sequencing technologies and tested a number of different assembly approaches. When using a single sequencing technology, Illumina assemblies were the best at recovering phage genomes. Nanopore- and PacBio-only assemblies performed poorly in comparison to Illumina in both genome recovery and error rates, which both varied with the assembler used. The best Nanopore assembly had errors that manifested as SNPs and INDELs at frequencies 41 and 157 % higher than found in Illumina-only assemblies, respectively, while the best PacBio assemblies had SNPs and INDELs at frequencies 12 and 78 % higher than found in Illumina-only assemblies, respectively. Despite high read coverage, long-read-only assemblies recovered a maximum of one complete genome from any assembly, unless reads were down-sampled prior to assembly. Overall, the best approach was assembly by a combination of Illumina and Nanopore reads, which reduced error rates to levels comparable with short-read-only assemblies. When using a single technology, Illumina only was the best approach. The differences in genome recovery and error rates between technology and assembler had downstream impacts on gene prediction, viral prediction, and subsequent estimates of diversity within a sample. These findings will provide a starting point for others in the choice of reads and assembly algorithms for the analysis of viromes.
Affiliation(s)
- Ryan Cook
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
- Nathan Brown
  - Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
- Branko Rihtman
  - School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
- Slawomir Michniewski
  - Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
- Tamsin Redgwell
  - COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820, Gentofte, Denmark
- Martha Clokie
  - Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
- Dov J. Stekel
  - School of Biosciences, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
  - Department of Mathematics and Applied Mathematics, University of Johannesburg, Rossmore 2029, South Africa
- Yin Chen
  - School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
- David J. Scanlan
  - School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
- Jon L. Hobman
  - School of Biosciences, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
- Andrew Nelson
  - Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
- Michael A. Jones
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
- Darren Smith
  - Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
- Andrew Millard
  - Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
21
|
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024] Open
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
22
|
Tournayre J, Polonais V, Wawrzyniak I, Akossi RF, Parisot N, Lerat E, Delbac F, Souvignet P, Reichstadt M, Peyretaillade E. MicroAnnot: A Dedicated Workflow for Accurate Microsporidian Genome Annotation. Int J Mol Sci 2024; 25:880. [PMID: 38255958 PMCID: PMC10815200 DOI: 10.3390/ijms25020880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/29/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024] Open
Abstract
With nearly 1700 species, Microsporidia represent a group of obligate intracellular eukaryotes with veterinary, economic and medical impacts. To help understand the biological functions of these microorganisms, complete genome sequencing is routinely used. Nevertheless, the proper prediction of their gene catalogue is challenging due to their taxon-specific evolutionary features. As innovative genome annotation strategies are needed to obtain a representative snapshot of the overall lifestyle of these parasites, the MicroAnnot tool, a dedicated workflow for microsporidian sequence annotation using data from curated databases of accurately annotated microsporidian genes, has been developed. Furthermore, specific modules have been implemented to perform small gene (<300 bp) and transposable element identification. Finally, functional annotation was performed using the signature-based InterProScan software. MicroAnnot's accuracy has been verified by the re-annotation of four microsporidian genomes for which structural annotation had previously been validated. With its comparative approach and transcriptional signal identification method, MicroAnnot provides an accurate prediction of translation initiation sites, an efficient identification of transposable elements, as well as high specificity and sensitivity for microsporidian genes, including those under 300 bp.
Collapse
Affiliation(s)
- Jérémy Tournayre
- INRAE, UMR Herbivores, Université Clermont Auvergne, VetAgro Sup, 63122 Saint-Genès-Champanelle, France; (J.T.); (P.S.); (M.R.)
| | - Valérie Polonais
- LMGE, CNRS, Université Clermont Auvergne, 63000 Clermont-Ferrand, France; (V.P.); (I.W.); (R.F.A.); (F.D.)
| | - Ivan Wawrzyniak
- LMGE, CNRS, Université Clermont Auvergne, 63000 Clermont-Ferrand, France; (V.P.); (I.W.); (R.F.A.); (F.D.)
| | - Reginald Florian Akossi
- LMGE, CNRS, Université Clermont Auvergne, 63000 Clermont-Ferrand, France; (V.P.); (I.W.); (R.F.A.); (F.D.)
| | - Nicolas Parisot
- UMR 203, BF2I, INRAE, INSA Lyon, Université de Lyon, 69621 Villeurbanne, France
| | - Emmanuelle Lerat
- VAS, CNRS, UMR5558, LBBE, Université Claude Bernard Lyon 1, 69622 Villeurbanne, France;
| | - Frédéric Delbac
- LMGE, CNRS, Université Clermont Auvergne, 63000 Clermont-Ferrand, France; (V.P.); (I.W.); (R.F.A.); (F.D.)
| | - Pierre Souvignet
- INRAE, UMR Herbivores, Université Clermont Auvergne, VetAgro Sup, 63122 Saint-Genès-Champanelle, France; (J.T.); (P.S.); (M.R.)
| | - Matthieu Reichstadt
- INRAE, UMR Herbivores, Université Clermont Auvergne, VetAgro Sup, 63122 Saint-Genès-Champanelle, France; (J.T.); (P.S.); (M.R.)
| | - Eric Peyretaillade
- LMGE, CNRS, Université Clermont Auvergne, 63000 Clermont-Ferrand, France; (V.P.); (I.W.); (R.F.A.); (F.D.)
|
23
|
Benoit G, Raguideau S, James R, Phillippy AM, Chikhi R, Quince C. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat Biotechnol 2024:10.1038/s41587-023-01983-6. [PMID: 38168989 DOI: 10.1038/s41587-023-01983-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/08/2023] [Indexed: 01/05/2024]
Abstract
We introduce metaMDBG, a metagenomics assembler for PacBio HiFi reads. MetaMDBG combines a de Bruijn graph assembly in a minimizer space with an iterative assembly over sequences of minimizers to address variations in genome coverage depth and an abundance-based filtering strategy to simplify strain complexity. For complex communities, we obtained up to twice as many high-quality circularized prokaryotic metagenome-assembled genomes as existing methods and had better recovery of viruses and plasmids.
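The idea of a de Bruijn graph built in minimizer space can be illustrated with a small sketch. The version below is only an assumption-laden toy, not metaMDBG's implementation: it assumes a density-threshold selection scheme (a k-mer is kept when its hash falls below a cutoff), and the function names `kmer_hash`, `select_minimizers` and `minimizer_space_edges`, as well as the parameters `density` and `order`, are hypothetical. It ignores reverse complements, iterative assembly and abundance filtering.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Hash a k-mer to a 64-bit integer (hash choice is illustrative only)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def select_minimizers(read: str, k: int, density: float) -> list[str]:
    """Keep, in read order, every k-mer whose hash falls below the density cutoff."""
    cutoff = int(density * 2**64)
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    return [kmer for kmer in kmers if kmer_hash(kmer) < cutoff]


def minimizer_space_edges(reads: list[str], k: int, density: float, order: int) -> set:
    """Build de Bruijn edges whose nodes are tuples of `order` consecutive minimizers."""
    edges = set()
    for read in reads:
        mins = select_minimizers(read, k, density)
        nodes = [tuple(mins[i:i + order]) for i in range(len(mins) - order + 1)]
        # Consecutive nodes overlap by order - 1 minimizers, as in a de Bruijn graph.
        edges.update(zip(nodes, nodes[1:]))
    return edges
```

With `density=1.0` every k-mer is selected, so the graph reduces to an ordinary de Bruijn graph over k-mers; lowering the density shrinks each read to a short sequence of minimizers, which is what makes the graph compact for long accurate reads.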
Affiliation(s)
- Gaëtan Benoit
- Organisms and Ecosystems, Earlham Institute, Norwich, UK
| | | | - Robert James
- Gut Microbes and Health, Quadram Institute, Norwich, UK
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD, USA
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| | - Christopher Quince
- Organisms and Ecosystems, Earlham Institute, Norwich, UK.
- Gut Microbes and Health, Quadram Institute, Norwich, UK.
- School of Biological Sciences, University of East Anglia, Norwich, UK.
- Warwick Medical School, University of Warwick, Coventry, UK.
|
24
|
Salava H, Deák T, Czepe C, Maghuly F. Sample and Library Preparation for PacBio Long-Read Sequencing in Grapevine. Methods Mol Biol 2024; 2787:183-197. [PMID: 38656490 DOI: 10.1007/978-1-0716-3778-4_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
PacBio long-read sequencing is a third-generation technology that generates long reads up to 20 kilobases (kb), unlike short-read sequencing instruments that produce up to 600 bases. Long-read sequencing is particularly advantageous in higher organisms, such as humans and plants, where repetitive regions in the genome are more abundant. The PacBio long-read sequencing uses a single molecule, real-time approach where the SMRT cells contain several zero-mode waveguides (ZMWs). Each ZMW contains a single DNA molecule bound by a DNA polymerase. All ZMWs are flushed with deoxy nucleotides with a fluorophore specific to each nucleotide. As the sequencing proceeds, the detector detects the wavelength of the fluorescence and the nucleotides are read in real-time. This chapter describes the sample and library preparation for PacBio long-read sequencing for grapevine.
Affiliation(s)
- Hymavathi Salava
- Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
| | - Tamás Deák
- Institute of Viticulture and Oenology, Hungarian University of Agriculture and Life Sciences (MATE), Budapest, Hungary
| | - Carmen Czepe
- Next Generation Sequencing Unit, Vienna Biocenter Core Facilities (VBCF), Vienna, Austria
| | - Fatemeh Maghuly
- Plant Functional Genomics Lab, Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
|
25
|
Feldmeyer B, Bornberg-Bauer E, Dohmen E, Fouks B, Heckenhauer J, Huylmans AK, Jones ARC, Stolle E, Harrison MC. Comparative Evolutionary Genomics in Insects. Methods Mol Biol 2024; 2802:473-514. [PMID: 38819569 DOI: 10.1007/978-1-0716-3838-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome sequencing quality, in terms of both read length and accuracy, is constantly improving. By combining long-read sequencing technologies with various scaffolding techniques, chromosome-level genome assemblies are now achievable at an affordable price for non-model organisms. Insects represent an exciting taxon for studying the genomic underpinnings of evolutionary innovations, due to ancient origins, immense species-richness, and broad phenotypic diversity. Here we summarize some of the most important methods for carrying out a comparative genomics study on insects. We describe available tools and offer concrete tips on all stages of such an endeavor from DNA extraction through genome sequencing, annotation, and several evolutionary analyses. Along the way we describe important insect-specific aspects, such as DNA extraction difficulties or gene families that are particularly difficult to annotate, and offer solutions. We describe results from several examples of comparative genomics analyses on insects to illustrate the fascinating questions that can now be addressed in this new age of genomics research.
Affiliation(s)
- Barbara Feldmeyer
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Molecular Ecology, Frankfurt, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Bertrand Fouks
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
| | - Ann Kathrin Huylmans
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
| | - Alun R C Jones
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Eckart Stolle
- Museum Koenig, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Bonn, Germany
| | - Mark C Harrison
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
|
26
|
Song H, Zou S, Huang Y, Jian C, Liu W, Tian L, Gong L, Chen Z, Sun Z, Wang Y. Salmonella Typhimurium with Eight Tandem Copies of blaNDM-1 on a HI2 Plasmid. Microorganisms 2023; 12:20. [PMID: 38257847 PMCID: PMC10819877 DOI: 10.3390/microorganisms12010020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 12/09/2023] [Accepted: 12/15/2023] [Indexed: 01/24/2024] Open
Abstract
Carbapenem-resistant Salmonella has recently attracted increasing attention. In this study, four sequence type 36 Salmonella enterica subsp. enterica serovar Typhimurium (S. Typhimurium) isolates were consecutively isolated from an 11-month-old female patient with a gastrointestinal infection, of which one was sensitive to carbapenems and three were resistant to carbapenems. Via antibiotic susceptibility testing, a carbapenemase screening test, plasmid conjugation experiments, Illumina short-read sequencing, and PacBio HiFi sequencing, we found that all four S. Typhimurium isolates contained a blaCTX-M-14-positive IncI1 plasmid. One carbapenem-sensitive S. Typhimurium isolate then acquired an IncHI2 plasmid carrying blaNDM-1 and an IncP plasmid without any resistance genes during disease progression. The blaNDM-1 gene was located on a new 30 kb multidrug resistance region flanked by IS26 and TnAs2. In addition, the ST_F0903R isolate contained eight tandem copies of the ISCR1 unit (ISCR1-dsbD-trpF-ble-blaNDM-1-ISAba125Δ1), but an increase in MICs to carbapenems was not observed. Our work provides further evidence of the rapid spread and amplification of blaNDM-1 through plasmids. Prompt recognition of carbapenem-resistant Enterobacterales and the initiation of appropriate infection control measures are essential to avoid the spread of these organisms.
Affiliation(s)
| | | | | | | | | | | | | | | | - Ziyong Sun
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China; (H.S.); (S.Z.); (Y.H.); (C.J.); (W.L.); (L.T.); (L.G.); (Z.C.); (Y.W.)
|
27
|
Landi M, Shah T, Falquet L, Niazi A, Stavolone L, Bongcam-Rudloff E, Gisel A. Haplotype-resolved genome of heterozygous African cassava cultivar TMEB117 (Manihot esculenta). Sci Data 2023; 10:887. [PMID: 38071206 PMCID: PMC10710486 DOI: 10.1038/s41597-023-02800-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/16/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
Cassava (Manihot esculenta Crantz) is a vital tropical root crop providing essential dietary energy to over 800 million people in tropical and subtropical regions. As a climate-resilient crop, its significance grows as the human population expands. However, yield improvement faces challenges from biotic and abiotic stress and limited breeding. Advanced sequencing and assembly techniques enabled the generation of a highly accurate, nearly complete, haplotype-resolved genome of the African cassava cultivar TMEB117. It is the most accurate cassava genome sequence to date, with a base-level accuracy of QV > 64, N50 > 35 Mbp, and 98.9% BUSCO completeness. Over 60% of the genome comprises repetitive elements. We predicted over 45,000 gene models for both haplotypes. This achievement offers valuable insights into the heterozygous genome organization of cassava, with improved accuracy, completeness, and phased genomes. Due to its high susceptibility to African Cassava Mosaic Virus (ACMV) infections compared to other cassava varieties, TMEB117 provides an ideal reference for studying virus resistance mechanisms, including epigenetic variations and small RNA expression.
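For readers unfamiliar with the N50 statistic quoted above, a minimal sketch of the usual definition follows: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` is illustrative, and the example lengths are made up rather than taken from the cassava assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always covers the full total")


# Made-up contig lengths; the two longest contigs already cover half of 32.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why it is typically reported alongside accuracy (QV) and completeness (BUSCO) figures.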
Affiliation(s)
- Michael Landi
- Department of Animal Breeding and Genetics, Bioinformatics, Swedish University of Agricultural Sciences, Uppsala, Sweden.
- International Institute of Tropical Agriculture, Nairobi, Kenya.
| | - Trushar Shah
- International Institute of Tropical Agriculture, Nairobi, Kenya
| | - Laurent Falquet
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Adnan Niazi
- Department of Animal Breeding and Genetics, Bioinformatics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Livia Stavolone
- International Institute of Tropical Agriculture, Ibadan, Nigeria
- Institute for Sustainable Plant Protection, Consiglio Nazionale delle Ricerche, Bari, Italy
| | - Erik Bongcam-Rudloff
- Department of Animal Breeding and Genetics, Bioinformatics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Andreas Gisel
- International Institute of Tropical Agriculture, Ibadan, Nigeria.
- Institute for Biomedical Technologies, Consiglio Nazionale delle Ricerche, Bari, Italy.
|
28
|
Yang Y, Wu Z, Wu Z, Li T, Shen Z, Zhou X, Wu X, Li G, Zhang Y. A near-complete assembly of asparagus bean provides insights into anthocyanin accumulation in pods. PLANT BIOTECHNOLOGY JOURNAL 2023; 21:2473-2489. [PMID: 37558431 PMCID: PMC10651155 DOI: 10.1111/pbi.14142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 07/11/2023] [Accepted: 07/23/2023] [Indexed: 08/11/2023]
Abstract
Asparagus bean (Vigna unguiculata ssp. sesquipedialis), a subspecies of V. unguiculata, is a vital legume crop widely cultivated in Asia for its tender pods consumed as vegetables. However, the existing asparagus bean assemblies still contain numerous gaps and unanchored sequences, which presents challenges to functional genomics research. Here, we present an improved reference genome sequence of an elite asparagus bean variety, Fengchan 6, achieved through the integration of nanopore ultra-long reads, PacBio high-fidelity reads, and Hi-C technology. The improved assembly is 521.3 Mb in length and demonstrates several enhancements, including a higher N50 length (46.4 Mb), an anchor ratio of 99.8%, and the presence of only one gap. Furthermore, we successfully assembled 14 telomeres and all 11 centromeres, including four telomere-to-telomere chromosomes. Remarkably, the centromeric regions cover a total length of 38.1 Mb, providing valuable insights into the complex architecture of centromeres. Among the 30 594 predicted protein-coding genes, we identified 2356 genes that are tandemly duplicated in segmental duplication regions. These findings have implications for defence responses and may contribute to evolutionary processes. By utilizing the reference genome, we were able to identify the gene VuMYB114, which regulates the accumulation of anthocyanins, thereby controlling the purple coloration of the pods. This discovery holds significant implications for understanding the underlying mechanisms of color determination and the breeding process. Overall, the highly improved reference genome serves as a crucial resource and lays a solid foundation for asparagus bean genomic studies and genetic improvement efforts.
Affiliation(s)
- Yi Yang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Zhikun Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat‐Sen University, Guangzhou, China
| | - Zengxiang Wu
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Tinyao Li
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Zhuo Shen
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Xuan Zhou
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Xinyi Wu
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Guojing Li
- Institute of Vegetable, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yan Zhang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
|
29
|
Ferrer A, Stephens ZD, Kocher JPA. Experimental and Computational Approaches to Measure Telomere Length: Recent Advances and Future Directions. Curr Hematol Malig Rep 2023; 18:284-291. [PMID: 37947937 PMCID: PMC10709248 DOI: 10.1007/s11899-023-00717-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/30/2023] [Indexed: 11/12/2023]
Abstract
PURPOSE OF REVIEW The length of telomeres, protective structures at the chromosome ends, is a well-established biomarker for pathological conditions including multisystemic syndromes called telomere biology disorders. Approaches to measure telomere length (TL) differ on whether they estimate average, distribution, or chromosome-specific TL, and each presents their own advantages and limitations. RECENT FINDINGS The development of long-read sequencing and publication of the telomere-to-telomere human genome reference has allowed for scalable and high-resolution TL estimation in pre-existing sequencing datasets but is still impractical as a dedicated TL test. As sequencing costs continue to fall and strategies for selectively enriching telomere regions prior to sequencing improve, these approaches may become a promising alternative to classic methods. Measurement methods rely on probe hybridization, qPCR or more recently, computational methods using sequencing data. Refinements of existing techniques and new approaches have been recently developed but a test that is accurate, simple, and scalable is still lacking.
Affiliation(s)
- Alejandro Ferrer
- Division of Hematology, Mayo Clinic, Rochester, 200 First Street SW, Rochester, MN, USA.
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA.
|
30
|
Rodriguez Ruiz A, Van Dam AR. Metagenomic binning of PacBio HiFi data prior to assembly reveals a complete genome of Cosmopolites sordidus (Germar) (Coleopterea: Curculionidae, Dryophthorinae) the most damaging arthropod pest of bananas and plantains. PeerJ 2023; 11:e16276. [PMID: 38025758 PMCID: PMC10676084 DOI: 10.7717/peerj.16276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 09/20/2023] [Indexed: 12/01/2023] Open
Abstract
PacBio HiFi sequencing was employed in combination with metagenomic binning to produce a high-quality reference genome of Cosmopolites sordidus. We compared k-mer-based and alignment-based (reference-dependent) pre-binning and post-binning approaches to remove contamination. We also asked whether the post-binning approach left bacterial contamination interspersed within intragenic regions of contigs binned as Arthropoda. Our analyses identified 3,433 genes composed of reads identified as being of putative bacterial origin. The pre-binning approach yielded a C. sordidus genome of 1.07 Gb composed of 3,089 contigs, with complete and single-copy BUSCO scores of 98.6% (genome) and 97.1% (protein). In this article we demonstrate that, in this case, the pre-binning approach does not sacrifice assembly quality for more stringent metagenomic filtering. We also determined that, under post-binning, intragenic contamination increased with increasing coverage, whereas the frequency of gene contamination increased with lower coverage. Future work should focus on developing reference-free pre-binning approaches for HiFi reads produced from eukaryote-based metagenomic samples.
Affiliation(s)
- Alfredo Rodriguez Ruiz
- Departamento de Biología, Universidad de Puerto Rico Recinto Universitario de Mayagüez, Mayagüez, Puerto Rico, United States of America
| | - Alex R. Van Dam
- Departamento de Biología, Universidad de Puerto Rico Recinto Universitario de Mayagüez, Mayagüez, Puerto Rico, United States of America
|
31
|
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res 2023; 33:gr.278005.123. [PMID: 37918962 PMCID: PMC10760446 DOI: 10.1101/gr.278005.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Satellite DNAs are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we show that SRF could reconstruct known satellites in human and well-studied model organisms. We also find that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents, but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Affiliation(s)
- Yujie Zhang
- Harvard School of Public Health, Boston, Massachusetts 02115, USA
| | - Justin Chu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Haoyu Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA;
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
|
32
|
Mao Y, Zeineldin M, Usmani M, Jutla A, Shisler JL, Whitaker RJ, Nguyen TH. Local and Environmental Reservoirs of Salmonella enterica After Hurricane Florence Flooding. GEOHEALTH 2023; 7:e2023GH000877. [PMID: 37928215 PMCID: PMC10624599 DOI: 10.1029/2023gh000877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 08/28/2023] [Accepted: 10/13/2023] [Indexed: 11/07/2023]
Abstract
In many regions of the world, including the United States, human and animal fecal genetic markers have been found in flood waters. In this study, we use high-resolution whole genomic sequencing to examine the origin and distribution of Salmonella enterica after the 2018 Hurricane Florence flooding. We specifically asked whether S. enterica isolated from water samples collected near swine farms in North Carolina shortly after Hurricane Florence had evidence of swine origin. To investigate this, we isolated and fully sequenced 18 independent S. enterica strains from 10 locations (five flooded and five unflooded). We found that all strains have extremely similar chromosomes with only five single nucleotide polymorphisms (SNPs) and possessed two plasmids assigned bioinformatically to the incompatibility groups IncFIB and IncFII. The chromosomal core genome and the IncFIB plasmid are most closely related to environmental Salmonella strains isolated previously from the southeastern US. In contrast, the IncFII plasmid was found in environmental S. enterica strains whose genomes were more divergent, suggesting the IncFII plasmid is more promiscuous than the IncFIB type. We identified 65 antibiotic resistance genes (ARGs) in each of our 18 S. enterica isolates. All ARGs were located on the Salmonella chromosome, similar to other previously characterized environmental isolates. All isolates with different SNPs were resistant to a panel of commonly used antibiotics. These results highlight the importance of environmental sources of antibiotic-resistant S. enterica after extreme flood events.
Affiliation(s)
- Yuqing Mao
- Department of Civil and Environmental Engineering, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Mohamed Zeineldin
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Moiz Usmani
- Engineering School of Sustainable Infrastructure & Environment, University of Florida, Gainesville, FL, USA
| | - Antarpreet Jutla
- Engineering School of Sustainable Infrastructure & Environment, University of Florida, Gainesville, FL, USA
| | - Joanna L. Shisler
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Department of Microbiology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Rachel J. Whitaker
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Department of Microbiology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
| | - Thanh H. Nguyen
- Department of Civil and Environmental Engineering, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana‐Champaign, Urbana, IL, USA
|
33
|
Denoyes B, Prohaska A, Petit J, Rothan C. Deciphering the genetic architecture of fruit color in strawberry. JOURNAL OF EXPERIMENTAL BOTANY 2023; 74:6306-6320. [PMID: 37386925 PMCID: PMC10627153 DOI: 10.1093/jxb/erad245] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/31/2023] [Accepted: 06/28/2023] [Indexed: 07/01/2023]
Abstract
Fruits of Fragaria species usually have an appealing bright red color due to the accumulation of anthocyanins, water-soluble flavonoid pigments. Octoploid cultivated strawberry (Fragaria × ananassa) is a major horticultural crop for which fruit color and associated nutritional value are main breeding targets. Great diversity in fruit color intensity and pattern is observed not only in cultivated strawberry but also in wild relatives such as its octoploid progenitor F. chiloensis or the diploid woodland strawberry F. vesca, a model for fruit species in the Rosaceae. This review examines our understanding of fruit color formation in strawberry and how ongoing developments will advance it. Natural variations of fruit color as well as color changes during fruit development or in response to several cues have been used to explore the anthocyanin biosynthetic pathway and its regulation. So far, the successful identification of causal genetic variants has been largely driven by the availability of high-throughput genotyping tools and high-quality reference genomes of F. vesca and F. × ananassa. The current completion of haplotype-resolved genomes of F. × ananassa combined with QTL mapping will accelerate the exploitation of the untapped genetic diversity of fruit color and help translate the findings into strawberry improvement.
Affiliation(s)
- Béatrice Denoyes
- INRAE and Univ. of Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d’Ornon, France
| | - Alexandre Prohaska
- INRAE and Univ. of Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d’Ornon, France
- INVENIO, MIN de Brienne, Bordeaux, France
| | - Johann Petit
- INRAE and Univ. of Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d’Ornon, France
| | - Christophe Rothan
- INRAE and Univ. of Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, F-33140 Villenave d’Ornon, France
|
34
|
Ding C, Zhang Z. Effective omics tools are still lacking for improvement of stress tolerance in polyploid crops. FRONTIERS IN PLANT SCIENCE 2023; 14:1295528. [PMID: 38023865 PMCID: PMC10646182 DOI: 10.3389/fpls.2023.1295528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023]
Affiliation(s)
- Chao Ding
- Shanxi Center for Testing of Functional Agro-Products, Shanxi Agricultural University, Taiyuan, China
| | - Zhao Zhang
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture, China Agricultural University, Beijing, China
|
35
|
Li J, Cullis C. Comparative Analysis of Tylosema esculentum Mitochondrial DNA Revealed Two Distinct Genome Structures. BIOLOGY 2023; 12:1244. [PMID: 37759643 PMCID: PMC10525999 DOI: 10.3390/biology12091244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/11/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Tylosema esculentum, commonly known as the marama bean, is an underutilized legume with nutritious seeds, holding potential to enhance food security in southern Africa due to its resilience to prolonged drought and heat. To promote the selection of this agronomically valuable germplasm, this study assembled and compared the mitogenomes of 84 marama individuals, identifying variations in genome structure, single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), heteroplasmy, and horizontal transfer. Two distinct germplasms were identified, and a novel mitogenome structure consisting of three circular molecules and one long linear chromosome was discovered. The structural variation led to an increased copy number of specific genes: nad5, nad9, rrnS, rrn5, trnC, and trnfM. The two mitogenomes also exhibited differences at 230 loci, with only one notable nonsynonymous substitution in the matR gene. Heteroplasmy was concentrated at certain loci on chromosome LS1 (OK638188). Moreover, the marama mitogenome contained an insertion of over 9 kb of cpDNA originating from the chloroplast genome, which had accumulated mutations and lost gene functionality. The evolutionary and comparative genomics analysis indicated that mitogenome divergence in marama might not be solely constrained by geographical factors. Additionally, marama, as a member of the Cercidoideae subfamily, tends to possess a more complete set of mitochondrial genes than Faboideae legumes.
Affiliation(s)
| | - Christopher Cullis
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA;
|
36
|
Espinosa E, Bautista R, Fernandez I, Larrosa R, Zapata EL, Plata O. Comparing assembly strategies for third-generation sequencing technologies across different genomes. Genomics 2023; 115:110700. [PMID: 37598732 DOI: 10.1016/j.ygeno.2023.110700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 08/07/2023] [Accepted: 08/16/2023] [Indexed: 08/22/2023]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Meanwhile, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth, and the assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that using PacBio high-fidelity (HiFi) long reads plays an essential role in haplotype construction compared with ONT reads. However, we observe a substantial improvement in the correctness of the assembly from high-fidelity ONT datasets and from combining them with HiFi or short reads.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
| | - Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Ivan Fernandez
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, C. Jordi Girona, 1-3, Barcelona 08034, Spain.
| | - Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Emilio L Zapata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
|
37
|
Gorman Z, Chen J, de Leon AAP, Wallis CM. Comparison of assembly platforms for the assembly of the nuclear genome of Trichoderma harzianum strain PAR3. BMC Genomics 2023; 24:454. [PMID: 37568116 PMCID: PMC10416523 DOI: 10.1186/s12864-023-09544-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 07/28/2023] [Indexed: 08/13/2023] Open
Abstract
BACKGROUND Trichoderma is a diverse genus of fungi that includes several species that possess biotechnological and agricultural applications, including the biocontrol of pathogenic fungi and nematodes. The mitochondrial genome of a putative strain of Trichoderma harzianum called PAR3 was analyzed after isolation from the roots of Scarlet Royal grapevine scion grafted to Freedom rootstock, located in a grapevine vineyard in Parlier, CA, USA. Here, we report the sequencing, comparative assembly, and annotation of the nuclear genome of PAR3 and confirm its identification as a strain of T. harzianum. We subsequently compared the genes found in T. harzianum PAR3 to other known T. harzianum strains. Assembly of Illumina and/or Oxford Nanopore reads by the popular long-read assemblers, Flye and Canu, and the hybrid assemblers, SPAdes and MaSuRCA, was performed and the quality of the resulting assemblies was compared to ascertain which assembler generated the highest quality draft genome assembly. RESULTS MaSuRCA produced the most complete and high-fidelity assembly yielding a nuclear genome of 40.7 Mb comprised of 112 scaffolds. Subsequent annotation of this assembly produced 12,074 gene models and 210 tRNAs. This included 221 genes that did not have equivalent genes in other T. harzianum strains. Phylogenetic analysis of ITS, rpb2, and tef1a sequences from PAR3 and established Trichoderma spp. showed that all three sequences from PAR3 possessed more than 99% identity to those of Trichoderma harzianum, confirming that PAR3 is an isolate of Trichoderma harzianum. We also found that comparison of gene models between T. harzianum PAR3 and other T. harzianum strains resulted in the identification of significant differences in gene type and number, with 221 unique genes identified in the PAR3 strain.
CONCLUSIONS This study gives insight into the efficacy of several popular assembly platforms for assembly of fungal nuclear genomes, and found that the hybrid assembler, MaSuRCA, was the most effective program for genome assembly. The annotated draft nuclear genome and the identification of genes not found in other T. harzianum strains could be used to investigate the potential applications of T. harzianum PAR3 for biocontrol of grapevine fungal canker pathogens and as a source of anti-microbial compounds.
Affiliation(s)
- Zachary Gorman
- Crop Diseases, Pests and Genetics Research Unit, USDA-ARS San Joaquin Valley Agricultural Sciences Center, Parlier, CA, 93648, USA
| | - Jianchi Chen
- Crop Diseases, Pests and Genetics Research Unit, USDA-ARS San Joaquin Valley Agricultural Sciences Center, Parlier, CA, 93648, USA
| | - Adalberto A Perez de Leon
- Crop Diseases, Pests and Genetics Research Unit, USDA-ARS San Joaquin Valley Agricultural Sciences Center, Parlier, CA, 93648, USA
| | - Christopher Michael Wallis
- Crop Diseases, Pests and Genetics Research Unit, USDA-ARS San Joaquin Valley Agricultural Sciences Center, Parlier, CA, 93648, USA.
|
38
|
Zhang C, Johnson NA, Hall N, Tian X, Yu Q, Patterson EL. Subtelomeric 5-enolpyruvylshikimate-3-phosphate synthase copy number variation confers glyphosate resistance in Eleusine indica. Nat Commun 2023; 14:4865. [PMID: 37567866 PMCID: PMC10421919 DOI: 10.1038/s41467-023-40407-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 07/25/2023] [Indexed: 08/13/2023] Open
Abstract
Genomic structural variation (SV) has profound effects on organismal evolution, often serving as a source of novel genetic variation. Gene copy number variation (CNV), one type of SV, has repeatedly been associated with adaptive evolution in eukaryotes, especially with environmental stress. Resistance to the widely used herbicide, glyphosate, has evolved through target-site CNV in many weedy plant species, including the economically important grass, Eleusine indica (goosegrass); however, the origin and mechanism of these CNVs remain elusive in many weed species due to limited genetic and genomic resources. To study this CNV in goosegrass, we present high-quality reference genomes for glyphosate-susceptible and -resistant goosegrass lines and fine-scale assemblies of the duplication of glyphosate's target site gene 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). We reveal a unique rearrangement of EPSPS involving chromosome subtelomeres. This discovery adds to the limited knowledge of the importance of subtelomeres as genetic variation generators and provides another unique example for herbicide resistance evolution.
Affiliation(s)
- Chun Zhang
- Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Institute of Plant Protection, Guangdong Academy of Agricultural Sciences, Guangzhou, Guangdong, P.R. China
| | - Nicholas A Johnson
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA
| | - Nathan Hall
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA
| | - Xingshan Tian
- Guangdong Provincial Key Laboratory of High Technology for Plant Protection, Institute of Plant Protection, Guangdong Academy of Agricultural Sciences, Guangzhou, Guangdong, P.R. China.
| | - Qin Yu
- Australian Herbicide Resistance Initiative (AHRI), School of Agriculture and Environment, University of Western Australia (UWA), Perth, Australia.
| | - Eric L Patterson
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA.
|
39
|
Jin X, Du H, Zhu C, Wan H, Liu F, Ruan J, Mower JP, Zhu A. Haplotype-resolved genomes of wild octoploid progenitors illuminate genomic diversifications from wild relatives to cultivated strawberry. Nat Plants 2023; 9:1252-1266. [PMID: 37537397 DOI: 10.1038/s41477-023-01473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Accepted: 07/03/2023] [Indexed: 08/05/2023]
Abstract
Strawberry is an emerging model for studying polyploid genome evolution and rapid domestication of fruit crops. Here we report haplotype-resolved genomes of two wild octoploids (Fragaria chiloensis and Fragaria virginiana), the progenitor species of cultivated strawberry. Substantial variation is identified between species and between haplotypes. We redefine the four subgenomes and track the genetic contributions of diploid species by additional sequencing of the diploid F. nipponica genome. We provide multiple lines of evidence that F. vesca and F. iinumae, rather than other described extant species, are the closest living relatives of these wild and cultivated octoploids. In response to coexistence with quadruplicate gene copies, the octoploid strawberries have experienced subgenome dominance, homoeologous exchanges and coordinated expression of homoeologous genes. However, some homoeologues have substantially altered expression bias after speciation and during domestication. These findings enhance our understanding of the origin, genome evolution and domestication of strawberries.
Affiliation(s)
- Xin Jin
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Haiyuan Du
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Chumeng Zhu
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Hong Wan
- Horticultural Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China
| | - Fang Liu
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
| | - Jiwei Ruan
- Flower Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China.
| | - Jeffrey P Mower
- Center for Plant Science Innovation, University of Nebraska, Lincoln, NE, USA.
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, USA.
| | - Andan Zhu
- Germplasm Bank of Wild Species, Yunnan Key Laboratory of Crop Wild Relatives Omics, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China.
|
40
|
Huff M, Hulse-Kemp AM, Scheffler BE, Youngblood RC, Simpson SA, Babiker E, Staton M. Long-read, chromosome-scale assembly of Vitis rotundifolia cv. Carlos and its unique resistance to Xylella fastidiosa subsp. fastidiosa. BMC Genomics 2023; 24:409. [PMID: 37474911 PMCID: PMC10357881 DOI: 10.1186/s12864-023-09514-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 07/13/2023] [Indexed: 07/22/2023] Open
Abstract
BACKGROUND Muscadine grape (Vitis rotundifolia) is resistant to many of the pathogens that negatively impact the production of common grape (V. vinifera), including the bacterial pathogen Xylella fastidiosa subsp. fastidiosa (Xfsf), which causes Pierce's Disease (PD). Previous studies in common grape have indicated Xfsf delays host immune response with a complex O-chain antigen produced by the wzy gene. Muscadine cultivars range from tolerant to completely resistant to Xfsf, but the mechanism is unknown. RESULTS We assembled and annotated a new, long-read genome assembly for 'Carlos', a cultivar of muscadine that exhibits tolerance, to build upon the existing genetic resources available for muscadine. We used these resources to construct an initial pan-genome for three cultivars of muscadine and one cultivar of common grape. This pan-genome contains a total of 34,970 synteny-constrained entries containing genes of similar structure. Comparison of resistance gene content between the 'Carlos' and common grape genomes indicates an expansion of resistance (R) genes in 'Carlos.' We further identified genes involved in Xfsf response by transcriptome sequencing 'Carlos' plants inoculated with Xfsf. We observed 234 differentially expressed genes with functions related to lipid catabolism, oxidation-reduction signaling, and abscisic acid (ABA) signaling as well as seven R genes. Leveraging public data from previous experiments of common grape inoculated with Xfsf, we determined that most differentially expressed genes in the muscadine response were not found in common grape, and three of the R genes identified as differentially expressed in muscadine do not have an ortholog in the common grape genome. CONCLUSIONS Our results support the utility of a pan-genome approach to identify candidate genes for traits of interest, particularly disease resistance to Xfsf, within and between muscadine and common grape.
Affiliation(s)
- Matthew Huff
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, 37996, USA
| | - Amanda M Hulse-Kemp
- Genomics and Bioinformatics Research Unit, USDA-ARS, Raleigh, NC, 27606, USA
- Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27606, USA
| | - Brian E Scheffler
- Genomics and Bioinformatics Research Unit, USDA-ARS, Stoneville, MS, 38776, USA
| | - Ramey C Youngblood
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, 39762, USA
| | - Sheron A Simpson
- Genomics and Bioinformatics Research Unit, USDA-ARS, Stoneville, MS, 38776, USA
| | - Ebrahiem Babiker
- USDA-ARS Thad Cochran Southern Horticultural Laboratory, Poplarville, MS, 39470, USA.
| | - Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, 37996, USA.
|
41
|
Packiaraj J, Thakur J. DNA satellite and chromatin organization at house mouse centromeres and pericentromeres. bioRxiv 2023:2023.07.18.549612. [PMID: 37503200 PMCID: PMC10370071 DOI: 10.1101/2023.07.18.549612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Centromeres are essential for faithful chromosome segregation during mitosis and meiosis. However, the organization of satellite DNA and chromatin at mouse centromeres and pericentromeres is poorly understood due to the challenges of sequencing and assembling repetitive genomic regions. Using recently available PacBio long-read sequencing data from the C57BL/6 strain and chromatin profiling, we found that contrary to the previous reports of their highly homogeneous nature, centromeric and pericentromeric satellites display varied sequences and organization. We find that both centromeric minor satellites and pericentromeric major satellites exhibit sequence variations within and between arrays. While most arrays are continuous, a significant fraction is interspersed with non-satellite sequences, including transposable elements. Additionally, we investigated CENP-A and H3K9me3 chromatin organization at centromeres and pericentromeres using chromatin immunoprecipitation sequencing (ChIP-seq). We found that the occupancy of CENP-A and H3K9me3 chromatin at centromeric and pericentric regions, respectively, is associated with increased sequence abundance and homogeneity at these regions. Furthermore, the transposable elements at centromeric regions are not part of functional centromeres as they lack CENP-A enrichment. Finally, we found that while H3K9me3 nucleosomes display a well-phased organization on major satellite arrays, CENP-A nucleosomes on minor satellite arrays lack phased organization. Interestingly, the homogeneous class of major satellites phase CENP-A and H3K27me3 nucleosomes as well, indicating that the nucleosome phasing is an inherent property of homogeneous major satellites. Overall, our findings reveal that house mouse centromeres and pericentromeres, which were previously thought to be highly homogeneous, display significant diversity in satellite sequence, organization, and chromatin structure.
Affiliation(s)
- Jenika Packiaraj
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA 30322
| | - Jitendra Thakur
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA 30322
|
42
|
Gogoi A, Rossmann SL, Lysøe E, Stensvand A, Brurberg MB. Genome analysis of Phytophthora cactorum strains associated with crown- and leather-rot in strawberry. Front Microbiol 2023; 14:1214924. [PMID: 37465018 PMCID: PMC10351607 DOI: 10.3389/fmicb.2023.1214924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 06/12/2023] [Indexed: 07/20/2023] Open
Abstract
Phytophthora cactorum has two distinct pathotypes that cause crown rot and leather rot in strawberry (Fragaria × ananassa). Strains of the crown rot pathotype can infect both the rhizome (crown) and fruit tissues, while strains of the leather rot pathotype can only infect the fruits of strawberry. The genomes of a highly virulent crown rot strain, a low-virulence crown rot strain, and three leather rot strains were sequenced using PacBio high-fidelity (HiFi) long-read sequencing. The reads were de novo assembled into genomes of 66.4-67.6 megabases in 178-204 contigs, with N50 values ranging from 892 to 1,036 kilobases. The total number of predicted complete genes in the five P. cactorum genomes ranged from 17,286 to 17,398. Orthology analysis identified a core secretome of 8,238 genes. Comparative genomic analysis revealed differences in the composition of potential virulence effectors, such as putative RxLR and Crinklers, between the crown rot and the leather rot pathotypes. Insertions, deletions, and amino acid substitutions were detected in genes encoding putative elicitors such as beta elicitin and cellulose-binding domain proteins from the leather rot strains compared to the highly virulent crown rot strain, suggesting a potential mechanism for the crown rot strain to escape host recognition during compatible interaction with strawberry. The results presented here highlight several effectors that may facilitate the tissue-specific colonization of P. cactorum in strawberry.
Affiliation(s)
- Anupam Gogoi
- Department of Plant Sciences, Faculty of Biosciences (BIOVIT), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
| | - Simeon L. Rossmann
- Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
| | - Erik Lysøe
- Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
| | - Arne Stensvand
- Department of Plant Sciences, Faculty of Biosciences (BIOVIT), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
| | - May Bente Brurberg
- Department of Plant Sciences, Faculty of Biosciences (BIOVIT), Norwegian University of Life Sciences (NMBU), Ås, Norway
- Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
|
43
|
Schmeing S, Robinson MD. Gapless provides combined scaffolding, gap filling, and assembly correction with long reads. Life Sci Alliance 2023; 6:e202201471. [PMID: 37142439 PMCID: PMC10166144 DOI: 10.26508/lsa.202201471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2022] [Revised: 04/17/2023] [Accepted: 04/18/2023] [Indexed: 05/06/2023] Open
Abstract
Continuity, correctness, and completeness of genome assemblies are important for many biological projects. Long reads represent a major driver towards delivering high-quality genomes, but not everybody can achieve the necessary coverage for good long read-only assemblies. Therefore, improving existing assemblies with low-coverage long reads is a promising alternative. The improvements include correction, scaffolding, and gap filling. However, most tools perform only one of these tasks and the useful information of reads that supported the scaffolding is lost when running separate programs successively. Therefore, we propose a new tool for combined execution of all three tasks using PacBio or Oxford Nanopore reads. gapless is available at: https://github.com/schmeing/gapless.
Affiliation(s)
- Stephan Schmeing
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | - Mark D Robinson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
|
44
|
Shinde SS, Sharma A, Vijay N. Decoding the fibromelanosis locus complex chromosomal rearrangement of black-bone chicken: genetic differentiation, selective sweeps and protein-coding changes in Kadaknath chicken. Front Genet 2023; 14:1180658. [PMID: 37424723 PMCID: PMC10325862 DOI: 10.3389/fgene.2023.1180658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 06/05/2023] [Indexed: 07/11/2023] Open
Abstract
Black-bone chicken (BBC) meat is popular for its distinctive taste and texture. A complex chromosomal rearrangement at the fibromelanosis (Fm) locus on the 20th chromosome results in increased endothelin-3 (EDN3) gene expression and is responsible for melanin hyperpigmentation in BBC. We use public long-read sequencing data of the Silkie breed to resolve high-confidence haplotypes at the Fm locus spanning both Dup1 and Dup2 regions and establish that, of the three possible scenarios of the complex chromosomal rearrangement, Fm_2 is correct. The relationship of Chinese and Korean BBC breeds with Kadaknath, which is native to India, is underexplored. Our data from whole-genome re-sequencing establish that all BBC breeds, including Kadaknath, share the complex chromosomal rearrangement junctions at the fibromelanosis (Fm) locus. We also identify two Fm locus proximal regions (∼70 Kb and ∼300 Kb) with signatures of selection unique to Kadaknath. These regions harbor several genes with protein-coding changes, with the bactericidal/permeability-increasing-protein-like gene having two Kadaknath-specific changes within protein domains. Our results indicate that protein-coding changes in the bactericidal/permeability-increasing-protein-like gene hitchhiked with the Fm locus in Kadaknath due to close physical linkage. Identifying this Fm locus proximal selective sweep sheds light on the genetic distinctiveness of Kadaknath compared to other BBC.
Affiliation(s)
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
|
45
|
Pardo-Palacios FJ, Arzalluz-Luque A, Kondratova L, Salguero P, Mestre-Tomás J, Amorín R, Estevan-Morió E, Liu T, Nanni A, McIntyre L, Tseng E, Conesa A. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. bioRxiv 2023:2023.05.17.541248. [PMID: 37398077 PMCID: PMC10312485 DOI: 10.1101/2023.05.17.541248] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/04/2023]
Abstract
The emergence of long-read RNA sequencing (lrRNA-seq) has provided an unprecedented opportunity to analyze transcriptomes at isoform resolution. However, the technology is not free from biases, and transcript models inferred from these data require quality control and curation. In this study, we introduce SQANTI3, a tool specifically designed to perform quality analysis on transcriptomes constructed using lrRNA-seq data. SQANTI3 provides an extensive naming framework to describe transcript model diversity in comparison to the reference transcriptome. Additionally, the tool incorporates a wide range of metrics to characterize various structural properties of transcript models, such as transcription start and end sites, splice junctions, and other structural features. These metrics can be utilized to filter out potential artifacts. Moreover, SQANTI3 includes a Rescue module that prevents the loss of known genes and transcripts exhibiting evidence of expression but displaying low-quality features. Lastly, SQANTI3 incorporates IsoAnnotLite, which enables functional annotation at the isoform level and facilitates functional iso-transcriptomics analyses. We demonstrate the versatility of SQANTI3 in analyzing different data types, isoform reconstruction pipelines, and sequencing platforms, and how it provides novel biological insights into isoform biology. The SQANTI3 software is available at https://github.com/ConesaLab/SQANTI3.
|
46
|
Zheng Y, Shang X. SVcnn: an accurate deep learning-based method for detecting structural variation based on long-read data. BMC Bioinformatics 2023; 24:213. [PMID: 37221476 DOI: 10.1186/s12859-023-05324-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Accepted: 05/06/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Structural variations (SVs) refer to variations in an organism's chromosome structure that exceed a length of 50 base pairs. They play a significant role in genetic diseases and evolutionary mechanisms. While long-read sequencing technology has led to the development of numerous SV callers, their performance has been suboptimal. Researchers have observed that current SV callers often miss true SVs and generate many false SVs, especially in repetitive regions and areas with multi-allelic SVs. These errors are due to the messy alignments of long-read data, which are affected by their high error rate. Therefore, there is a need for a more accurate SV caller. RESULT We propose a new method, SVcnn, a more accurate deep learning-based method for detecting SVs by using long-read sequencing data. We run SVcnn and other SV callers on three real datasets and find that SVcnn improves the F1-score by 2-8% compared with the second-best method when the read depth is greater than 5×. More importantly, SVcnn has better performance for detecting multi-allelic SVs. CONCLUSIONS SVcnn is an accurate deep learning-based method to detect SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn.
Affiliation(s)
- Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, West Youyi Road 127, Xi'an, 710072, China.
|
47
|
Wong J, Coombe L, Nikolić V, Zhang E, Nip KM, Sidhu P, Warren RL, Birol I. Linear time complexity de novo long read genome assembly with GoldRush. Nat Commun 2023; 14:2906. [PMID: 37217507 DOI: 10.1038/s41467-023-38716-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/10/2022] [Accepted: 05/11/2023] [Indexed: 05/24/2023] Open
Abstract
Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap, its most costly step, was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, forgoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
Affiliation(s)
- Johnathan Wong
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada.
| | - Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Vladimir Nikolić
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Emily Zhang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Puneet Sidhu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
| | - Inanç Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada.
|
48
|
Kucuk E, van der Sanden BPGH, O'Gorman L, Kwint M, Derks R, Wenger AM, Lambert C, Chakraborty S, Baybayan P, Rowell WJ, Brunner HG, Vissers LELM, Hoischen A, Gilissen C. Comprehensive de novo mutation discovery with HiFi long-read sequencing. Genome Med 2023; 15:34. [PMID: 37158973 PMCID: PMC10169305 DOI: 10.1186/s13073-023-01183-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/26/2022] [Accepted: 04/19/2023] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS We sequenced the genomes of eight parent-child trios using high coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNM. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data.
CONCLUSIONS HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
Affiliation(s)
- Erdi Kucuk
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Bart P G H van der Sanden
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Luke O'Gorman
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - Michael Kwint
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - Ronny Derks
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - Han G Brunner
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- GROW School for Oncology and Developmental Biology, Maastricht University Medical Center, Maastricht, The Netherlands
| | - Lisenka E L M Vissers
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Alexander Hoischen
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Internal Medicine, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, The Netherlands
| | - Christian Gilissen
- Department of Human Genetics, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
|
49
|
Luo J, Guan T, Chen G, Yu Z, Zhai H, Yan C, Luo H. SLHSD: hybrid scaffolding method based on short and long reads. Brief Bioinform 2023; 24:7152317. [PMID: 37141142 DOI: 10.1093/bib/bbad169] [Received: 07/03/2022] [Revised: 01/08/2023] [Accepted: 04/12/2023] [Indexed: 05/05/2023]
Abstract
In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.
Affiliation(s)
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Ting Guan
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Guolin Chen
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Zhonghua Yu
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Haixia Zhai
- School of Software, Henan Polytechnic University, Jiaozuo 454003, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
|
50
|
De La Cerda GY, Landis JB, Eifler E, Hernandez AI, Li F, Zhang J, Tribble CM, Karimi N, Chan P, Givnish T, Strickler SR, Specht CD. Balancing read length and sequencing depth: Optimizing Nanopore long-read sequencing for monocots with an emphasis on the Liliales. Applications in Plant Sciences 2023; 11:e11524. [PMID: 37342170 PMCID: PMC10278932 DOI: 10.1002/aps3.11524] [Received: 09/30/2022] [Revised: 01/20/2023] [Accepted: 01/30/2023] [Indexed: 06/22/2023]
Abstract
PREMISE We present approaches used to generate long-read Nanopore sequencing reads for the Liliales and demonstrate how modifications to standard protocols directly impact read length and total output. The goal is to help those interested in generating long-read sequencing data determine which steps may be necessary for optimizing output and results. METHODS Four species of Calochortus (Liliaceae) were sequenced. Modifications made to sodium dodecyl sulfate (SDS) extractions and cleanup protocols included grinding with a mortar and pestle, using cut or wide-bore tips, chloroform cleaning, bead cleaning, eliminating short fragments, and using highly purified DNA. RESULTS Steps taken to maximize read length can decrease overall output. Notably, the number of pores in a flow cell is correlated with the overall output, yet we did not see an association between the pore number and the read length or the number of reads produced. DISCUSSION Many factors contribute to the overall success of a Nanopore sequencing run. We showed the direct impact that several modifications to the DNA extraction and cleaning steps have on the total sequencing output, read size, and number of reads generated. We show a tradeoff between read length and the number of reads and, to a lesser extent, the total sequencing output, all of which are important factors for successful de novo genome assembly.
Affiliation(s)
- Gisel Y. De La Cerda
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
| | - Jacob B. Landis
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Evan Eifler
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Adriana I. Hernandez
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
| | - Fay‐Wei Li
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Jing Zhang
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
| | - Carrie M. Tribble
- School of Life Sciences, University of Hawaiʻi, Mānoa, Honolulu, Hawaiʻi 96822, USA
| | - Nisa Karimi
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Patricia Chan
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Thomas Givnish
- Department of Botany, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
| | - Susan R. Strickler
- BTI Computational Biology Center, Boyce Thompson Institute, Ithaca, New York 14853, USA
- Present address: Plant Science and Conservation, Chicago Botanic Garden, Glencoe, Illinois 60022, USA
- Present address: Plant Biology and Conservation Program, Northwestern University, Evanston, Illinois 60208, USA
| | - Chelsea D. Specht
- School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York 14853, USA
|