1
|
Zhang J, Zhao F. Circular RNA discovery with emerging sequencing and deep learning technologies. Nat Genet 2025; 57:1089-1102. [PMID: 40247051 DOI: 10.1038/s41588-025-02157-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2024] [Accepted: 03/07/2025] [Indexed: 04/19/2025]
Abstract
Circular RNA (circRNA) represents a type of RNA molecule characterized by a closed-loop structure that is distinct from linear RNA counterparts. Recent studies have revealed the emerging role of these circular transcripts in gene regulation and disease pathogenesis. However, their low expression levels and high sequence similarity to linear RNAs present substantial challenges for circRNA detection and characterization. Recent advances in long-read and single-cell RNA sequencing technologies, coupled with sophisticated deep learning-based algorithms, have revolutionized the investigation of circRNAs at unprecedented resolution and scale. This Review summarizes recent breakthroughs in circRNA discovery, characterization and functional analysis algorithms. We also discuss the challenges associated with integrating large-scale circRNA sequencing data and explore the potential future development of artificial intelligence (AI)-driven algorithms to unlock the full potential of circRNA research in biomedical applications.
Affiliation(s)
- Jinyang Zhang
- Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| | - Fangqing Zhao
- Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China.
- University of Chinese Academy of Sciences, Beijing, China.
|
2
|
Golyaev V, Dierickx S, Deforche K, Dumon W, Vanderschuren H. A method for in-depth analysis of circular DNA virus populations by unambiguously profiling the low abundant virus variants and partial genomic components. Nucleic Acids Res 2025; 53:gkaf221. [PMID: 40173013 PMCID: PMC11963754 DOI: 10.1093/nar/gkaf221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 02/19/2025] [Accepted: 03/11/2025] [Indexed: 04/04/2025] Open
Abstract
Severe epidemic outbreaks of diseases associated with newly emerging strains of single-stranded DNA (ssDNA) viruses have led to serious economic losses in numerous important food crops. While current mitigation strategies rely mostly on the deployment of genetic resistance in crop varieties, the constantly evolving virus populations have the potential to rapidly break virus resistance. Therefore, the development of diagnostic tools enabling early detection of virus variants associated with hypervirulence and/or expansion to new host species is urgently needed as an effective mitigation solution. Here, we introduce a novel approach by designing a pipeline that enables accurate identification and characterization of the full-length sequence variants of viral circular DNA genomes utilizing Nanopore sequencing technology and the bioinformatics tool Genome Detective. We demonstrate that the pipeline is suitable to provide an accurate and in-depth analysis of monopartite Tomato yellow leaf curl Sardinia virus (TYLCSV) and multipartite Banana bunchy top virus (BBTV) ssDNA virus populations, resulting in the profiling of high- and low-frequency virus variants with ≥1% relative abundance. The approach also enabled the unambiguous detection and characterization of four TYLCSV partial genomic sequences as well as several partial genomic sequences for each BBTV genomic component not previously reported and accumulating during infection.
Affiliation(s)
- Victor Golyaev
- Tropical Crop Improvement Laboratory, Crop Biotechnics, Department of Biosystems, KU Leuven, Leuven 3001, Belgium
- KU Leuven Plant Institute (LPI), KU Leuven, Leuven 3001, Belgium
| | | | | | | | - Hervé Vanderschuren
- Tropical Crop Improvement Laboratory, Crop Biotechnics, Department of Biosystems, KU Leuven, Leuven 3001, Belgium
- KU Leuven Plant Institute (LPI), KU Leuven, Leuven 3001, Belgium
- Plant Genetics and Rhizospheric Processes Laboratory, Gembloux Agro BioTech, University of Liège, Gembloux 5030, Belgium
|
3
|
Otron DH, Filloux D, Brousse A, Hoareau M, Fenelon B, Hoareau C, Fernandez E, Tiendrébéogo F, Lett JM, Pita JS, Roumagnac P, Lefeuvre P. Improvement of Nanopore sequencing provides access to high quality genomic data for multi-component CRESS-DNA plant viruses. Virol J 2025; 22:78. [PMID: 40098028 PMCID: PMC11917030 DOI: 10.1186/s12985-025-02694-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Accepted: 03/04/2025] [Indexed: 03/19/2025] Open
Abstract
BACKGROUND Faced with the recrudescence of viral CRESS-DNA plant diseases, the availability of efficient and cost-effective tools for routine diagnosis and genomic characterisation is vital. As these viruses possess circular single-strand DNA genomes, they have been routinely characterised using rolling circle amplification (RCA) coupled with Sanger sequencing. However, while providing the basis of our knowledge of the diverse CRESS-DNA viruses, this approach is laboratory-intensive, time-consuming and ultimately ineffective when faced with co-infection or viruses with multiple genomic components, two common characteristics of these viruses. Whereas alternatives have proved effective in some applications, there is a strong need for next-generation sequencing methods suitable for small-scale projects that can routinely produce high quality sequences comparable to the gold standard Sanger sequencing. RESULTS Here, we present an RCA sequencing diagnostic technique using the latest Oxford Nanopore Technology flongle flow cells. Originally, using the tandem-repeat nature of RCA products, we were able to improve the quality of each viral read and assemble high-quality genomic components. The effectiveness of the method was demonstrated on two plant samples, one infected with the bipartite begomovirus African cassava mosaic virus (ACMV) and the other infected with the nanovirus faba bean necrotic stunt virus (FBNSV), a virus with eight genomic segments. This method allowed us to recover all genomic components of both viruses. The assembled genomes of ACMV and FBNSV shared 100% nucleotide identity with those obtained with Sanger sequencing. Additionally, our experiments demonstrated that for similar-sized components, the number of reads was proportional to the segment frequencies measured using qPCR.
CONCLUSION In this study, we demonstrated an accessible and effective Nanopore-based method for high-quality genomic characterisation of CRESS-DNA viruses, comparable to Sanger sequencing. Faced with the increasing challenges posed by viral CRESS-DNA plant diseases, integrating this approach into routine workflows could pave the way for more proactive responses to viral epidemics.
Affiliation(s)
- Daniel H Otron
- The Central and West African Virus Epidemiology (WAVE) for Food Security Program, Pôle Scientifique et d'Innovation, Université Félix Houphouët-Boigny (UFHB), Abidjan, 22 BP 582, Côte d'Ivoire
- CIRAD, UMR PVBMT, St Pierre, La Réunion, F-97410, France
- UFR Biosciences, Université Félix Houphouët-Boigny (UFHB), Abidjan, 22 BP 582, Côte d'Ivoire
| | - Denis Filloux
- PHIM Plant Health Institute, University Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, F-34398, France
- CIRAD, PHIM, Montpellier, F-34398, France
| | - Andy Brousse
- PHIM Plant Health Institute, University Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, F-34398, France
| | | | | | - Cécile Hoareau
- CIRAD, UMR PVBMT, St Pierre, La Réunion, F-97410, France
| | - Emmanuel Fernandez
- PHIM Plant Health Institute, University Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, F-34398, France
- CIRAD, PHIM, Montpellier, F-34398, France
| | - Fidèle Tiendrébéogo
- The Central and West African Virus Epidemiology (WAVE) for Food Security Program, Pôle Scientifique et d'Innovation, Université Félix Houphouët-Boigny (UFHB), Abidjan, 22 BP 582, Côte d'Ivoire
| | | | - Justin S Pita
- The Central and West African Virus Epidemiology (WAVE) for Food Security Program, Pôle Scientifique et d'Innovation, Université Félix Houphouët-Boigny (UFHB), Abidjan, 22 BP 582, Côte d'Ivoire
- UFR Biosciences, Université Félix Houphouët-Boigny (UFHB), Abidjan, 22 BP 582, Côte d'Ivoire
| | - Philippe Roumagnac
- PHIM Plant Health Institute, University Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, F-34398, France
- CIRAD, PHIM, Montpellier, F-34398, France
| | - Pierre Lefeuvre
- CIRAD, UMR PVBMT, St Pierre, La Réunion, F-97410, France.
- Department of Plant Protection, College of Agriculture, CIRAD, UMR PVBMT, Can Tho University, Can Tho city, Vietnam.
|
4
|
Ansai S, Toyoda A, Yoshida K, Kitano J. Repositioning of centromere-associated repeats during karyotype evolution in Oryzias fishes. Mol Ecol 2024; 33:e17222. [PMID: 38014620 DOI: 10.1111/mec.17222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 11/04/2023] [Accepted: 11/13/2023] [Indexed: 11/29/2023]
Abstract
The karyotype, which is the number and shape of chromosomes, is a fundamental characteristic of all eukaryotes. Karyotypic changes play an important role in many aspects of evolutionary processes, including speciation. In organisms with monocentric chromosomes, it was previously thought that chromosome number changes were mainly caused by centric fusions and fissions, whereas chromosome shape changes, that is, changes in arm numbers, were mainly due to pericentric inversions. However, recent genomic and cytogenetic studies have revealed examples of alternative cases, such as tandem fusions and centromere repositioning, found in the karyotypic changes within and between species. Here, we employed comparative genomic approaches to investigate whether centromere repositioning occurred during karyotype evolution in medaka fishes. In the medaka family (Adrianichthyidae), the three phylogenetic groups differed substantially in their karyotypes. The Oryzias latipes species group has larger numbers of chromosome arms than the other groups, with most chromosomes being metacentric. The O. javanicus species group has similar numbers of chromosomes to the O. latipes species group, but smaller arm numbers, with most chromosomes being acrocentric. The O. celebensis species group has fewer chromosomes than the other two groups and several large metacentric chromosomes that were likely formed by chromosomal fusions. By comparing the genome assemblies of O. latipes, O. javanicus, and O. celebensis, we found that repositioning of centromere-associated repeats might be more common than simple pericentric inversion. Our results demonstrated that centromere repositioning may play a more important role in karyotype evolution than previously appreciated.
Affiliation(s)
- Satoshi Ansai
- Laboratory of Genome Editing Breeding, Graduate School of Agriculture, Kyoto University, Kyoto, Japan
| | - Atsushi Toyoda
- Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Japan
| | - Kohta Yoshida
- Ecological Genetics Laboratory, National Institute of Genetics, Mishima, Japan
| | - Jun Kitano
- Ecological Genetics Laboratory, National Institute of Genetics, Mishima, Japan
|
5
|
Song Z, Zahin T, Li X, Shao M. Accurate Detection of Tandem Repeats from Error-Prone Sequences with EquiRep. bioRxiv 2024:2024.11.05.621953. [PMID: 39574759 PMCID: PMC11580891 DOI: 10.1101/2024.11.05.621953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2024]
Abstract
A tandem repeat is a sequence of nucleotides that occurs as multiple contiguous and near-identical copies positioned next to each other. These repeats play critical roles in genetic diversity and gene regulation, and are strongly linked to various neurological and developmental disorders. While several methods exist for detecting tandem repeats, they often exhibit low accuracy when the repeat unit length increases or the number of copies is low. Furthermore, methods capable of handling highly mutated sequences remain scarce, highlighting a significant opportunity for improvement. We introduce EquiRep, a tool for accurate detection of tandem repeats from erroneous sequences. EquiRep estimates the likelihood of positions originating from the same position in the unit by self-alignment followed by a novel approach that refines the estimation. The resulting equivalence classes and the consecutive position information are then used to build a weighted graph, and the cycle in this graph with maximum bottleneck weight that covers most nucleotide positions is identified to reconstruct the repeat unit. We test EquiRep on simulated and real HOR and RCA datasets, where it consistently outperforms or is comparable to state-of-the-art methods. EquiRep is robust to sequencing errors and makes better predictions for long units and low frequencies, which underscores its broad usability for studying tandem repeats.
Affiliation(s)
- Zhezheng Song
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Tasfia Zahin
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Xiang Li
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Mingfu Shao
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
|
6
|
Vigouroux M, Novák P, Oliveira LC, Santos C, Cheema J, Wouters RHM, Paajanen P, Vickers M, Koblížková A, Vaz Patto MC, Macas J, Steuernagel B, Martin C, Emmrich PMF. A chromosome-scale reference genome of grasspea (Lathyrus sativus). Sci Data 2024; 11:1035. [PMID: 39333203 PMCID: PMC11437036 DOI: 10.1038/s41597-024-03868-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 09/05/2024] [Indexed: 09/29/2024] Open
Abstract
Grasspea (Lathyrus sativus L.) is an underutilised but promising legume crop with tolerance to a wide range of abiotic and biotic stress factors, and potential for climate-resilient agriculture. Despite a long history and wide geographical distribution of cultivation, only limited breeding resources are available. This paper reports a 5.96 Gbp genome assembly of grasspea genotype LS007, of which 5.03 Gbp is scaffolded into 7 pseudo-chromosomes. The assembly has a BUSCO completeness score of 99.1% and is annotated with 31719 gene models and repeat elements. This represents the most contiguous and accurate assembly of the grasspea genome to date.
Affiliation(s)
- Marielle Vigouroux
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
| | - Petr Novák
- Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic
| | - Ludmila Cristina Oliveira
- Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic
| | - Carmen Santos
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157, Portugal
| | - Jitender Cheema
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SD, Cambridge, United Kingdom
| | - Roland H M Wouters
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
| | - Pirita Paajanen
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
| | - Martin Vickers
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
| | - Andrea Koblížková
- Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic
| | - Maria Carlota Vaz Patto
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157, Portugal
| | - Jiří Macas
- Institute of Plant Molecular Biology, Biology Centre CAS, Branisovska 31, Ceske Budejovice, CZ, 37005, Czech Republic
| | | | - Cathie Martin
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK
| | - Peter M F Emmrich
- John Innes Centre, Norwich Research Park, Colney Lane, Norwich, NR4 7UH, UK.
- Norwich Institute for Sustainable Development, School of International Development, University of East Anglia, Norwich, NR4 7TJ, UK.
|
7
|
de Lima LG, Guarracino A, Koren S, Potapova T, McKinney S, Rhie A, Solar SJ, Seidel C, Fagen B, Walenz BP, Bouffard GG, Brooks SY, Peterson M, Hall K, Crawford J, Young AC, Pickett BD, Garrison E, Phillippy AM, Gerton JL. The formation and propagation of human Robertsonian chromosomes. bioRxiv 2024:2024.09.24.614821. [PMID: 39386535 PMCID: PMC11463614 DOI: 10.1101/2024.09.24.614821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Robertsonian chromosomes are a type of variant chromosome found commonly in nature. Present in one in 800 humans, these chromosomes can underlie infertility, trisomies, and increased cancer incidence. Recognized cytogenetically for more than a century, their origins have remained mysterious. Recent advances in genomics allowed us to assemble three human Robertsonian chromosomes completely. We identify a common breakpoint and epigenetic changes in centromeres that provide insight into the formation and propagation of common Robertsonian translocations. Further investigation of the assembled genomes of chimpanzee and bonobo highlights the structural features of the human genome that uniquely enable the specific crossover event that creates these chromosomes. Resolving the structure and epigenetic features of human Robertsonian chromosomes at a molecular level paves the way to understanding how chromosomal structural variation occurs more generally, and how chromosomes evolve.
Affiliation(s)
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Steven J Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Chris Seidel
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Brandon Fagen
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Brian P Walenz
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Kate Hall
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Juyun Crawford
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice C Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Adam M Phillippy
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
|
8
|
Wu J, Liu F, Jiao J, Luo H, Fan S, Liu J, Wang H, Cui N, Zhao N, Qu Q, Kuraku S, Huang Z, Xu L. Comparative genomics illuminates karyotype and sex chromosome evolution of sharks. Cell Genomics 2024; 4:100607. [PMID: 38996479 PMCID: PMC11406177 DOI: 10.1016/j.xgen.2024.100607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 05/01/2024] [Accepted: 06/19/2024] [Indexed: 07/14/2024]
Abstract
Chondrichthyes is an important lineage for reconstructing the evolutionary history of vertebrates. Here, we analyzed genome synteny for six chondrichthyan chromosome-level genomes. Our comparative analysis reveals a slow evolutionary rate of chromosomal changes, with infrequent but independent fusions observed in sharks, skates, and chimaeras. The chondrichthyan common ancestor had a proto-vertebrate-like karyotype, including the presence of 18 microchromosome pairs. The X chromosome is a conserved microchromosome shared by all sharks, suggesting a likely common origin of the sex chromosome at least 181 million years ago. We characterized the Y chromosomes of two sharks that are highly differentiated from the X except for a small young evolutionary stratum and a small pseudoautosomal region. We found that shark sex chromosomes lack global dosage compensation but that dosage-sensitive genes are locally compensated. Our study on shark chromosome evolution enhances our understanding of shark sex chromosomes and vertebrate chromosome evolution.
Affiliation(s)
- Jiahong Wu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Fujiang Liu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, China
| | - Jie Jiao
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Haoran Luo
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China; Key Laboratory of Ministry of Education for the Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen 361102, China
| | - Shiyu Fan
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Jiao Liu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Hongxiang Wang
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Ning Cui
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Ning Zhao
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China; Key Laboratory of Animal Genetics and Breeding and Molecular Design of Jiangsu Province, College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
| | - Qingming Qu
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, China
| | - Shigehiro Kuraku
- Molecular Life History Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Shizuoka, Japan; Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
| | - Zhen Huang
- Fujian-Macao Science and Technology Cooperation Base of Traditional Chinese Medicine-Oriented Chronic Disease Prevention and Treatment, Innovation and Transformation Center, Fujian University of Traditional Chinese Medicine, Fuzhou 350108, China
| | - Luohao Xu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, MOE Key Laboratory of Freshwater Fish Reproduction and Development, School of Life Sciences, Southwest University, Chongqing 400715, China.
|
9
|
Unneberg P, Larsson M, Olsson A, Wallerman O, Petri A, Bunikis I, Vinnere Pettersson O, Papetti C, Gislason A, Glenner H, Cartes JE, Blanco-Bercial L, Eriksen E, Meyer B, Wallberg A. Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins. Nat Commun 2024; 15:6297. [PMID: 39090106 PMCID: PMC11294593 DOI: 10.1038/s41467-024-50239-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 05/15/2024] [Indexed: 08/04/2024] Open
Abstract
Krill are vital as food for many marine animals but are also impacted by global warming. To learn how they and other zooplankton may adapt to a warmer world, we studied local adaptation in the widespread Northern krill (Meganyctiphanes norvegica). We assemble and characterize its large genome and compare genome-scale variation among 74 specimens from the colder Atlantic Ocean and warmer Mediterranean Sea. The 19 Gb genome likely evolved through proliferation of retrotransposons, now targeted for inactivation by extensive DNA methylation, and contains many duplicated genes associated with molting and vision. Analysis of 760 million SNPs indicates extensive homogenizing gene-flow among populations. Nevertheless, we detect signatures of adaptive divergence across hundreds of genes, implicated in photoreception, circadian regulation, reproduction and thermal tolerance, indicating polygenic adaptation to light and temperature. The top gene candidate for ecological adaptation was nrf-6, a lipid transporter with a Mediterranean variant that may contribute to early spring reproduction. Such variation could become increasingly important for fitness in Atlantic stocks. Our study underscores the widespread but uneven distribution of adaptive variation, necessitating characterization of genetic variation among natural zooplankton populations to understand their adaptive potential, predict risks and support ocean conservation in the face of climate change.
Affiliation(s)
- Per Unneberg
- Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Mårten Larsson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Anna Olsson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden
| | - Anna Petri
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
| | - Ignas Bunikis
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
| | - Olga Vinnere Pettersson
- Uppsala Genome Center, Department of Immunology, Genetics and Pathology, Uppsala University, National Genomics Infrastructure hosted by SciLifeLab, Uppsala, Sweden
| | | | - Astthor Gislason
- Marine and Freshwater Research Institute, Pelagic Division, Reykjavik, Iceland
| | - Henrik Glenner
- Department of Biological Sciences, University of Bergen, Bergen, Norway
- Center for Macroecology, Evolution and Climate Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Joan E Cartes
- Instituto de Ciencias del Mar (ICM-CSIC), Barcelona, Spain
| | | | | | - Bettina Meyer
- Section Polar Biological Oceanography, Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Institute for Chemistry and Biology of the Marine Environment, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Helmholtz Institute for Functional Marine Biodiversity (HIFMB), University of Oldenburg, Oldenburg, Germany
| | - Andreas Wallberg
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, 751 23, Uppsala, Sweden.
|
10
|
Kitchen SA, Naragon TH, Brückner A, Ladinsky MS, Quinodoz SA, Badroos JM, Viliunas JW, Kishi Y, Wagner JM, Miller DR, Yousefelahiyeh M, Antoshechkin IA, Eldredge KT, Pirro S, Guttman M, Davis SR, Aardema ML, Parker J. The genomic and cellular basis of biosynthetic innovation in rove beetles. Cell 2024; 187:3563-3584.e26. [PMID: 38889727 PMCID: PMC11246231 DOI: 10.1016/j.cell.2024.05.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 02/29/2024] [Accepted: 05/06/2024] [Indexed: 06/20/2024]
Abstract
How evolution at the cellular level potentiates macroevolutionary change is central to understanding biological diversification. The >66,000 rove beetle species (Staphylinidae) form the largest metazoan family. Combining genomic and cell type transcriptomic insights spanning the largest clade, Aleocharinae, we retrace evolution of two cell types comprising a defensive gland, a putative catalyst behind staphylinid megadiversity. We identify molecular evolutionary steps leading to benzoquinone production by one cell type via a mechanism convergent with plant toxin release systems, and synthesis by the second cell type of a solvent that weaponizes the total secretion. This cooperative system has been conserved since the Early Cretaceous as Aleocharinae radiated into tens of thousands of lineages. Reprogramming each cell type yielded biochemical novelties enabling ecological specialization, most dramatically in symbionts that infiltrate social insect colonies via host-manipulating secretions. Our findings uncover cell type evolutionary processes underlying the origin and evolvability of a beetle chemical innovation.
Affiliation(s)
- Sheila A Kitchen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Thomas H Naragon
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Adrian Brückner
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Mark S Ladinsky
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Sofia A Quinodoz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Jean M Badroos
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Joani W Viliunas
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Yuriko Kishi
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Julian M Wagner
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - David R Miller
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Mina Yousefelahiyeh
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Igor A Antoshechkin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - K Taro Eldredge
- Museum of Zoology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Stacy Pirro
- Iridian Genomes, 613 Quaint Acres Dr., Silver Spring, MD 20904, USA
| | - Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Steven R Davis
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Matthew L Aardema
- Department of Biology, Montclair State University, Montclair, NJ 07043, USA
| | - Joseph Parker
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
11
Digby B, Finn S, Ó Broin P. Computational approaches and challenges in the analysis of circRNA data. BMC Genomics 2024; 25:527. [PMID: 38807085 PMCID: PMC11134749 DOI: 10.1186/s12864-024-10420-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/15/2024] [Indexed: 05/30/2024]
Abstract
Circular RNAs (circRNA) are a class of non-coding RNA, forming a single-stranded covalently closed loop structure generated via back-splicing. Advancements in sequencing methods and technologies in conjunction with algorithmic developments of bioinformatics tools have enabled researchers to characterise the origin and function of circRNAs, with practical applications as a biomarker of diseases becoming increasingly relevant. Computational methods developed for circRNA analysis are predicated on detecting the chimeric back-splice junction of circRNAs whilst mitigating false-positive sequencing artefacts. In this review, we discuss in detail the computational strategies developed for circRNA identification, highlighting a selection of tool strengths, weaknesses and assumptions. In addition to circRNA identification tools, we describe methods for characterising the role of circRNAs within the competing endogenous RNA (ceRNA) network, their interactions with RNA-binding proteins, and publicly available databases for rich circRNA annotation.
Affiliation(s)
- Barry Digby
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland.
| | - Stephen Finn
- Discipline of Histopathology, School of Medicine, Trinity College Dublin and Cancer Molecular Diagnostic Laboratory, Dublin, Ireland
| | - Pilib Ó Broin
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
12
Packiaraj J, Thakur J. DNA satellite and chromatin organization at mouse centromeres and pericentromeres. Genome Biol 2024; 25:52. [PMID: 38378611 PMCID: PMC10880262 DOI: 10.1186/s13059-024-03184-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 02/12/2024] [Indexed: 02/22/2024]
Abstract
BACKGROUND Centromeres are essential for faithful chromosome segregation during mitosis and meiosis. However, the organization of satellite DNA and chromatin at mouse centromeres and pericentromeres is poorly understood due to the challenges of assembling repetitive genomic regions. RESULTS Using recently available PacBio long-read sequencing data from the C57BL/6 strain, we find that contrary to the previous reports of their homogeneous nature, both centromeric minor satellites and pericentromeric major satellites exhibit a high degree of variation in sequence and organization within and between arrays. While most arrays are continuous, a significant fraction is interspersed with non-satellite sequences, including transposable elements. Using chromatin immunoprecipitation sequencing (ChIP-seq), we find that the occupancy of CENP-A and H3K9me3 chromatin at centromeric and pericentric regions, respectively, is associated with increased sequence enrichment and homogeneity at these regions. The transposable elements at centromeric regions are not part of functional centromeres as they lack significant CENP-A enrichment. Furthermore, both CENP-A and H3K9me3 nucleosomes occupy minor and major satellites spanning centromeric-pericentric junctions and a low yet significant amount of CENP-A spreads locally at centromere junctions on both pericentric and telocentric sides. Finally, while H3K9me3 nucleosomes display a well-phased organization on major satellite arrays, CENP-A nucleosomes on minor satellite arrays are poorly phased. Interestingly, the homogeneous class of major satellites also phase CENP-A and H3K27me3 nucleosomes, indicating that the nucleosome phasing is an inherent property of homogeneous major satellites. CONCLUSIONS Our findings reveal that mouse centromeres and pericentromeres display a high diversity in satellite sequence, organization, and chromatin structure.
Affiliation(s)
- Jenika Packiaraj
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA
| | - Jitendra Thakur
- Department of Biology, Emory University, 1510 Clifton Rd, Atlanta, GA, 30322, USA.
13
Sauvage T, Cormier A, Delphine P. A comparison of Oxford nanopore library strategies for bacterial genomics. BMC Genomics 2023; 24:627. [PMID: 37864145 PMCID: PMC10589936 DOI: 10.1186/s12864-023-09729-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/12/2023] [Accepted: 10/11/2023] [Indexed: 10/22/2023]
Abstract
BACKGROUND Oxford Nanopore Technologies (ONT) provides three main library preparation strategies to sequence bacterial genomes. These include tagmentation (TAG), ligation (LIG) and amplification (PCR). Despite ONT's recommendations, making an informed decision for preparation choice remains difficult without a side-by-side comparison. Here, we sequenced 12 bacterial strains to examine the overall output of these strategies, including sequencing noise, barcoding efficiency and assembly quality based on mapping to curated genomes established herein. RESULTS Average read length ranged closely for TAG and LIG (>5,000 bp), while being drastically smaller for PCR (<1,100 bp). LIG produced the largest output with 33.62 Gbp vs. 11.72 Gbp for TAG and 4.79 Gbp for PCR. PCR produced the most sequencing noise with only 22.7% of reads mappable to the curated genomes, vs. 92.9% for LIG and 87.3% for TAG. Output per channel was most homogeneous in LIG and most variable in PCR, while intermediate in TAG. Artifactual tandem content was most abundant in PCR (22.5%) and least in LIG and TAG (0.9% and 2.2%). Basecalling and demultiplexing of barcoded libraries resulted in ~20% data loss as unclassified reads and 1.5% read leakage. CONCLUSION The output of LIG was best (low noise, high read numbers of long lengths), intermediate in TAG (some noise, moderate read numbers of long lengths) and less desirable in PCR (high noise, high read numbers of short lengths). Overall, users should not accept assembly results at face value without careful replicon verification, including the detection of plasmids assembled from leaked reads.
Affiliation(s)
- Thomas Sauvage
- Ifremer, MASAE Microbiologie Aliment Santé Environnement, F-44000, Nantes, France.
| | | | - Passerini Delphine
- Ifremer, MASAE Microbiologie Aliment Santé Environnement, F-44000, Nantes, France
14
Merkulov P, Egorova E, Kirov I. Composition and Structure of Arabidopsis thaliana Extrachromosomal Circular DNAs Revealed by Nanopore Sequencing. Plants (Basel) 2023; 12:2178. [PMID: 37299157 PMCID: PMC10255303 DOI: 10.3390/plants12112178] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/02/2023] [Revised: 05/19/2023] [Accepted: 05/29/2023] [Indexed: 06/12/2023]
Abstract
Extrachromosomal circular DNAs (eccDNAs) are enigmatic DNA molecules that have been detected in a range of organisms. In plants, eccDNAs have various genomic origins and may be derived from transposable elements. The structures of individual eccDNA molecules and their dynamics in response to stress are poorly understood. In this study, we showed that nanopore sequencing is a useful tool for the detection and structural analysis of eccDNA molecules. Applying nanopore sequencing to the eccDNA molecules of epigenetically stressed Arabidopsis plants grown under various stress treatments (heat, abscisic acid, and flagellin), we showed that TE-derived eccDNA quantity and structure vary dramatically between individual TEs. Epigenetic stress alone did not cause eccDNA up-regulation, whereas its combination with heat stress triggered the generation of full-length and various truncated eccDNAs of the ONSEN element. We showed that the ratio between full-length and truncated eccDNAs is TE- and condition-dependent. Our work paves the way for further elucidation of the structural features of eccDNAs and their connections with various biological processes, such as eccDNA transcription and eccDNA-mediated TE silencing.
Affiliation(s)
- Pavel Merkulov
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- All-Russia Research Institute of Agricultural Biotechnology, 127550 Moscow, Russia
| | - Ekaterina Egorova
- All-Russia Research Institute of Agricultural Biotechnology, 127550 Moscow, Russia
| | - Ilya Kirov
- Moscow Institute of Physics and Technology, 141701 Dolgoprudny, Russia
- All-Russia Research Institute of Agricultural Biotechnology, 127550 Moscow, Russia
15
Kitchen SA, Naragon TH, Brückner A, Ladinsky MS, Quinodoz SA, Badroos JM, Viliunas JW, Wagner JM, Miller DR, Yousefelahiyeh M, Antoshechkin IA, Eldredge KT, Pirro S, Guttman M, Davis SR, Aardema ML, Parker J. The genomic and cellular basis of biosynthetic innovation in rove beetles. bioRxiv 2023:2023.05.29.542378. [PMID: 37398185 PMCID: PMC10312436 DOI: 10.1101/2023.05.29.542378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
How evolution at the cellular level potentiates change at the macroevolutionary level is a major question in evolutionary biology. With >66,000 described species, rove beetles (Staphylinidae) comprise the largest metazoan family. Their exceptional radiation has been coupled to pervasive biosynthetic innovation whereby numerous lineages bear defensive glands with diverse chemistries. Here, we combine comparative genomic and single-cell transcriptomic data from across the largest rove beetle clade, Aleocharinae. We retrace the functional evolution of two novel secretory cell types that together comprise the tergal gland, a putative catalyst behind Aleocharinae's megadiversity. We identify key genomic contingencies that were critical to the assembly of each cell type and their organ-level partnership in manufacturing the beetle's defensive secretion. This process hinged on evolving a mechanism for regulated production of noxious benzoquinones that appears convergent with plant toxin release systems, and synthesis of an effective benzoquinone solvent that weaponized the total secretion. We show that this cooperative biosynthetic system arose at the Jurassic-Cretaceous boundary, and that following its establishment, both cell types underwent ∼150 million years of stasis, their chemistry and core molecular architecture maintained almost clade-wide as Aleocharinae radiated globally into tens of thousands of lineages. Despite this deep conservation, we show that the two cell types have acted as substrates for the emergence of adaptive, biochemical novelties, most dramatically in symbiotic lineages that have infiltrated social insect colonies and produce host behavior-manipulating secretions. Our findings uncover genomic and cell type evolutionary processes underlying the origin, functional conservation and evolvability of a chemical innovation in beetles.
16
Mascarenhas Dos Santos AC, Julian AT, Liang P, Juárez O, Pombert JF. Telomere-to-Telomere genome assemblies of human-infecting Encephalitozoon species. BMC Genomics 2023; 24:237. [PMID: 37142951 PMCID: PMC10158259 DOI: 10.1186/s12864-023-09331-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 04/25/2023] [Indexed: 05/06/2023]
Abstract
BACKGROUND Microsporidia are diverse spore-forming, fungal-related obligate intracellular pathogens infecting a wide range of hosts. This diversity is reflected at the genome level with sizes varying by an order of magnitude, ranging from less than 3 Mb in Encephalitozoon species (the smallest known in eukaryotes) to more than 50 Mb in Edhazardia spp. As a paradigm of genome reduction in eukaryotes, the small Encephalitozoon genomes have attracted much attention with investigations revealing gene-dense, repeat- and intron-poor genomes characterized by a thorough pruning of molecular functions no longer relevant to their obligate intracellular lifestyle. However, because no Encephalitozoon genome has been sequenced from telomere-to-telomere and since no methylation data is available for these species, our understanding of their overall genetic and epigenetic architectures is incomplete. METHODS In this study, we sequenced the complete genomes from telomere-to-telomere of three human-infecting Encephalitozoon spp. (E. intestinalis ATCC 50506, E. hellem ATCC 50604 and E. cuniculi ATCC 50602) using short and long read platforms and leveraged the data generated as part of the sequencing process to investigate the presence of epigenetic markers in these genomes. We also used a mixture of sequence- and structure-based computational approaches, including protein structure prediction, to help identify which Encephalitozoon proteins are involved in telomere maintenance, epigenetic regulation, and heterochromatin formation. RESULTS The Encephalitozoon chromosomes were found capped by TTAGG 5-mer telomeric repeats followed by telomere associated repeat elements (TAREs) flanking hypermethylated ribosomal RNA (rRNA) gene loci featuring 5-methylcytosines (5mC) and 5-hemimethylcytosines (5hmC), themselves followed by lesser methylated subtelomeres and hypomethylated chromosome cores. Strong nucleotide biases were identified between the telomeres/subtelomeres and chromosome cores with significant changes in GC/AT, GT/AC and GA/CT contents. The presence of several genes coding for proteins essential to telomere maintenance, epigenetic regulation, and heterochromatin formation was further confirmed in the Encephalitozoon genomes. CONCLUSION Altogether, our results strongly support the subtelomeres as sites of heterochromatin formation in Encephalitozoon genomes and further suggest that these species might shut down their energy-consuming ribosomal machinery while dormant as spores by silencing the rRNA genes using both 5mC/5hmC methylation and facultative heterochromatin formation at these loci.
Affiliation(s)
| | | | - Pingdong Liang
- Department of Biology, Illinois Institute of Technology, Chicago, IL, USA
| | - Oscar Juárez
- Department of Biology, Illinois Institute of Technology, Chicago, IL, USA
17
Huang Z, Xu Z, Bai H, Huang Y, Kang N, Ding X, Liu J, Luo H, Yang C, Chen W, Guo Q, Xue L, Zhang X, Xu L, Chen M, Fu H, Chen Y, Yue Z, Fukagawa T, Liu S, Chang G, Xu L. Evolutionary analysis of a complete chicken genome. Proc Natl Acad Sci U S A 2023; 120:e2216641120. [PMID: 36780517 PMCID: PMC9974502 DOI: 10.1073/pnas.2216641120] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 09/29/2022] [Accepted: 01/18/2023] [Indexed: 02/15/2023]
Abstract
Microchromosomes are prevalent in nonmammalian vertebrates [P. D. Waters et al., Proc. Natl. Acad. Sci. U.S.A. 118 (2021)], but a few of them are missing in bird genome assemblies. Here, we present a new chicken reference genome containing all autosomes, a Z and a W chromosome, with all gaps closed except for the W. We identified ten small microchromosomes (termed dot chromosomes) with distinct sequence and epigenetic features, among which six were newly assembled. Those dot chromosomes exhibit extremely high GC content and a high level of DNA methylation and are enriched for housekeeping genes. The pericentromeric heterochromatin of dot chromosomes is disproportionately large and continues to expand with the proliferation of satellite DNA and testis-expressed genes. Our analyses revealed that the 41-bp CNM repeat frequently forms higher-order repeats (HORs) at the centromeres of acrocentric chromosomes. The centromere core regions where the kinetochore attaches often encompass telomeric sequence (TTAGGG)n, and in one of the dot chromosomes, the centromere core recruits an endogenous retrovirus (ERV). We further demonstrate that the W chromosome shares some common features with dot chromosomes, having large arrays of hypermethylated tandem repeats. Finally, using the complete chicken chromosome models, we reconstructed a fine picture of chordate karyotype evolution, revealing frequent chromosomal fusions before and after vertebrate whole-genome duplications. Our sequence and epigenetic characterization of chicken chromosomes offers insights into vertebrate genome evolution and chromosome biology.
Affiliation(s)
- Zhen Huang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
- Fujian-Macao Science and Technology Cooperation Base of Traditional Chinese Medicine-Oriented Chronic Disease Prevention and Treatment, Innovation and Transformation Center, Fujian University of Traditional Chinese Medicine, Fuzhou 350108, China
| | - Zaoxu Xu
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
- Gansu Key Laboratory of Protection and Utilization for Biological Resources and Ecological Restoration, College of Life Sciences and Technology, Longdong University, Qingyang, Gansu Province 745000, China
| | - Hao Bai
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- Key Laboratory of Animal Genetics and Breeding and Molecular Design of Jiangsu Province, College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
| | - Yongji Huang
- Institute of Oceanography, Minjiang University, Fuzhou 350108, China
| | - Na Kang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Xiaoting Ding
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
| | - Jing Liu
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna 1090, Austria
| | - Haoran Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
- Key Laboratory of Ministry of Education for Coast and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen 361102, China
| | | | | | - Qixin Guo
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- Key Laboratory of Animal Genetics and Breeding and Molecular Design of Jiangsu Province, College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
| | - Lingzhan Xue
- Aquaculture and Genetic breeding laboratory, Freshwater Fisheries Research Institute of Fujian, Fuzhou 350002, China
| | - Xueping Zhang
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
| | - Li Xu
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
| | - Meiling Chen
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
| | - Honggao Fu
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
| | - Youling Chen
- Fujian Key Laboratory of Developmental and Neural Biology, College of Life Sciences, Fujian Normal University, Fuzhou 350117, China
| | - Zhicao Yue
- Department of Cell Biology and Medical Genetics, International Cancer Center, and Guangdong Key Laboratory for Genome Stability and Disease Prevention, Shenzhen University School of Medicine, Guangdong, 518054, China
| | - Tatsuo Fukagawa
- Graduate School of Frontier Biosciences, Osaka University, Suita, Osaka 565-0871, Japan
| | - Shanlin Liu
- Department of Entomology, China Agricultural University, Beijing 100193, China
| | - Guobin Chang
- Joint International Research Laboratory of Agriculture and Agri-Product Safety, the Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- Key Laboratory of Animal Genetics and Breeding and Molecular Design of Jiangsu Province, College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
| | - Luohao Xu
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing 400715, China
18
Lang J, Xu Z, Wang Y, Sun J, Yang Z. NanoSTR: A method for detection of target short tandem repeats based on nanopore sequencing data. Front Mol Biosci 2023; 10:1093519. [PMID: 36743210 PMCID: PMC9889824 DOI: 10.3389/fmolb.2023.1093519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023]
Abstract
Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and they have also been used in forensic identification and paternity testing. However, there are few methods for STR detection based on nanopore sequencing due to the challenges posed by the sequencing principles and the data characteristics of nanopore sequencing. We developed NanoSTR for detection of target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping based on long-read data from nanopore sequencing with improved accuracy and efficiency compared with other existing methods, such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes using error-free simulated data, and achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR showed high performance for detection of target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for the detection of STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.
19
Kirov I, Kolganova E, Dudnikov M, Yurkevich OY, Amosova AV, Muravenko OV. A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes. Plants (Basel) 2022; 11:2103. [PMID: 36015406 PMCID: PMC9413040 DOI: 10.3390/plants11162103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Revised: 08/08/2022] [Accepted: 08/11/2022] [Indexed: 06/15/2023]
Abstract
High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy and highly similar tandem repeats, which makes the elucidation of TRs a very challenging task. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. This new pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. The program generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF performs annotation of transposable element (TE) sequences within or near satDNA arrays, and the information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for the evaluation of TR chromosome organization (clustered or dispersed). Our findings showed that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Elizaveta Kolganova
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
| | - Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Olga Yu. Yurkevich
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
| | - Alexandra V. Amosova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
| | - Olga V. Muravenko
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
20
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023]
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Affiliation(s)
| | | | | | - Arian F. A. Smit
- Institute for Systems Biology, Seattle, WA 98109, USA; (J.M.S.); (R.H.); (J.R.)
| |
|
21
|
Sholes SL, Karimian K, Gershman A, Kelly TJ, Timp W, Greider CW. Chromosome-specific telomere lengths and the minimal functional telomere revealed by nanopore sequencing. Genome Res 2022; 32:616-628. [PMID: 34702734 PMCID: PMC8997346 DOI: 10.1101/gr.275868.121] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/06/2021] [Accepted: 10/20/2021] [Indexed: 11/26/2022]
Abstract
We developed a method to tag telomeres and measure telomere length by nanopore sequencing in the yeast S. cerevisiae. Nanopore allows long-read sequencing through the telomere, through the subtelomere, and into unique chromosomal sequence, enabling assignment of telomere length to a specific chromosome end. We observed chromosome end-specific telomere lengths that were stable over 120 cell divisions. These stable chromosome-specific telomere lengths may be explained by slow clonal variation or may represent a new biological mechanism that maintains equilibrium unique to each chromosome end. We examined the role of RIF1 and TEL1 in telomere length regulation and found that TEL1 is epistatic to RIF1 at most telomeres, consistent with the literature. However, at telomeres that lack subtelomeric Y' sequences, tel1Δ rif1Δ double mutants had a very small, but significant, increase in telomere length compared with the tel1Δ single mutant, suggesting an influence of Y' elements on telomere length regulation. We sequenced telomeres in a telomerase-null mutant (est2Δ) and found the minimal telomere length to be ∼75 bp. In these est2Δ mutants, there were apparent telomere recombination events at individual telomeres before the generation of survivors, and these events were significantly reduced in est2Δ rad52Δ double mutants. The rate of telomere shortening in the absence of telomerase was similar across all chromosome ends at ∼5 bp per generation. This new method gives quantitative, high-resolution telomere length measurement at each individual chromosome end and suggests possible new biological mechanisms regulating telomere length.
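The two figures reported in this abstract (a shortening rate of about 5 bp per generation and a minimal length of about 75 bp) imply a simple back-of-the-envelope estimate. The sketch below assumes strictly linear shortening, which the abstract does not claim; the function name and the 300 bp starting length are hypothetical and used only for illustration.

```python
import math


def generations_until_minimal(initial_bp: float,
                              minimal_bp: float = 75.0,
                              loss_per_generation_bp: float = 5.0) -> int:
    """Whole generations of linear shortening before a telomere reaches minimal_bp.

    The defaults are the approximate values quoted in the abstract; the linear
    model itself is a simplifying assumption made for this example.
    """
    if loss_per_generation_bp <= 0:
        raise ValueError("loss_per_generation_bp must be positive")
    if initial_bp <= minimal_bp:
        return 0
    return math.ceil((initial_bp - minimal_bp) / loss_per_generation_bp)


# Hypothetical starting length of 300 bp: (300 - 75) / 5 = 45 generations.
print(generations_until_minimal(300))
```

Under these assumptions a 300 bp telomere would reach the minimal length after 45 generations; real telomeres also undergo the recombination events the abstract mentions, which this estimate ignores.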
Affiliation(s)
- Samantha L Sholes
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Biochemistry, Cellular and Molecular Biology Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Kayarash Karimian
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Biochemistry, Cellular and Molecular Biology Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Biochemistry, Cellular and Molecular Biology Graduate Program, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Thomas J Kelly
- Program in Molecular Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
| | - Carol W Greider
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, California 95064, USA
| |
|
22
|
Lei Y, Meng Y, Guo X, Ning K, Bian Y, Li L, Hu Z, Anashkina AA, Jiang Q, Dong Y, Zhu X. Overview of structural variation calling: Simulation, identification, and visualization. Comput Biol Med 2022; 145:105534. [DOI: 10.1016/j.compbiomed.2022.105534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/20/2022] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 12/11/2022]
|
23
|
Zhang P, Peng H, Llauro C, Bucher E, Mirouze M. ecc_finder: A Robust and Accurate Tool for Detecting Extrachromosomal Circular DNA From Sequencing Data. Front Plant Sci 2021; 12:743742. [PMID: 34925397 PMCID: PMC8672306 DOI: 10.3389/fpls.2021.743742] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 07/19/2021] [Accepted: 10/25/2021] [Indexed: 06/06/2023]
Abstract
Extrachromosomal circular DNA (eccDNA) has been observed in different species for decades, and growing evidence shows that this type of DNA molecule may play an important role in rapid adaptation. Therefore, characterizing the full landscape of eccDNA has become critical, and there are several protocols for enriching eccDNAs and performing short-read or long-read sequencing. However, there is currently no available bioinformatic tool to identify eccDNAs from Nanopore reads. More importantly, the current tools based on Illumina short reads lack an efficient standardized pipeline, notably to identify eccDNA originating from repeated loci, and cannot be applied to very large genomes. Here, we introduce a comprehensive tool to solve both of these issues. Applying ecc_finder to eccDNA-seq data (mobilome-seq, Circle-Seq, or CIDER-seq) from Arabidopsis, human, and wheat (with genome sizes ranging from 120 Mb to 17 Gb), we document the improvement in computational time, sensitivity, and accuracy and demonstrate ecc_finder's wide applicability and functionality.
Affiliation(s)
- Panpan Zhang
- Institut de Recherche pour le Développement (IRD), Montpellier, France
- Laboratory of Plant Genome and Development, University of Perpignan, Perpignan, France
| | - Haoran Peng
- Crop Genome Dynamics Group, Agroscope Changins, Nyon, Switzerland
- Department of Botany and Plant Biology, Section of Biology, Faculty of Science, University of Geneva, Geneva, Switzerland
| | - Christel Llauro
- Laboratory of Plant Genome and Development, University of Perpignan, Perpignan, France
- Laboratory of Plant Genome and Development, Centre National de la Recherche Scientifique (CNRS), Perpignan, France
| | - Etienne Bucher
- Crop Genome Dynamics Group, Agroscope Changins, Nyon, Switzerland
| | - Marie Mirouze
- Institut de Recherche pour le Développement (IRD), Montpellier, France
- Laboratory of Plant Genome and Development, University of Perpignan, Perpignan, France
| |
|
24
|
Nanopore sequencing of tomato mottle leaf distortion virus, a new bipartite begomovirus infecting tomato in Brazil. Arch Virol 2021; 166:3217-3220. [PMID: 34498121 DOI: 10.1007/s00705-021-05220-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Accepted: 07/13/2021] [Indexed: 10/20/2022]
Abstract
During a survey in a tomato field in Luziânia (Goiás State, Brazil), a single plant with mottling, chlorotic spots, and leaf distortion was found. A new bipartite begomovirus sequence was identified using nanopore sequence technology and confirmed by Sanger sequencing. The highest nucleotide sequence identity match of the DNA-A component (2596 bases) was 81.64% with tomato golden leaf deformation virus (HM357456). Due to the current species demarcation criterion of 91% nucleotide sequence identity for DNA-A, we propose this virus to be a new member of the genus Begomovirus, named "tomato mottle leaf distortion virus".
|
25
|
Gao Y, Liu Y, Ma Y, Liu B, Wang Y, Xing Y. abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. Bioinformatics 2021; 37:2209-2211. [PMID: 33165528 DOI: 10.1093/bioinformatics/btaa963] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/07/2020] [Revised: 10/05/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022] Open
Abstract
Summary: Partial order alignment, which aligns a sequence to a directed acyclic graph, is now frequently used as a key component in long-read error correction and assembly. We present abPOA (adaptive banded Partial Order Alignment), a Single Instruction Multiple Data (SIMD)-based C library for fast partial order alignment using adaptive banded dynamic programming. It can work as a stand-alone multiple sequence alignment and consensus calling tool or be easily integrated into any long-read error correction and assembly workflow. Compared to a state-of-the-art tool (SPOA), abPOA is up to 10 times faster with a comparable alignment accuracy. Availability and implementation: abPOA is implemented in C. A stand-alone tool and a C/Python software interface are freely available at https://github.com/yangao07/abPOA. Supplementary information: Supplementary data are available at Bioinformatics online.
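The core idea named in this abstract, aligning a sequence to a directed acyclic graph, can be illustrated with a plain dynamic program. The sketch below is not abPOA's implementation: it is unbanded, scalar Python using unit edit costs, and it assumes nodes are numbered in topological order. The function name `align_to_dag` and the toy graph are hypothetical.

```python
from typing import Dict, List


def align_to_dag(seq: str, labels: List[str], preds: Dict[int, List[int]]) -> int:
    """Minimum edit distance between seq and any source-to-sink path of a DAG.

    labels[v] is the base stored at node v; preds[v] lists the predecessors of v.
    Assumption: nodes are topologically ordered, so every predecessor index is
    smaller than the node index.
    """
    n = len(seq)
    start = list(range(n + 1))  # virtual start row: j bases of seq inserted before any node
    rows: List[List[int]] = []
    has_successor = set()
    for v, label in enumerate(labels):
        node_preds = preds.get(v, [])
        has_successor.update(node_preds)
        prev_rows = [rows[p] for p in node_preds] or [start]
        # Column 0: no sequence consumed yet, so node v is deleted.
        row = [min(prev[0] for prev in prev_rows) + 1]
        for j in range(1, n + 1):
            cost = 0 if label == seq[j - 1] else 1
            # Match/mismatch or deletion of node v, taken over all predecessors.
            best = min(min(prev[j - 1] + cost, prev[j] + 1) for prev in prev_rows)
            # Insertion of seq[j - 1] after node v.
            row.append(min(best, row[j - 1] + 1))
        rows.append(row)
    sinks = [v for v in range(len(labels)) if v not in has_successor]
    return min(rows[v][n] for v in sinks)


# Toy graph encoding the two paths A-C-G-T and A-G-G-T.
labels = ["A", "C", "G", "G", "T"]
preds = {1: [0], 2: [0], 3: [1, 2], 4: [3]}
print(align_to_dag("ACGT", labels, preds))  # 0: matches the first path exactly
print(align_to_dag("ACT", labels, preds))   # 1: one deleted base relative to A-C-G-T
```

The only difference from ordinary pairwise alignment is that each cell takes its minimum over every predecessor row rather than a single previous row. Banding, as described in the abstract, would restrict which columns `j` are filled for each node.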
Affiliation(s)
- Yan Gao
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | | | | | | | | | - Yi Xing
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
26
|
Morishita S, Ichikawa K, Myers EW. Finding long tandem repeats in long noisy reads. Bioinformatics 2021; 37:612-621. [PMID: 33031558 PMCID: PMC8097686 DOI: 10.1093/bioinformatics/btaa865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/24/2020] [Revised: 09/07/2020] [Accepted: 09/23/2020] [Indexed: 11/13/2022] Open
Abstract
Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10 000 nt or more that can span such repeat expansions, although these long reads have high error rates, of 10–20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (<1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time. Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity. Availability and implementation: https://github.com/morisUtokyo/mTR.
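The two steps outlined in this abstract (screening a window by its k-mer frequencies, then greedily walking a de Bruijn graph to recover a consensus unit) can be sketched in a few lines. This is an illustrative toy and not the mTR code: the function names, the repeated-k-mer score, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def kmer_counts(window: str, k: int) -> Counter:
    """Count every k-mer occurring in the window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))


def repeat_fraction(window: str, k: int) -> float:
    """Fraction of k-mer occurrences whose k-mer is seen more than once.

    A hypothetical screening score: values near 1 suggest a tandem repeat.
    """
    counts = kmer_counts(window, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for c in counts.values() if c > 1) / total


def greedy_unit(window: str, k: int) -> str:
    """Greedily follow the most frequent successor k-mer until the walk
    returns to its starting k-mer; return the unit spelled by that cycle,
    or an empty string if no cycle through the start is closed."""
    counts = kmer_counts(window, k)
    if not counts:
        return ""
    # Most frequent k-mer; ties are broken alphabetically for determinism.
    start = max(sorted(counts), key=counts.get)
    path, current, visited = start, start, {start}
    while True:
        candidates = [current[1:] + b for b in "ACGT" if counts[current[1:] + b] > 0]
        if not candidates:
            return ""
        nxt = max(candidates, key=counts.get)
        if nxt == start:
            # The cycle has as many edges as the unit has bases.
            return path[:len(path) - (k - 1)]
        if nxt in visited:
            return ""
        visited.add(nxt)
        path += nxt[-1]
        current = nxt


clean = "ACGTT" * 20
noisy = clean[:52] + "A" + clean[53:]  # one substitution in one copy
print(repeat_fraction(clean, 3))  # 1.0
print(greedy_unit(noisy, 3))      # ACGTT
```

Because the walk always prefers the most frequent successor, the k-mers created by the single substitution (each seen once) are ignored in favor of the consensus k-mers. In general the returned unit is one rotation of the repeat, determined by which k-mer the walk starts from.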
Affiliation(s)
- Shinichi Morishita
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
| | - Kazuki Ichikawa
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
| | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany
- Center for Systems Biology Dresden, Dresden, Saxony 01307, Germany
| |
|
27
|
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022] Open
Abstract
Background: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. Results: We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. Conclusions: TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Affiliation(s)
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Alberto Magi
- Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
| | - Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
| |
|
28
|
Bracewell R, Chatla K, Nalley MJ, Bachtrog D. Dynamic turnover of centromeres drives karyotype evolution in Drosophila. eLife 2019; 8:e49002. [PMID: 31524597 PMCID: PMC6795482 DOI: 10.7554/elife.49002] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 06/03/2019] [Accepted: 09/12/2019] [Indexed: 12/21/2022] Open
Abstract
Centromeres are the basic unit for chromosome inheritance, but their evolutionary dynamics are poorly understood. We generate high-quality reference genomes for multiple Drosophila obscura group species to reconstruct karyotype evolution. All chromosomes in this lineage were ancestrally telocentric and the creation of metacentric chromosomes in some species was driven by de novo seeding of new centromeres at ancestrally gene-rich regions, independently of chromosomal rearrangements. The emergence of centromeres resulted in a drastic size increase due to repeat accumulation, and dozens of genes previously located in euchromatin are now embedded in pericentromeric heterochromatin. Metacentric chromosomes secondarily became telocentric in the pseudoobscura subgroup through centromere repositioning and a pericentric inversion. The former (peri)centric sequences left behind shrank dramatically in size after their inactivation, yet contain remnants of their evolutionary past, including increased repeat content and a heterochromatic environment. Centromere movements are accompanied by rapid turnover of the major satellite DNA detected in (peri)centromeric regions.
Affiliation(s)
- Ryan Bracewell
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Kamalakar Chatla
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Matthew J Nalley
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Doris Bachtrog
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| |
|