1
López-Cortegano E, Chebib J, Jonas A, Vock A, Künzel S, Keightley PD, Tautz D. The rate and spectrum of new mutations in mice inferred by long-read sequencing. Genome Res 2025; 35:43-54. [PMID: 39622636 PMCID: PMC11789640 DOI: 10.1101/gr.279982.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Accepted: 11/26/2024] [Indexed: 01/12/2025]
Abstract
All forms of genetic variation originate from new mutations, making it crucial to understand their rates and mechanisms. Here, we use long-read sequencing from Pacific Biosciences (PacBio) to investigate de novo mutations that accumulated in 12 inbred mouse lines derived from three commonly used inbred strains (C3H, C57BL/6, and FVB) maintained for 8 to 15 generations in a mutation accumulation (MA) experiment. We built chromosome-level genome assemblies based on the MA line founders' genomes and then employed a combination of read and assembly-based methods to call the complete spectrum of new mutations. On average, there are about 45 mutations per haploid genome per generation, about half of which (54%) are insertions and deletions shorter than 50 bp (indels). The remainder are single-nucleotide mutations (SNMs; 44%) and large structural mutations (SMs; 2%). We found that the degree of DNA repetitiveness is positively correlated with SNM and indel rates and that a substantial fraction of SMs can be explained by homology-dependent mechanisms associated with repeat sequences. Most (90%) indels can be attributed to microsatellite contractions and expansions, and there is a marked bias toward 4 bp indels. Among the different types of SMs, tandem repeat mutations have the highest mutation rate, followed by insertions of transposable elements (TEs). We uncover a rich landscape of active TEs, notable differences in their spectrum among MA lines and strains, and a high rate of gene retroposition. Our study offers novel insights into mammalian genome evolution and highlights the importance of repetitive elements in shaping genomic diversity.
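The proportions quoted in this abstract can be turned into approximate per-class counts. The following is a minimal sketch of that arithmetic, assuming the rounded figures in the abstract (about 45 mutations per haploid genome per generation; 54% indels, 44% SNMs, 2% SMs); the function and variable names are illustrative and do not come from the paper.

```python
# Rounded figures quoted in the abstract; names below are illustrative only.
TOTAL_PER_HAPLOID_GENOME_PER_GENERATION = 45  # "about 45 mutations"
SHARES = {"indel": 0.54, "SNM": 0.44, "SM": 0.02}


def expected_counts(total, shares):
    """Split a total mutation count by the given fractional shares."""
    return {kind: total * share for kind, share in shares.items()}


counts = expected_counts(TOTAL_PER_HAPLOID_GENOME_PER_GENERATION, SHARES)
for kind, n in counts.items():
    # One decimal place is already more precision than the rounded inputs justify.
    print(f"{kind}: ~{n:.1f} per haploid genome per generation")
```

Under these rounded inputs the split is roughly 24 indels, 20 SNMs, and about 1 structural mutation per haploid genome per generation.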
Affiliation(s)
- Eugenio López-Cortegano
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Jobran Chebib
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Anika Jonas
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Anastasia Vock
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Sven Künzel
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Peter D Keightley
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, United Kingdom
- Diethard Tautz
  - Department for Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany
2
Luo LY, Wu H, Zhao LM, Zhang YH, Huang JH, Liu QY, Wang HT, Mo DX, EEr HH, Zhang LQ, Chen HL, Jia SG, Wang WM, Li MH. Telomere-to-telomere sheep genome assembly identifies variants associated with wool fineness. Nat Genet 2025; 57:218-230. [PMID: 39779954 DOI: 10.1038/s41588-024-02037-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 11/19/2024] [Indexed: 01/11/2025]
Abstract
Ongoing efforts to improve sheep reference genome assemblies still leave many gaps and incomplete regions, resulting in a few common failures and errors in genomic studies. Here, we report a 2.85-Gb gap-free telomere-to-telomere genome of a ram (T2T-sheep1.0), including all autosomes and the X and Y chromosomes. This genome adds 220.05 Mb of previously unresolved regions and 754 new genes to the most updated reference assembly ARS-UI_Ramb_v3.0; it contains four types of repeat units (SatI, SatII, SatIII and CenY) in centromeric regions. T2T-sheep1.0 has a base accuracy of more than 99.999%, corrects several structural errors in previous reference assemblies and improves structural variant detection in repetitive sequences. Alignment of whole-genome short-read sequences of global domestic and wild sheep against T2T-sheep1.0 identifies 2,664,979 new single-nucleotide polymorphisms in previously unresolved regions, which improves the population genetic analyses and detection of selective signals for domestication (for example, ABCC4) and wool fineness (for example, FOXQ1).
Affiliation(s)
- Ling-Yun Luo
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hui Wu
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Li-Ming Zhao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Ya-Hui Zhang
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Jia-Hui Huang
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qiu-Yue Liu
  - Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Hai-Tao Wang
  - Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Dong-Xin Mo
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- He-Hua EEr
  - Institute of Animal Science, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, China
- Lian-Quan Zhang
  - Ningxia Shuomuyanchi Tan Sheep Breeding Co. Ltd., Wuzhong, China
- Shan-Gang Jia
  - College of Grassland Science and Technology, China Agricultural University, Beijing, China
- Wei-Min Wang
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Meng-Hua Li
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
3
Rakheja I, Bharti V, Sahana S, Das PK, Ranjan G, Kumar A, Jain N, Maiti S. Development of an In Silico Platform (TRIPinRNA) for the Identification of Novel RNA Intramolecular Triple Helices and Their Validation Using Biophysical Techniques. Biochemistry 2024. [PMID: 39668452 DOI: 10.1021/acs.biochem.4c00334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
There are surprisingly few RNA intramolecular triple helices known in the human transcriptome. The structure has been best studied as a stability element at the 3' end of lncRNAs such as MALAT1 and NEAT1, but the question remains whether it is indeed as rare as it is understood to be or is just waiting for a closer look from a new vantage point. TRIPinRNA, our Python-based in silico platform, allows for a comprehensive sequence-pattern search for potential triplex formation in the human transcriptome, both noncoding and coding. Using this tool, we report the putative occurrence of homopyrimidine type (canonical) triple helices as well as heteropurine-pyrimidine strand type (noncanonical) triple helices in the human transcriptome and validate the formation of both types of triplexes using biophysical approaches. We find that the occurrence of triplex structures has a strong correlation with local GC content, which might be influencing their formation. By employing a search that encompasses both canonical and noncanonical triplex structures across the human transcriptome, this study enriches the understanding of RNA biology. Lastly, TRIPinRNA can be utilized in finding triplex structures for any organism with an annotated transcriptome.
Affiliation(s)
- Isha Rakheja
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Vishal Bharti
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- S Sahana
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Prosad Kumar Das
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- Gyan Ranjan
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Ajit Kumar
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Niyati Jain
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- Souvik Maiti
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - Institute of Genomics and Integrative Biology (IGIB)-National Chemical Laboratory (NCL) Joint Center, Council of Scientific and Industrial Research-NCL, Pune 411008, India
4
Wu H, Luo LY, Zhang YH, Zhang CY, Huang JH, Mo DX, Zhao LM, Wang ZX, Wang YC, He-Hua EE, Bai WL, Han D, Dou XT, Ren YL, Dingkao R, Chen HL, Ye Y, Du HD, Zhao ZQ, Wang XJ, Jia SG, Liu ZH, Li MH. Telomere-to-telomere genome assembly of a male goat reveals variants associated with cashmere traits. Nat Commun 2024; 15:10041. [PMID: 39567477 PMCID: PMC11579321 DOI: 10.1038/s41467-024-54188-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 10/30/2024] [Indexed: 11/22/2024]
Abstract
A complete goat (Capra hircus) reference genome enhances analyses of genetic variation, thus providing insights into domestication and selection in goats and related species. Here, we assemble a telomere-to-telomere (T2T) gap-free genome (2.86 Gb) from a cashmere goat (T2T-goat1.0), including a Y chromosome of 20.96 Mb. With a base accuracy of >99.999%, T2T-goat1.0 corrects numerous genome-wide structural and base errors in previous assemblies and adds 288.5 Mb of previously unresolved regions and 446 newly assembled genes to the reference genome. We sequence the genomes of five representative goat breeds for PacBio reads, and use T2T-goat1.0 as a reference to identify a total of 63,417 structural variations (SVs) with up to 4711 (7.42%) in the previously unresolved regions. T2T-goat1.0 was applied in population analyses of global wild and domestic goats, which revealed 32,419 SVs and 25,397,794 SNPs, including 870 SVs and 545,026 SNPs in the previously unresolved regions. Also, our analyses reveal a set of selective variants and genes associated with domestication (e.g., NKG2D and ABCC4) and cashmere traits (e.g., ABCC4 and ASIP).
Affiliation(s)
- Hui Wu
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
  - Northern Agriculture and Animal Husbandry Technical Innovation Center, Chinese Academy of Agricultural Sciences, Hohhot, China
- Ling-Yun Luo
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ya-Hui Zhang
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chong-Yan Zhang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Jia-Hui Huang
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Dong-Xin Mo
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Li-Ming Zhao
  - State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Zhi-Xin Wang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Yi-Chuan Wang
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- EEr He-Hua
  - Institute of Animal Science, NingXia Academy of Agriculture and Forestry Sciences, Yinchuan, China
- Wen-Lin Bai
  - College of Animal Science and Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- Di Han
  - Modern Agricultural Production Base Construction Engineering Center of Liaoning Province, Liaoyang, China
- Xing-Tang Dou
  - Liaoning Province Liaoning Cashmere Goat Original Breeding Farm Co., Ltd., Liaoyang, China
- Yan-Ling Ren
  - Shandong Binzhou Academy of Animal Science and Veterinary Medicine, Binzhou, China
- Yong Ye
  - Zhongwei Goat Breeding Center of Ningxia Province, Zhongwei, China
- Hai-Dong Du
  - Zhongwei Goat Breeding Center of Ningxia Province, Zhongwei, China
- Zhan-Qiang Zhao
  - Zhongwei Goat Breeding Center of Ningxia Province, Zhongwei, China
- Xi-Jun Wang
  - Jiaxiang Animal Husbandry and Veterinary Development Center, Jining, China
- Shan-Gang Jia
  - College of Grassland Science and Technology, China Agricultural University, Beijing, China
- Zhi-Hong Liu
  - College of Animal Science, Inner Mongolia Agricultural University, Hohhot, China
- Meng-Hua Li
  - Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
5
Sandler G, Agrawal AF, Wright SI. Population Genomics of the Facultatively Sexual Liverwort Marchantia polymorpha. Genome Biol Evol 2023; 15:evad196. [PMID: 37883717 PMCID: PMC10667032 DOI: 10.1093/gbe/evad196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 10/15/2023] [Accepted: 10/18/2023] [Indexed: 10/28/2023]
Abstract
The population genomics of facultatively sexual organisms are understudied compared with their abundance across the tree of life. We explore patterns of genetic diversity in two subspecies of the facultatively sexual liverwort Marchantia polymorpha using samples from across Southern Ontario, Canada. Despite the ease with which M. polymorpha should be able to propagate asexually, we find no evidence of strictly clonal descent among our samples and little to no signal of isolation by distance. Patterns of identity-by-descent tract sharing further showed evidence of recent recombination and close relatedness between geographically distant isolates, suggesting long distance gene flow and at least a modest frequency of sexual reproduction. However, the M. polymorpha genome contains overall very low levels of nucleotide diversity and signs of inefficient selection evidenced by a relatively high fraction of segregating deleterious variants. We interpret these patterns as possible evidence of the action of linked selection and a small effective population size due to past generations of asexual propagation. Overall, the M. polymorpha genome harbors signals of a complex history of both sexual and asexual reproduction.
Affiliation(s)
- George Sandler
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Aneil F Agrawal
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
  - Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Stephen I Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
  - Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
6
Orlov YL, Orlova NG. Bioinformatics tools for the sequence complexity estimates. Biophys Rev 2023; 15:1367-1378. [PMID: 37974990 PMCID: PMC10643780 DOI: 10.1007/s12551-023-01140-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 11/19/2023]
Abstract
We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale services for whole-genome analysis. Novel machine learning applications for classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. The complexity measures for amino acid sequences can be calculated by the same entropy and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low-complexity regions in protein sequences are conserved in evolution and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
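The two families of measures named in this abstract, entropy-based and Lempel-Ziv-style compression-based, can be illustrated with a minimal sketch. This is not the implementation of any tool covered by the review; the function names, the k-mer parameter, and the choice of a plain LZ78-style parse are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(kmers).values())


def lz78_phrase_count(seq):
    """Number of phrases in an LZ78-style parse of seq.

    Each phrase is the shortest prefix of the remaining text not seen
    as a phrase before. Repetitive text yields fewer phrases.
    """
    phrases = set()
    current = ""
    count = 0
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:
        count += 1  # trailing fragment that repeats an earlier phrase
    return count


low = "AAAAAAAA"   # homopolymer run: minimal complexity
high = "ACGTACGT"  # uses all four bases equally
print(shannon_entropy(low), shannon_entropy(high))
print(lz78_phrase_count(low), lz78_phrase_count(high))
```

Both measures rank the homopolymer run below the mixed sequence. In practice such measures are usually computed in sliding windows along a genome so that low-complexity regions show up as local minima.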
Affiliation(s)
- Yuriy L. Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
  - Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
7
Shi X, Teng H, Sun Z. An updated overview of experimental and computational approaches to identify non-canonical DNA/RNA structures with emphasis on G-quadruplexes and R-loops. Brief Bioinform 2022; 23:bbac441. [PMID: 36208174 PMCID: PMC9677470 DOI: 10.1093/bib/bbac441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 08/22/2022] [Accepted: 09/13/2022] [Indexed: 12/14/2022]
Abstract
Multiple types of non-canonical nucleic acid structures play essential roles in DNA recombination and replication, transcription, and genomic instability and have been associated with several human diseases. Thus, an increasing number of experimental and bioinformatics methods have been developed to identify these structures. To date, most reviews have focused on the features of non-canonical DNA/RNA structure formation, experimental approaches to mapping these structures, and the association of these structures with diseases. In addition, two reviews of computational algorithms for the prediction of non-canonical nucleic acid structures have been published. One of these reviews focused only on computational approaches for G4 detection until 2020. The other mainly summarized the computational tools for predicting cruciform, H-DNA and Z-DNA, in which the algorithms discussed were published before 2012. Since then, several experimental and computational methods have been developed. However, a systematic review including the conformation, sequencing mapping methods and computational prediction strategies for these structures has not yet been published. The purpose of this review is to provide an updated overview of conformation, current sequencing technologies and computational identification methods for non-canonical nucleic acid structures, as well as their strengths and weaknesses. We expect that this review will aid in understanding how these structures are characterised and how they contribute to related biological processes and diseases.
Affiliation(s)
- Xiaohui Shi
  - Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The first Affiliated Hospital of WMU; Beijing Institutes of Life Science, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Ouhai District, Wenzhou 325000, China
- Huajing Teng
  - Department of Radiation Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) at Peking University Cancer Hospital and Institute, Ouhai District, Wenzhou 325000, China
- Zhongsheng Sun
  - Key Laboratory of Clinical Laboratory Diagnosis and Translational Research of Zhejiang Province, The first Affiliated Hospital of WMU; Beijing Institutes of Life Science, Chinese Academy of Sciences; CAS Center for Excellence in Biotic Interactions and State Key Laboratory of Integrated Management of Pest Insects and Rodents, University of Chinese Academy of Sciences; Institute of Genomic Medicine, Wenzhou Medical University; IBMC-BGI Center, the Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital); Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, Ouhai District, Wenzhou 325000, China
8
Bhardwaj V, Yadav D, Dhankhar M, Saini K. A novel approach for identification of mirror repeats within the Engrailed Homeobox-1 gene of Xenopus tropicalis. Biomedical and Biotechnology Research Journal (BBRJ) 2022. [DOI: 10.4103/bbrj.bbrj_281_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
9
López-Cortegano E, Craig RJ, Chebib J, Samuels T, Morgan AD, Kraemer SA, Böndel KB, Ness RW, Colegrave N, Keightley PD. De Novo Mutation Rate Variation and Its Determinants in Chlamydomonas. Mol Biol Evol 2021; 38:3709-3723. [PMID: 33950243 PMCID: PMC8383909 DOI: 10.1093/molbev/msab140] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022]
Abstract
De novo mutations are central for evolution, since they provide the raw material for natural selection by regenerating genetic variation. However, studying de novo mutations is challenging and is generally restricted to model species, so we have a limited understanding of the evolution of the mutation rate and spectrum between closely related species. Here, we present a mutation accumulation (MA) experiment to study de novo mutation in the unicellular green alga Chlamydomonas incerta and perform comparative analyses with its closest known relative, Chlamydomonas reinhardtii. Using whole-genome sequencing data, we estimate that the median single nucleotide mutation (SNM) rate in C. incerta is μ = 7.6 × 10⁻¹⁰, and is highly variable between MA lines, ranging from μ = 0.35 × 10⁻¹⁰ to μ = 131.7 × 10⁻¹⁰. The SNM rate is strongly positively correlated with the mutation rate for insertions and deletions between lines (r > 0.97). We infer that the genomic factors associated with variation in the mutation rate are similar to those in C. reinhardtii, allowing for cross-prediction between species. Among these genomic factors, sequence context and complexity are more important than GC content. With the exception of a remarkably high C→T bias, the SNM spectrum differs markedly between the two Chlamydomonas species. Our results suggest that similar genomic and biological characteristics may result in a similar mutation rate in the two species, whereas the SNM spectrum has more freedom to diverge.
Affiliation(s)
- Eugenio López-Cortegano
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Rory J Craig
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Jobran Chebib
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Toby Samuels
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Andrew D Morgan
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Katharina B Böndel
  - Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
- Rob W Ness
  - Department of Biology, University of Toronto Mississauga, Mississauga, ON, Canada
- Nick Colegrave
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Peter D Keightley
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
10
Lavezzo E, Berselli M, Frasson I, Perrone R, Palù G, Brazzale AR, Richter SN, Toppo S. G-quadruplex forming sequences in the genome of all known human viruses: A comprehensive guide. PLoS Comput Biol 2018; 14:e1006675. [PMID: 30543627 PMCID: PMC6307822 DOI: 10.1371/journal.pcbi.1006675] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 08/07/2018] [Revised: 12/27/2018] [Accepted: 11/27/2018] [Indexed: 12/21/2022]
Abstract
G-quadruplexes are non-canonical nucleic-acid structures that control transcription, replication, and recombination in organisms. G-quadruplexes are present in eukaryotes, prokaryotes, and viruses. In the latter, mounting evidence indicates their key biological activity. Since data on viruses are scattered, we here present a comprehensive analysis of potential quadruplex-forming sequences (PQS) in the genome of all known viruses that can infect humans. We show that occurrence and location of PQSs are features characteristic of each virus class and family. Our statistical analysis proves that their presence within the viral genome is orderly arranged, as indicated by the possibility to correctly assign up to two-thirds of viruses to their exact class based on the PQS classification. For each virus we provide: i) the list of all PQS present in the genome (positive and negative strands), ii) their position in the viral genome, iii) the degree of conservation among strains of each PQS in its genome context, iv) the statistical significance of PQS abundance. This information is accessible from a database to allow the easy navigation of the results: http://www.medcomp.medicina.unipd.it/main_site/doku.php?id=g4virus. The availability of these data will greatly expedite research on G-quadruplex in viruses, with the possibility to accelerate finding therapeutic opportunities to numerous and some fearsome human diseases.
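A scan for potential quadruplex-forming sequences on both strands, as described in this abstract, can be sketched with a regular expression. The abstract does not state the exact motif definition the authors used, so the pattern below is an assumption: it is the widely used canonical form of four runs of at least three guanines separated by loops of 1 to 7 bases. The function names are hypothetical, and the sketch handles DNA letters only.

```python
import re

# ASSUMPTION: canonical G3+ N1-7 motif; the paper's own criteria may differ.
PQS = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pqs(seq):
    """Return (strand, start, end, motif) tuples for non-overlapping hits.

    Coordinates are 0-based, end-exclusive, and always given on the
    plus strand, including for hits found on the minus strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [("+", m.start(), m.end(), m.group()) for m in PQS.finditer(seq)]
    for m in PQS.finditer(reverse_complement(seq)):
        # Map minus-strand match positions back to plus-strand coordinates.
        hits.append(("-", n - m.end(), n - m.start(), m.group()))
    return hits


example = "TTGGGAGGGTGGGAGGGTT"
print(find_pqs(example))
print(find_pqs(reverse_complement(example)))
```

Because `finditer` returns non-overlapping matches, nested or overlapping candidates are reported once; a lookahead-based pattern would be needed to enumerate all of them.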
Affiliation(s)
- Enrico Lavezzo
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Michele Berselli
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Ilaria Frasson
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Rosalba Perrone
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Giorgio Palù
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Sara N. Richter
  - Department of Molecular Medicine, University of Padova, Padova, Italy
- Stefano Toppo
  - Department of Molecular Medicine, University of Padova, Padova, Italy