1
Parl FF. Analysis of CENP-B Boxes as Anchor of Kinetochores in Centromeres of Human Chromosomes. Bioinform Biol Insights 2024; 18:11779322241248913. [PMID: 38690324 PMCID: PMC11060027 DOI: 10.1177/11779322241248913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 04/04/2024] [Indexed: 05/02/2024] Open
Abstract
The kinetochore is a multiprotein structure that attaches at one end to DNA in the centromere and at the other end to microtubules in the mitotic spindle. By connecting centromere and spindle, the kinetochore controls the migration of chromosomes during cell division. The exact position where the kinetochore assembles on each centromere was uncertain because large sections of centromeric DNA had not been sequenced due to highly repetitive alpha-satellite arrays. Embedded in the arrays is a 17 bp consensus sequence, the so-called CENP-B box, which binds the CENP-B protein, the only protein that binds directly to centromeric DNA. Recently, the Telomere-to-Telomere Consortium published the complete centromeric DNA sequences of all chromosomes including their epigenetic modifications in the T2T-CHM13 map. I used data from the T2T-CHM13 map to locate the CENP-B boxes in the centromeres as anchor of kinetochores. Most of the CENP-B boxes in centromeric DNA are methylated with the exception of the so-called centromere dip region (CDR), where CENP-B protein dimers bind to adjacent unmethylated CENP-B boxes and interact with CENP-A and CENP-C proteins to assemble the kinetochore. The centromeres of all chromosomes combined have a size of 407 Mb of which the kinetochores account for 5.0 Mb or 1.2%. There is no correlation between centromere and kinetochore size (P = .77). While the number of CENP-B boxes varies 4-fold between chromosomes, their density (number/Kb) varies less than 2-fold with a mean of 2.61 ± 0.33. The narrow range ensures a uniform pull of the spindle on the centromeres. I illustrate the findings in a model of the human kinetochore anchored at unmethylated CENP-B boxes in the CDR and present circos plots of chromosomes to show the location of kinetochores in their respective centromeres.
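As a rough illustration of the quantities quoted in this abstract (box counts, density per Kb, and the kinetochore share of centromere size), a minimal Python sketch follows. It is not the author's method. The motif `NTTCGNNNNANNCGGGN` is an assumption taken from the wider literature on the CENP-B box core, not from this abstract, and the helper names (`count_boxes`, `density_per_kb`, `percent`) and the toy sequence are hypothetical.

```python
import re

# IUPAC codes needed for the motif used here (a subset, not the full table).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "Y": "[CT]", "R": "[AG]"}

# ASSUMPTION: 17 bp core motif from the wider literature, not from the abstract.
CENPB_CORE = "NTTCGNNNNANNCGGGN"


def motif_to_regex(motif):
    """Turn an IUPAC motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_boxes(seq, motif=CENPB_CORE):
    """Count motif hits on both strands (a palindromic motif would be counted twice)."""
    rx = motif_to_regex(motif)
    seq = seq.upper()
    forward = sum(1 for _ in rx.finditer(seq))
    reverse = sum(1 for _ in rx.finditer(revcomp(seq)))
    return forward + reverse


def density_per_kb(n_boxes, length_bp):
    """Number of boxes per 1000 bp."""
    return n_boxes * 1000 / length_bp


def percent(part, whole):
    """Share of `whole` taken by `part`, in percent."""
    return 100 * part / whole


if __name__ == "__main__":
    toy = "GGGG" + "CTTCGTTGGAAACGGGA" + "CCCC"  # hypothetical 25 bp sequence
    n = count_boxes(toy)
    print(n, density_per_kb(n, len(toy)))
    # Figures quoted in the abstract: 5.0 Mb of kinetochore in 407 Mb of centromere.
    print(round(percent(5.0, 407), 1))
```

The last line reproduces the abstract's arithmetic: 5.0 Mb out of 407 Mb is about 1.2%.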
Affiliation(s)
- Fritz F Parl
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
2
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
Affiliation(s)
- Matko Glunčić
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia;
- Ines Vlahović
- Algebra LAB, Algebra University College, 10000 Zagreb, Croatia;
- Marija Rosandić
- Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia;
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
- Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia;
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
3
Wang B, Jia Y, Dang N, Yu J, Bush SJ, Gao S, He W, Wang S, Guo H, Yang X, Ma W, Ye K. Near telomere-to-telomere genome assemblies of two Chlorella species unveil the composition and evolution of centromeres in green algae. BMC Genomics 2024; 25:356. [PMID: 38600443 PMCID: PMC11005252 DOI: 10.1186/s12864-024-10280-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 04/02/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND: Centromeres play a crucial and conserved role in cell division, although their composition and evolutionary history in green algae, the evolutionary ancestors of land plants, remain largely unknown. RESULTS: We constructed near telomere-to-telomere (T2T) assemblies for two Trebouxiophyceae species, Chlorella sorokiniana NS4-2 and Chlorella pyrenoidosa DBH, with chromosome numbers of 12 and 13, and genome sizes of 58.11 Mb and 53.41 Mb, respectively. We identified and validated their centromere sequences using CENH3 ChIP-seq and found that, similar to humans and higher plants, the centromeric CENH3 signals of green algae display a pattern of hypomethylation. Interestingly, the centromeres of both species largely comprised transposable elements, although they differed significantly in their composition. Species within the Chlorella genus display a more diverse centromere composition, with major constituents including members of the LTR/Copia, LINE/L1, and LINE/RTEX families. This is in contrast to green algae including Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, and Chromochloris zofingiensis, in which centromeres instead have a pronounced single-element composition. Moreover, we observed significant differences in the composition and structure of centromeres among chromosomes with strong collinearity within the Chlorella genus, suggesting that centromeric sequence evolves more rapidly than sequence in non-centromeric regions. CONCLUSIONS: This study not only provides high-quality genome data for comparative genomics of green algae but also gives insight into the composition and evolutionary history of centromeres in early plants, laying an important foundation for further research on their evolution.
Affiliation(s)
- Bo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yanyan Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Ningxin Dang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Jie Yu
- College of Life Sciences, Shanghai Normal University, Shanghai, China
- Stephen J Bush
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Shenghan Gao
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Wenxi He
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Sirui Wang
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Hongtao Guo
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Weimin Ma
- College of Life Sciences, Shanghai Normal University, Shanghai, China
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Faculty of Science, Leiden University, Leiden, The Netherlands
4
Volpe E, Corda L, Tommaso ED, Pelliccia F, Ottalevi R, Licastro D, Guarracino A, Capulli M, Formenti G, Tassone E, Giunta S. The complete diploid reference genome of RPE-1 identifies human phased epigenetic landscapes. bioRxiv 2023:2023.11.01.565049. [PMID: 38168337 PMCID: PMC10760208 DOI: 10.1101/2023.11.01.565049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Comparative analysis of recent human genome assemblies highlights profound sequence divergence that peaks within polymorphic loci such as centromeres. This raises questions about the adequacy of relying on human reference genomes to accurately analyze sequencing data derived from experimental cell lines. Here, we generated the complete diploid genome assembly for the human retinal epithelial cells (RPE-1), a widely used non-cancer laboratory cell line with a stable karyotype, to use as a matched reference for multi-omics sequencing data analysis. Our RPE1v1.0 assembly presents completely phased haplotypes and chromosome-level scaffolds that span centromeres with ultra-high base accuracy (>QV60). We mapped the haplotype-specific genomic variation of this cell line, including t(Xq;10q), a stable 73.18 Mb duplication of chromosome 10 translocated onto the microdeleted chromosome X telomere. Polymorphisms between haplotypes of the same genome reveal genetic and epigenetic variation for all chromosomes, especially at centromeres. Using the RPE-1 assembly as a matched reference genome improves the mapping quality of multi-omics reads originating from RPE-1 cells, with a drastic reduction in alignment mismatches compared to using the most complete human reference to date (CHM13). Leveraging the accuracy achieved using a matched reference, we were able to identify the kinetochore sites at base pair resolution and show unprecedented variation between haplotypes. This work showcases the use of matched reference genomes for multi-omics analyses and serves as the foundation for a call to comprehensively assemble experimentally relevant cell lines for widespread application.
Affiliation(s)
- Emilia Volpe
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Luca Corda
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Elena Di Tommaso
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Franca Pelliccia
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Riccardo Ottalevi
- Department of Bioinformatic, Dante Genomics Corp Inc., 667 Madison Avenue, New York, NY 10065 USA and S.s.17, 67100, L’Aquila, Italy
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Mattia Capulli
- Department of Biotechnological and Applied Clinical Sciences, University of L’Aquila, L’Aquila, Italy
- Giulio Formenti
- The Rockefeller University, 1230 York Avenue, 10065 New York, USA
- Evelyne Tassone
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
- Simona Giunta
- Giunta Laboratory of Genome Evolution, Department of Biology and Biotechnologies Charles Darwin, University of Rome “Sapienza”, Piazzale Aldo Moro 5, 00185 Rome, Italy
5
Xu R, Pan Z, Nakagawa T. Gross Chromosomal Rearrangement at Centromeres. Biomolecules 2023; 14:28. [PMID: 38254628 PMCID: PMC10813616 DOI: 10.3390/biom14010028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024] Open
Abstract
Centromeres play essential roles in the faithful segregation of chromosomes. CENP-A, the centromere-specific histone H3 variant, and heterochromatin characterized by di- or tri-methylation of histone H3 lysine 9 (H3K9) are the hallmarks of centromere chromatin. In contrast to these epigenetic marks, the DNA sequences underlying the centromere region of chromosomes are not well conserved through evolution. However, centromeres consist of repetitive sequences in many eukaryotes, including animals, plants, and a subset of fungi, including fission yeast. Advances in long-read sequencing techniques have uncovered the complete sequence of human centromeres, which contain thousands of alpha satellite repeats and other types of repetitive sequences. Not only tandem but also inverted repeats are present at a centromere. DNA recombination between centromere repeats can result in gross chromosomal rearrangement (GCR), such as translocation and isochromosome formation. CENP-A chromatin and heterochromatin suppress centromeric GCR. The key player of homologous recombination, Rad51, safeguards centromere integrity through conservative noncrossover recombination between centromere repeats. In contrast to Rad51-dependent recombination, Rad52-mediated single-strand annealing (SSA) and microhomology-mediated end-joining (MMEJ) lead to centromeric GCR. This review summarizes recent findings on the role of centromere and recombination proteins in maintaining centromere integrity and discusses how GCR occurs at centromeres.
Affiliation(s)
- Ran Xu
- Department of Biological Sciences, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
- Forefront Research Center, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
- Ziyi Pan
- Department of Biological Sciences, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
- Forefront Research Center, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
- Takuro Nakagawa
- Department of Biological Sciences, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
- Forefront Research Center, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka 560-0043, Osaka, Japan
6
Ma H, Ding W, Chen Y, Zhou J, Chen W, Lan C, Mao H, Li Q, Yan W, Su H. Centromere Plasticity With Evolutionary Conservation and Divergence Uncovered by Wheat 10+ Genomes. Mol Biol Evol 2023; 40:msad176. [PMID: 37541261 PMCID: PMC10422864 DOI: 10.1093/molbev/msad176] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/09/2023] [Revised: 06/26/2023] [Accepted: 07/28/2023] [Indexed: 08/06/2023] Open
Abstract
Centromeres (CEN) are the chromosomal regions that play a crucial role in maintaining genomic stability. The underlying highly repetitive DNA sequences can evolve quickly in most eukaryotes and promote karyotype evolution. It is not fully understood how these widely variable sequences ensure the homeostasis of centromere function. In this study, we investigated the genetics and epigenetics of CEN in a population of wheat lines from global breeding programs. We captured a high degree of sequence, positioning, and epigenetic variation in the large and complex wheat CEN. We found that most CENH3-associated repeats are Cereba retrotransposon elements and exhibit phylogenetic homogenization across different wheat lines, but the less-associated repeat sequences diverge in their own way in each wheat line, implying specific mechanisms for selecting certain repeat types as functional core CEN. Furthermore, we observed that CENH3 nucleosome structures display looser wrapping of DNA termini on complex centromeric repeats, including the repositioned CEN. We also found that strict CENH3 nucleosome positioning and intrinsic DNA features play a role in determining centromere identity among different lines. Specific non-B form DNAs were substantially associated with CENH3 nucleosomes for the repositioned centromeres. These findings suggest that multiple mechanisms were involved in the adaptation of CENH3 nucleosomes that can stabilize CEN. Ultimately, we propose a remarkable epigenetic plasticity of centromere chromatin within the diverse genomic context, and this high robustness is crucial for maintaining centromere function and genome stability in wheat 10+ lines as a result of past breeding selections.
Affiliation(s)
- Huan Ma
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Wentao Ding
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Yiqian Chen
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Jingwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Wei Chen
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Caixia Lan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Hailiang Mao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Qiang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Wenhao Yan
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Handong Su
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
7
Lin Y, Ye C, Li X, Chen Q, Wu Y, Zhang F, Pan R, Zhang S, Chen S, Wang X, Cao S, Wang Y, Yue Y, Liu Y, Yue J. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Horticulture Research 2023; 10:uhad127. [PMID: 37560017 PMCID: PMC10407605 DOI: 10.1093/hr/uhad127] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/07/2023] [Accepted: 06/05/2023] [Indexed: 08/11/2023]
Abstract
A high-quality genome is the basis for studies on functional, evolutionary, and comparative genomics. The majority of attention has been paid to the solution of complex chromosome structures and highly repetitive sequences, along with the emergence of a new 'telomere-to-telomere (T2T) assembly' era. However, the bioinformatic tools for the automatic construction and/or characterization of T2T genome are limited. Here, we developed a user-friendly web toolkit, quarTeT, which currently includes four modules: AssemblyMapper, GapFiller, TeloExplorer, and CentroMiner. First, AssemblyMapper is designed to assemble phased contigs into the chromosome-level genome by referring to a closely related genome. Then, GapFiller would endeavor to fill all unclosed gaps in a given genome with the aid of additional ultra-long sequences. Finally, TeloExplorer and CentroMiner are applied to identify candidate telomere and centromere as well as their localizations on each chromosome. These four modules can be used alone or in combination with each other for T2T genome assembly and characterization. As a case study, by adopting the entire modular functions of quarTeT, we have achieved the Actinidia chinensis genome assembly that is of a quality comparable to the reported genome Hongyang v4.0, which was assembled with the addition of manual handling. Further evaluation of CentroMiner by searching centromeres in Arabidopsis thaliana and Oryza sativa genomes showed that quarTeT is capable of identifying all the centromeric regions that have been previously detected by experimental methods. Collectively, quarTeT is an efficient toolkit for studies of large-scale T2T genomes and can be accessed at http://www.atcgn.com:8080/quarTeT/home.html without registration.
Affiliation(s)
- Yunzhi Lin
- College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Chen Ye
- School of Information and Computer, Anhui Agricultural University, Hefei, Anhui 230036, China
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xingzhu Li
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Qinyao Chen
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Ying Wu
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Feng Zhang
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Rui Pan
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Sijia Zhang
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Shuxia Chen
- School of Information and Computer, Anhui Agricultural University, Hefei, Anhui 230036, China
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xu Wang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518124, China
- Shuo Cao
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518124, China
- Key Laboratory of Horticultural Plant Biology Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Yingzhen Wang
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yi Yue
- School of Information and Computer, Anhui Agricultural University, Hefei, Anhui 230036, China
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yongsheng Liu
- College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui 230036, China
- Junyang Yue
- School of Horticulture, Anhui Agricultural University, Hefei, Anhui 230036, China
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518124, China
8
Harold C. All these screens that we've done: how functional genetic screens have informed our understanding of ribosome biogenesis. Biosci Rep 2023; 43:BSR20230631. [PMID: 37335083 PMCID: PMC10329186 DOI: 10.1042/bsr20230631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 06/08/2023] [Accepted: 06/19/2023] [Indexed: 06/21/2023] Open
Abstract
Ribosome biogenesis is the complex and essential process that ultimately leads to the synthesis of cellular proteins. Understanding each step of this process is imperative to increase our understanding of basic biology and, more critically, to provide novel therapeutic avenues for genetic and developmental diseases such as ribosomopathies and cancers, which can arise when this process is impaired. In recent years, significant advances in technology have made it possible to identify and characterize novel human regulators of ribosome biogenesis via high-content, high-throughput screens. Additionally, screening platforms have been used to discover novel therapeutics for cancer. These screens have uncovered a wealth of knowledge regarding novel proteins involved in human ribosome biogenesis, from the regulation of the transcription of the ribosomal RNA to global protein synthesis. Specifically, comparing the proteins discovered in these screens showed interesting connections between large ribosomal subunit (LSU) maturation factors and earlier steps in ribosome biogenesis, as well as overall nucleolar integrity. This review discusses the current standing of screens for human ribosome biogenesis factors by comparing the datasets and considering the biological implications of their areas of overlap. It then looks toward other technologies and how they can be adapted to discover more factors involved in ribosome synthesis and to answer other outstanding questions in the field.
Affiliation(s)
- Cecelia M. Harold
- Department of Genetics, Yale School of Medicine, New Haven, CT, U.S.A
9
Peona V, Kutschera VE, Blom MPK, Irestedt M, Suh A. Satellite DNA evolution in Corvoidea inferred from short and long reads. Mol Ecol 2023; 32:1288-1305. [PMID: 35488497 DOI: 10.1111/mec.16484] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/08/2021] [Revised: 04/11/2022] [Accepted: 04/17/2022] [Indexed: 11/29/2022]
Abstract
Satellite DNA (satDNA) is a fast-evolving portion of eukaryotic genomes. The homogeneous and repetitive nature of satDNA causes problems during the assembly of genomes, and therefore it is still difficult to study it in detail in nonmodel organisms as well as across broad evolutionary timescales. Here, we combined the use of short- and long-read data to explore the diversity and evolution of satDNA between individuals of the same species and between genera of birds spanning ~40 million years of bird evolution, using birds-of-paradise (Paradisaeidae) and crow (Corvus) species. These avian species revealed a GC-rich Corvoidea satellitome composed of 61 satellite families and provided a set of candidate centromeric satDNA monomers, selected on the basis of length, abundance, homogeneity and transcription. Surprisingly, we found that the satDNA of crow species rapidly diverged between closely related species, while the satDNA appeared more similar between birds-of-paradise species belonging to different genera.
Affiliation(s)
- Valentina Peona
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Verena E Kutschera
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Solna, Sweden
- Mozes P K Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Museum für Naturkunde, Leibniz Institut für Evolutions- und Biodiversitätsforschung, Berlin, Germany
- Martin Irestedt
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Alexander Suh
- Department of Organismal Biology - Systematic Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- School of Biological Sciences-Organisms and the Environment, University of East Anglia, Norwich, UK
10
Ragupathi A, Singh M, Perez AM, Zhang D. Targeting the BRCA1/2 deficient cancer with PARP inhibitors: Clinical outcomes and mechanistic insights. Front Cell Dev Biol 2023; 11:1133472. [PMID: 37035242 PMCID: PMC10073599 DOI: 10.3389/fcell.2023.1133472] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 12/29/2022] [Accepted: 03/14/2023] [Indexed: 04/11/2023] Open
Abstract
BRCA1 and BRCA2 play a critical role in a variety of molecular processes related to DNA metabolism, including homologous recombination and mediating the replication stress response. Individuals with mutations in the BRCA1 and BRCA2 (BRCA1/2) genes have a significantly higher risk of developing various types of cancers, especially cancers of the breast, ovary, pancreas, and prostate. Currently, the Food and Drug Administration (FDA) has approved four PARP inhibitors (PARPi) to treat cancers with BRCA1/2 mutations. In this review, we will first summarize the clinical outcomes of the four FDA-approved PARPi in treating BRCA1/2 deficient cancers. We will then discuss evidence supporting the hypothesis that the cytotoxic effect of PARPi is likely due to inducing excessive replication stress at the difficult-to-replicate (DTR) genomic regions in BRCA1/2 mutated tumors. Finally, we will discuss the ongoing preclinical and clinical studies on how to combine the PARPi with immuno-oncology drugs to further improve clinical outcomes.
|
11
|
Silva JM, Qi W, Pinho AJ, Pratas D. AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data. Gigascience 2023; 12:giad101. [PMID: 38091509 PMCID: PMC10716826 DOI: 10.1093/gigascience/giad101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics. Moreover, their identification provides positional and quantity information where standard assembly methodologies face significant difficulties because of substantially higher depth coverage (mountains), ambiguous read mapping, or where sequencing or reconstruction defects may occur. However, the capability to distinguish low-complexity regions (LCRs) in genomic and proteomic sequences is a challenge that depends on the model's ability to find them automatically. Low-complexity patterns can be implicit through specific or combined sources, such as algorithmic or probabilistic, and recurring to different spatial distances, namely local, medium, or distant associations. FINDINGS This article addresses the challenge of automatically modeling and distinguishing LCRs, providing a new method and tool (AlcoR) for efficient and accurate segmentation and visualization of these regions in genomic and proteomic sequences. The method enables the use of models with different memories, providing the ability to distinguish local from distant low-complexity patterns. The method is reference and alignment free, providing additional methodologies for testing, including a highly flexible simulation method for generating biological sequences (DNA or protein) with different complexity levels, sequence masking, and a visualization tool for automatic computation of the LCR maps into an ideogram style. We provide illustrative demonstrations using synthetic, nearly synthetic, and natural sequences showing the high efficiency and accuracy of AlcoR. As large-scale results, we use AlcoR to unprecedentedly provide a whole-chromosome low-complexity map of a recent complete human genome and the haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar. CONCLUSIONS The AlcoR method provides the ability of fast sequence characterization through data complexity analysis, ideally for scenarios entangling the presence of new or unknown sequences. AlcoR is implemented in C language using multithreading to increase the computational speed, is flexible for multiple applications, and does not contain external dependencies. The tool accepts any sequence in FASTA format. The source code is freely provided at https://github.com/cobilab/alcor.
Affiliation(s)
- Jorge M Silva
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
| | - Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse, 190, 8057, Zurich, Switzerland
- SIB, Swiss Institute of Bioinformatics, 1202, Geneva, Switzerland
| | - Armando J Pinho
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
| | - Diogo Pratas
- IEETA, Institute of Electronics and Informatics Engineering of Aveiro, and LASI, Intelligent Systems Associate Laboratory, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Electronics Telecommunications and Informatics, University of Aveiro, Campus Universitario de Santiago, 3810-193, Aveiro, Portugal
- Department of Virology, University of Helsinki, Haartmaninkatu, 3, 00014 Helsinki, Finland
| |
|
12
|
Rosandić M, Vlahović I, Pilaš I, Glunčić M, Paar V. An Explanation of Exceptions from Chargaff's Second Parity Rule/Strand Symmetry of DNA Molecules. Genes (Basel) 2022; 13:1929. [PMID: 36360166 PMCID: PMC9689577 DOI: 10.3390/genes13111929] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/18/2022] [Revised: 10/12/2022] [Accepted: 10/17/2022] [Indexed: 11/04/2022] Open
Abstract
In this article, we show that mono/oligonucleotide quadruplets, as basic structures of DNA, along with our classification of trinucleotides, disclose an organization of genomes based on purine-pyrimidine symmetry. Moreover, the structure and stability of DNA are influenced by the Watson-Crick pairing and the natural law of DNA creation and conservation, according to which the same mono- or oligonucleotide insertion must be inserted simultaneously into both strands of DNA. Taken together, they lead to quadruplets with central mirror symmetry and bidirectional DNA strand orientation and are incorporated into Chargaff's second parity rule (CSPR). Performing our quadruplet frequency analysis of all human chromosomes and of Neuroblastoma BreakPoint Family (NBPF) genes, which code Olduvai protein domains in the human genome, we show that the coding part of DNA violates CSPR. This may shed new light and give rise to a novel hypothesis on DNA creation and its evolution. In this framework, the logarithmic relationship between oligonucleotide order and minimal DNA sequence length, to establish the validity of CSPR, automatically follows from the quadruplet structure of the genomic sequence. The problem of the violation of CSPR in rare symbionts is discussed.
Affiliation(s)
- Marija Rosandić
- University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
| | - Ines Vlahović
- Faculty of Science, Algebra University College, 10000 Zagreb, Croatia
| | - Ivan Pilaš
- Forest Research Institute, 10450 Jastrebarsko, Croatia
| | - Matko Glunčić
- Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
| | - Vladimir Paar
- Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
| |
|
13
|
Jiang N, Li Z, Dai Y, Liu Z, Han X, Li Y, Li Y, Xiong H, Xu J, Zhang G, Xiao S, Yuan X, Fu Y. Massive genome investigations reveal insights of prevalent introgression for environmental adaptation and triterpene biosynthesis in Ganoderma. Mol Ecol Resour 2022. [PMID: 36214617 DOI: 10.1111/1755-0998.13718] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/13/2022] [Revised: 09/26/2022] [Accepted: 10/06/2022] [Indexed: 11/29/2022]
Abstract
Genome introgression is one of the driving forces that can increase species and genetic diversity and facilitate the adaptive evolution of organisms and biodiversity conservation. However, the genomic introgression and its contribution to biodiversity of macrofungi are still unclear. The genus Ganoderma is a typical macrofungal group that plays crucial roles in forest ecosystem as saprophytic organisms and plant pathogens, and is also involved in human health as medicinal mushrooms. Most public Ganoderma genomes are fragmented, and reference genomes and whole-genome information of diverse germplasm resources for many Ganoderma species are lacking, thus hindering functional and evolutionary genomic investigations among Ganoderma species. In this study, we provide high-quality genomes of 10 Ganoderma species and whole-genome variants data of 224 individuals from various ecoregions, enabling us to infer the phylogeny of Ganoderma species and their historical population dynamics. Based on whole-genome variants, widespread and genome-wide introgression among Ganoderma species is revealed. Genes with significant introgression signals were related to stress response, digestive absorption, and secondary metabolite synthesis, factors that may contribute to environmental adaptation and important biocomponent metabolism. CYP512U6, an essential functional gene in the CYP450 family related to Ganoderma triterpene synthesis, was detected with significant introgression and selection signals combined with Ganoderma metabolomic analysis, indicating that both ancient gene exchange and recent domestication have contributed to the categories and content of secondary metabolites of Ganoderma. The reference genomes, whole-genome variants, and metabolite profiles could serve as abundant and valuable genetic resources for evolution, ecology, and conservation investigations of Ganoderma species and other macrofungi.
Affiliation(s)
- Nan Jiang
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
- College of Plant Protection, Jilin Agricultural University, Changchun, Jilin, China
| | - Zhenhao Li
- ShouXianGu Botanical Drug Institute Co., Ltd., Jinhua, Zhejiang, China
| | - Yueting Dai
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
| | - Zhenhua Liu
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
| | - Xuerong Han
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
| | - Yu Li
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
| | - Yong Li
- Ecology and Nature Conservation Institute, Chinese Academy of Forestry, Beijing, China
| | - Hui Xiong
- ShouXianGu Botanical Drug Institute Co., Ltd., Jinhua, Zhejiang, China
| | - Jing Xu
- ShouXianGu Botanical Drug Institute Co., Ltd., Jinhua, Zhejiang, China
| | - Guoliang Zhang
- ShouXianGu Botanical Drug Institute Co., Ltd., Jinhua, Zhejiang, China
| | - Shijun Xiao
- Jiaxing Key Laboratory for New Germplasm Breeding of Economic Mycology, Jiaxing, Zhejiang, China
| | - Xiaohui Yuan
- International Cooperation Research Center of China for New Germplasm Breeding of Edible Mushrooms, Jilin Agricultural University, Changchun, Jilin, China
| | - Yongping Fu
- College of Plant Protection, Jilin Agricultural University, Changchun, Jilin, China
| |
|
14
|
Naughton C, Huidobro C, Catacchio CR, Buckle A, Grimes GR, Nozawa RS, Purgato S, Rocchi M, Gilbert N. Human centromere repositioning activates transcription and opens chromatin fibre structure. Nat Commun 2022; 13:5609. [PMID: 36153345 PMCID: PMC9509383 DOI: 10.1038/s41467-022-33426-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/03/2022] [Accepted: 09/14/2022] [Indexed: 11/09/2022] Open
Abstract
Human centromeres appear as constrictions on mitotic chromosomes and form a platform for kinetochore assembly in mitosis. Biophysical experiments led to a suggestion that repetitive DNA at centromeric regions form a compact scaffold necessary for function, but this was revised when neocentromeres were discovered on non-repetitive DNA. To test whether centromeres have a special chromatin structure we have analysed the architecture of a neocentromere. Centromere repositioning is accompanied by RNA polymerase II recruitment and active transcription to form a decompacted, negatively supercoiled domain enriched in ‘open’ chromatin fibres. In contrast, centromerisation causes a spreading of repressive epigenetic marks to surrounding regions, delimited by H3K27me3 polycomb boundaries and divergent genes. This flanking domain is transcriptionally silent and partially remodelled to form ‘compact’ chromatin, similar to satellite-containing DNA sequences, and exhibits genomic instability. We suggest transcription disrupts chromatin to provide a foundation for kinetochore formation whilst compact pericentromeric heterochromatin generates mechanical rigidity.
|
15
|
Volarić M, Despot-Slade E, Veseljak D, Meštrović N, Mravinac B. Reference-Guided De Novo Genome Assembly of the Flour Beetle Tribolium freemani. Int J Mol Sci 2022; 23:ijms23115869. [PMID: 35682551 PMCID: PMC9180572 DOI: 10.3390/ijms23115869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 05/19/2022] [Accepted: 05/20/2022] [Indexed: 02/06/2023] Open
Abstract
The flour beetle Tribolium freemani is a sibling species of the model organism and important pest Tribolium castaneum. The two species are so closely related that they can produce hybrid progeny, but the genetic basis of their differences has not been revealed. In this work, we sequenced the T. freemani genome by applying PacBio HiFi technology. Using the well-assembled T. castaneum genome as a reference, we assembled 262 Mb of the T. freemani genomic sequence and anchored it in 10 linkage groups corresponding to nine autosomes and sex chromosome X. The assembly showed 99.8% completeness of conserved insect genes, indicating a high-quality reference genome. Comparison with the T. castaneum assembly revealed that the main differences in genomic sequence between the two sibling species come from repetitive DNA, including interspersed and tandem repeats. In this work, we also provided the complete assembled mitochondrial genome of T. freemani. Although the genome assembly needs to be ameliorated in tandemly repeated regions, the first version of the T. freemani reference genome and the complete mitogenome presented here represent useful resources for comparative evolutionary studies of related species and for further basic and applied research on different biological aspects of economically important pests.
|
16
|
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 167] [Impact Index Per Article: 83.5] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Affiliation(s)
- Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pragya Sidhwani
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Sasha A. Langley
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Lev Uralsky
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
| | | | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | | | | | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | | | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Ryan Lorig-Roach
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | | | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Fedor Gusev
- Vavilov Institute of General Genetics, Moscow, Russia
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R. Salama
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology, Sochi, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Gary H. Karpen
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Abby F. Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
| | | | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
| | - Rachel J. O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Charles H. Langley
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
| | - Ivan A. Alexandrov
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| |
|
17
|
Blaxter M, Archibald JM, Childers AK, Coddington JA, Crandall KA, Di Palma F, Durbin R, Edwards SV, Graves JAM, Hackett KJ, Hall N, Jarvis ED, Johnson RN, Karlsson EK, Kress WJ, Kuraku S, Lawniczak MKN, Lindblad-Toh K, Lopez JV, Moran NA, Robinson GE, Ryder OA, Shapiro B, Soltis PS, Warnow T, Zhang G, Lewin HA. Why sequence all eukaryotes? Proc Natl Acad Sci U S A 2022; 119:e2115636118. [PMID: 35042801 PMCID: PMC8795522 DOI: 10.1073/pnas.2115636118] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Indexed: 12/11/2022] Open
Abstract
Life on Earth has evolved from initial simplicity to the astounding complexity we experience today. Bacteria and archaea have largely excelled in metabolic diversification, but eukaryotes additionally display abundant morphological innovation. How have these innovations come about and what constraints are there on the origins of novelty and the continuing maintenance of biodiversity on Earth? The history of life and the code for the working parts of cells and systems are written in the genome. The Earth BioGenome Project has proposed that the genomes of all extant, named eukaryotes (about 2 million species) should be sequenced to high quality to produce a digital library of life on Earth, beginning with strategic phylogenetic, ecological, and high-impact priorities. Here we discuss why we should sequence all eukaryotic species, not just a representative few scattered across the many branches of the tree of life. We suggest that many questions of evolutionary and ecological significance will only be addressable when whole-genome data representing divergences at all of the branchings in the tree of life or all species in natural ecosystems are available. We envisage that a genomic tree of life will foster understanding of the ongoing processes of speciation, adaptation, and organismal dependencies within entire ecosystems. These explorations will resolve long-standing problems in phylogenetics, evolution, ecology, conservation, agriculture, bioindustry, and medicine.
Affiliation(s)
- Mark Blaxter
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - John M Archibald
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS B3H 4H7, Canada
| | - Anna K Childers
- Bee Research Laboratory, Agricultural Research Service, US Department of Agriculture (USDA), Beltsville, MD 20705
| | - Jonathan A Coddington
- Global Genome Initiative, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560
| | - Keith A Crandall
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, George Washington University, Washington, DC 20052
- Department of Invertebrate Zoology, Smithsonian Institution, Washington, DC 20013
| | - Federica Di Palma
- School of Biological Sciences, University of East Anglia, Norwich NR4 7TJ, United Kingdom
| | - Richard Durbin
- Wellcome Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
| | - Jennifer A M Graves
- School of Life Sciences, La Trobe University, Bundoora, VIC 751 23, Australia
- University of Canberra, Bruce, ACT 2617, Australia
| | - Kevin J Hackett
- Crop Production and Protection, Office of National Programs, Agricultural Research Service, USDA, Beltsville, MD 20705
| | - Neil Hall
- Earlham Institute, Norwich, Norfolk NR4 7UZ, United Kingdom
| | - Erich D Jarvis
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY 10065
- Howard Hughes Medical Institute, Chevy Chase, MD 20815
| | - Rebecca N Johnson
- National Museum of Natural History, Smithsonian Institution, Washington, DC 20560
| | - Elinor K Karlsson
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
| | - W John Kress
- Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013-7012
| | - Shigehiro Kuraku
- Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
| | | | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 23, Sweden
| | - Jose V Lopez
- Department of Biological Sciences, Halmos College of Arts and Sciences, Nova Southeastern University, Dania Beach, FL 33004
- Guy Harvey Oceanographic Center, Dania Beach, FL 33004
| | - Nancy A Moran
- Integrative Biology, University of Texas at Austin, Austin, TX 78712
| | - Gene E Robinson
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801
| | - Oliver A Ryder
- Conservation Genetics, Division of Biology, San Diego Zoo Wildlife Alliance, Escondido, CA 92027
- Department of Evolution, Behavior and Ecology, University of California, San Diego, La Jolla, CA 92039
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064
| | - Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Biodiversity Institute, University of Florida, Gainesville, FL 32611
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
| | - Guojie Zhang
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen 2100, Denmark
- China National Genebank, Beijing Genomics Institute-Shenzhen, Shenzhen 518083, China
| | - Harris A Lewin
- Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, CA 95616
- Department of Population Health and Reproduction, University of California, Davis, CA 95616
| |
|
18
|
Bzikadze AV, Mikheenko A, Pevzner PA. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res 2022; 32:2107-2118. [PMID: 36379716 PMCID: PMC9808623 DOI: 10.1101/gr.276871.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Recent advancements in long-read sequencing have enabled the telomere-to-telomere (complete) assembly of a human genome and are now contributing to the haplotype-resolved complete assemblies of multiple human genomes. Because the accuracy of read mapping tools deteriorates in highly repetitive regions, there is a need to develop accurate, error-exposing (detecting potential assembly errors), and diploid-aware (distinguishing different haplotypes) tools for read mapping in complete assemblies. We describe the first accurate, error-exposing, and partially diploid-aware VerityMap tool for long-read mapping to complete assemblies.
Affiliation(s)
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, California 92093, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, 199034, Russia
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
| |
|
19
|
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ivan A Alexandrov
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia; Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
| |
|
20
|
Belser C, Baurens FC, Noel B, Martin G, Cruaud C, Istace B, Yahiaoui N, Labadie K, Hřibová E, Doležel J, Lemainque A, Wincker P, D'Hont A, Aury JM. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun Biol 2021; 4:1047. [PMID: 34493830 PMCID: PMC8423783 DOI: 10.1038/s42003-021-02559-3] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 06/11/2021] [Accepted: 08/13/2021] [Indexed: 02/07/2023] Open
Abstract
Long-read technologies promise more complete genome assemblies that are also easier to produce. Coupled with long-range technologies, they can reveal the architecture of complex regions, like centromeres or rDNA clusters. These technologies also make it possible to determine the complete organization of chromosomes, which previously remained difficult even with genetic maps. However, generating a gapless, telomere-to-telomere assembly is still not trivial, and requires a combination of several technologies and the choice of suitable software. Here, we report a chromosome-scale assembly of a banana genome (Musa acuminata) generated using Oxford Nanopore long reads. We generated 177X genome coverage from a single PromethION flowcell, including nearly 17X from reads longer than 75 kbp. Of the 11 chromosomes, 5 were entirely reconstructed in a single contig from telomere to telomere, revealing for the first time the content of complex regions like centromeres or clusters of paralogous genes.
Affiliation(s)
- Caroline Belser
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
| | - Franc-Christophe Baurens
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
| | - Guillaume Martin
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Corinne Cruaud
- Commissariat à l'Energie Atomique (CEA), Institut François Jacob, Genoscope, Evry, France
| | - Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
| | - Nabila Yahiaoui
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Karine Labadie
- Commissariat à l'Energie Atomique (CEA), Institut François Jacob, Genoscope, Evry, France
| | - Eva Hřibová
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - Jaroslav Doležel
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - Arnaud Lemainque
- Commissariat à l'Energie Atomique (CEA), Institut François Jacob, Genoscope, Evry, France
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
| | - Angélique D'Hont
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France.
| |
|
21
|
Wang B, Yang X, Jia Y, Xu Y, Jia P, Dang N, Wang S, Xu T, Zhao X, Gao S, Dong Q, Ye K. High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads. Genomics Proteomics Bioinformatics 2021; 20:4-13. [PMID: 34487862 PMCID: PMC9510872 DOI: 10.1016/j.gpb.2021.08.003] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 07/15/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 02/08/2023]
Abstract
Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly with two gaps (Col-XJTU) using a combination of Oxford Nanopore Technology ultra-long reads, PacBio high-fidelity long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (QV scores ranging from 45 to 52). We have completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 has been completely resolved except the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled about 4 Mb of sequence for CEN1 and a 3.5-Mb-long CEN2. We investigated the structure and epigenetics of centromeres. We detected four clusters of CEN180 monomers, and found that the centromere-specific histone H3-like protein (CENH3) exhibits a strong preference for CEN180 cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, would serve as a valuable reference to better understand the global pattern of centromeric polymorphisms, as well as genetic and epigenetic features in plants.
Affiliation(s)
- Bo Wang
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China.
| | - Yanyan Jia
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Yu Xu
- School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China
| | - Peng Jia
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Ningxin Dang
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Songbo Wang
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Tun Xu
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Xixi Zhao
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Shenghan Gao
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
| | - Quanbin Dong
- Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China
| | - Kai Ye
- MOE Key Laboratory for Intelligent Networks & Network Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, 710049, China; School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, 710049, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China.
| |
|
22
|
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022] Open
Abstract
Motivation: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not yet been accompanied by the development of centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species. Results: We describe CentromereArchitect, the first tool for centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution. Availability and implementation: CentromereArchitect is publicly available at https://github.com/ablab/stringdecomposer/tree/ismb2021. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Olga Kunyavskaya
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
| | - Ivan Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
| |
|
23
|
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022] Open
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| |
|
24
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not directly yield haplotype information spanning whole chromosomes. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
|
25
|
An 8.22 Mb Assembly and Annotation of the Alpaca (Vicugna pacos) Y Chromosome. Genes (Basel) 2021; 12:105. [PMID: 33467186 PMCID: PMC7830431 DOI: 10.3390/genes12010105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/02/2020] [Revised: 01/07/2021] [Accepted: 01/14/2021] [Indexed: 12/26/2022] Open
Abstract
The unique evolutionary dynamics and complex structure make the Y chromosome the most diverse and least understood region in the mammalian genome, despite its indisputable role in sex determination, development, and male fertility. Here we present the first contig-level annotated draft assembly for the alpaca (Vicugna pacos) Y chromosome based on hybrid assembly of short- and long-read sequence data of flow-sorted Y. The latter was also used for cDNA selection, providing a Y-enriched testis transcriptome for annotation. The final assembly of 8.22 Mb comprised 4.5 Mb of male-specific Y (MSY) and 3.7 Mb of the pseudoautosomal region. In MSY, we annotated 15 X-degenerate genes and two novel transcripts, but no transposed sequences. Two MSY genes, HSFY and RBMY, are multicopy. The pseudoautosomal boundary is located between SHROOM2 and HSFY. Comparative analysis shows that the small and cytogenetically distinct alpaca Y shares most MSY sequences with the larger dromedary and Bactrian camel Y chromosomes. Most alpaca X-degenerate genes are also shared with other mammalian MSYs, though WWC3Y is Y-specific only in alpaca/camels and the horse. The partial alpaca Y assembly is a starting point for further expansion and will have applications in the study of camelid populations and male biology.
|
26
|
Cechova M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes (Basel) 2020; 12:48. [PMID: 33396198 PMCID: PMC7823596 DOI: 10.3390/genes12010048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/09/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 02/07/2023] Open
Abstract
Ever since the introduction of high-throughput sequencing following the human genome project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
Affiliation(s)
- Monika Cechova
- Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
| |
|
27
|
MacKinnon RN, Peverall J, Campbell LJ, Wall M. Detailed molecular cytogenetic characterisation of the myeloid cell line U937 reveals the fate of homologous chromosomes and shows that centromere capture is a feature of genome instability. Mol Cytogenet 2020; 13:50. [PMID: 33317567 PMCID: PMC7737353 DOI: 10.1186/s13039-020-00517-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2020] [Accepted: 11/02/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: The U937 cell line is widely employed as a research tool. It has a complex karyotype. A PICALM-MLLT10 fusion gene formed by the recurrent t(10;11) translocation is present, and the myeloid common deleted region at 20q12 has been lost from its near-triploid karyotype. We carried out a detailed investigation of U937 genome reorganisation including the chromosome 20 rearrangements and other complex rearrangements. RESULTS: SNP array, G-banding and Multicolour FISH identified chromosome segments resulting from unbalanced and balanced rearrangements. The organisation of the abnormal chromosomes containing these segments was then reconstructed with the strategic use of targeted metaphase FISH. This provided more accurate karyotype information for the evolving karyotype. Rearrangements involving the homologues of a chromosome pair could be differentiated in most instances. Centromere capture was demonstrated in an abnormal chromosome containing parts of chromosomes 16 and 20 which were stabilised by joining to a short section of chromosome containing an 11 centromere. This adds to the growing number of examples of centromere capture, which to date have a high incidence in complex karyotypes where the centromeres of the rearranged chromosomes are identified. There were two normal copies of one chromosome 20 homologue, and complex rearrangement of the other homologue including loss of the 20q12 common deleted region. This confirmed the previously reported loss of heterozygosity of this region in U937, and defined the rearrangements giving rise to this loss. CONCLUSIONS: Centromere capture, stabilising chromosomes pieced together from multiple segments, may be a common feature of complex karyotypes. However, it has only recently been recognised, as this requires deliberate identification of the centromeres of abnormal chromosomes.
The approach presented here is invaluable for studying complex reorganised genomes such as those produced by chromothripsis, and provides a more complete picture than can be obtained by microarray, karyotyping or FISH studies alone. One major advantage of SNP arrays for this process is that the two homologues can usually be distinguished when there is more than one rearrangement of a chromosome pair. Tracking the fate of each homologue and of highly repetitive DNA regions such as centromeres helps build a picture of genome evolution. Centromere- and telomere-containing elements are important to deducing chromosome structure. This study confirms and highlights ongoing evolution in cultured cell lines.
Affiliation(s)
- Ruth N. MacKinnon
- Victorian Cancer Cytogenetics Service, St Vincent’s Hospital, PO Box 2900, Fitzroy, Melbourne, 3065 Australia
- Department of Medicine, St Vincent’s Hospital, University of Melbourne, Parkville, Australia
| | - Joanne Peverall
- PathWest Department of Diagnostic Genomics, PathWest Laboratory Medicine, QEII Medical Centre, Nedlands, Australia
| | - Lynda J. Campbell
- Victorian Cancer Cytogenetics Service, St Vincent’s Hospital, PO Box 2900, Fitzroy, Melbourne, 3065 Australia
- Department of Medicine, St Vincent’s Hospital, University of Melbourne, Parkville, Australia
| | - Meaghan Wall
- Victorian Clinical Genetics Services, Parkville, Melbourne, Australia
- Murdoch Children’s Research Institute, Parkville, Melbourne, Australia
| |
|
28
|
Artificial chromosomes. Exp Cell Res 2020; 396:112302. [PMID: 32980292 DOI: 10.1016/j.yexcr.2020.112302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|