1
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Claudia R Catacchio
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Oxford Nanopore Technologies, Oxford, United Kingdom
- Julian K Lucas
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- Ivan A Alexandrov
  - Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
  - Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
2
Kixmoeller K, Chang YW, Black BE. Centromeric chromatin clearings demarcate the site of kinetochore formation. bioRxiv 2024:2024.04.26.591177. [PMID: 38712116 PMCID: PMC11071481 DOI: 10.1101/2024.04.26.591177] [Indexed: 05/08/2024]
Abstract
The centromere is the chromosomal locus that recruits the kinetochore, directing faithful propagation of the genome during cell division. The kinetochore has been interrogated by electron microscopy since the middle of the last century, but with methodologies that compromised fine structure. Using cryo-ET on human mitotic chromosomes, we reveal a distinctive architecture at the centromere: clustered 20-25 nm nucleosome-associated complexes within chromatin clearings that delineate them from surrounding chromatin. Centromere components CENP-C and CENP-N are each required for the integrity of the complexes, while CENP-C is also required to maintain the chromatin clearing. We further visualize the scaffold of the fibrous corona, a structure amplified at unattached kinetochores, revealing crescent-shaped parallel arrays of fibrils that extend >1 μm. Thus, we reveal how the organization of centromeric chromatin creates a clearing at the site of kinetochore formation as well as the nature of kinetochore amplification mediated by corona fibrils.
3
Huang S, Shi W, Li S, Fan Q, Yang C, Cao J, Wu L. Advanced sequencing-based high-throughput and long-read single-cell transcriptome analysis. Lab Chip 2024. [PMID: 38669201 DOI: 10.1039/d4lc00105b] [Indexed: 04/28/2024]
Abstract
Cells are the fundamental building blocks of living systems, exhibiting significant heterogeneity. The transcriptome connects the cellular genotype and phenotype, and profiling single-cell transcriptomes is critical for uncovering distinct cell types, states, and the interplay between cells in development, health, and disease. Nevertheless, single-cell transcriptome analysis faces daunting challenges due to the low abundance and diverse nature of RNAs in individual cells, as well as their heterogeneous expression. The advent and continuous advancements of next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies have solved these problems and facilitated the high-throughput, sensitive, full-length, and rapid profiling of single-cell RNAs. In this review, we provide a broad introduction to current methodologies for single-cell transcriptome sequencing. First, state-of-the-art advancements in high-throughput and full-length single-cell RNA sequencing (scRNA-seq) platforms using NGS are reviewed. Next, TGS-based long-read scRNA-seq methods are summarized. Finally, a brief conclusion and perspectives for comprehensive single-cell transcriptome analysis are discussed.
Affiliation(s)
- Shanqing Huang
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Weixiong Shi
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Shiyu Li
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Qian Fan
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Chaoyong Yang
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
  - Discipline of Intelligent Instrument and Equipment, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China
- Jiao Cao
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
- Lingling Wu
  - Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
4
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 2024; 116:110842. [PMID: 38608738 DOI: 10.1016/j.ygeno.2024.110842] [Received: 10/31/2023] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/14/2024]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly still presents significant challenges related to the quality of the results. Pursuing de novo whole-genome assembly remains a formidable challenge, underscored by intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
Affiliation(s)
- Elena Espinosa
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
- Rocio Bautista
  - Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain
- Rafael Larrosa
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
  - Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain
- Oscar Plata
  - Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
5
Greshnova A, Pál K, Martinez JFI, Canzar S, Makova KD. Transcript Isoform Diversity of Y Chromosome Ampliconic Genes of Great Apes Uncovered Using Long Reads and Telomere-to-Telomere Reference Genome Assemblies. bioRxiv 2024:2024.04.02.587783. [PMID: 38617276 PMCID: PMC11014635 DOI: 10.1101/2024.04.02.587783] [Indexed: 04/16/2024]
Abstract
Y chromosomes of great apes harbor Ampliconic Genes (YAGs), multi-copy gene families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) that encode proteins important for spermatogenesis. Previous work assembled YAG transcripts based on their targeted sequencing but not using reference genome assemblies, potentially resulting in an incomplete transcript repertoire. Here we used the recently produced gapless telomere-to-telomere (T2T) Y chromosome assemblies of great ape species (bonobo, chimpanzee, human, gorilla, Bornean orangutan, and Sumatran orangutan) and analyzed RNA data from whole-testis samples for the same species. We generated hybrid transcriptome assemblies by combining targeted long reads (Pacific Biosciences), untargeted long reads (Pacific Biosciences) and untargeted short reads (Illumina) and mapping them to the T2T reference genomes. Compared to the results from the reference-free approach, average transcript length was more than two times higher, and the total number of transcripts decreased three times, improving the quality of the assembled transcriptome. The reference-based transcriptome assemblies allowed us to differentiate transcripts originating from different Y chromosome gene copies and from their non-Y chromosome homologs. We identified two sources of transcriptome diversity: alternative splicing and gene duplication with subsequent diversification of gene copies. For each gene family, we detected transcribed pseudogenes along with protein-coding gene copies. We revealed previously unannotated gene copies of YAGs as compared to currently available NCBI annotations, as well as novel isoforms for annotated gene copies. This analysis paves the way for better understanding Y chromosome gene functions, which is important given their role in spermatogenesis.
Affiliation(s)
- Aleksandra Greshnova
  - Department of Biology, Penn State University, University Park, PA, USA
  - Current address: Max Planck Institute for Evolutionary Biology, Plön, Germany
- Karol Pál
  - Department of Biology, Penn State University, University Park, PA, USA
- Juan Francisco Iturralde Martinez
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Stefan Canzar
  - Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
- Kateryna D Makova
  - Department of Biology, Penn State University, University Park, PA, USA
6
Darian JC, Kundu R, Rajaby R, Sung WK. Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly. Nat Methods 2024; 21:574-583. [PMID: 38459383 DOI: 10.1038/s41592-023-02141-1] [Received: 09/08/2022] [Accepted: 11/30/2023] [Indexed: 03/10/2024]
Abstract
Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid, to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
Affiliation(s)
- Ritu Kundu
  - School of Computing, National University of Singapore, Singapore, Singapore
- Wing-Kin Sung
  - School of Computing, National University of Singapore, Singapore, Singapore
  - Genome Institute of Singapore, Singapore, Singapore
  - Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  - JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
  - Hong Kong Genome Institute, Hong Kong, China
7
Ermini L, Driguez P. The Application of Long-Read Sequencing to Cancer. Cancers (Basel) 2024; 16:1275. [PMID: 38610953 PMCID: PMC11011098 DOI: 10.3390/cancers16071275] [Received: 02/23/2024] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/14/2024]
Abstract
Cancer is a multifaceted disease arising from numerous genomic aberrations that have been identified as a result of advancements in sequencing technologies. While next-generation sequencing (NGS), which uses short reads, has transformed cancer research and diagnostics, it is limited by read length. Third-generation sequencing (TGS), led by the Pacific Biosciences and Oxford Nanopore Technologies platforms, employs long-read sequences, which have marked a paradigm shift in cancer research. Cancer genomes often harbour complex events, and TGS, with its ability to span large genomic regions, has facilitated their characterisation, providing a better understanding of how complex rearrangements affect cancer initiation and progression. TGS has also characterised the entire transcriptome of various cancers, revealing cancer-associated isoforms that could serve as biomarkers or therapeutic targets. Furthermore, TGS has advanced cancer research by improving genome assemblies, detecting complex variants, and providing a more complete picture of transcriptomes and epigenomes. This review focuses on TGS and its growing role in cancer research. We investigate its advantages and limitations, providing a rigorous scientific analysis of its use in detecting previously hidden aberrations missed by NGS. This promising technology holds immense potential for both research and clinical applications, with far-reaching implications for cancer diagnosis and treatment.
Affiliation(s)
- Luca Ermini
  - NORLUX Neuro-Oncology Laboratory, Department of Cancer Research, Luxembourg Institute of Health, L-1210 Luxembourg, Luxembourg
- Patrick Driguez
  - Bioscience Core Lab, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
8
Naish M, Henderson IR. The structure, function, and evolution of plant centromeres. Genome Res 2024; 34:161-178. [PMID: 38485193 PMCID: PMC10984392 DOI: 10.1101/gr.278409.123] [Indexed: 03/22/2024]
Abstract
Centromeres are essential regions of eukaryotic chromosomes responsible for the formation of kinetochore complexes, which connect to spindle microtubules during cell division. Notably, although centromeres maintain a conserved function in chromosome segregation, the underlying DNA sequences are diverse both within and between species and are predominantly repetitive in nature. The repeat content of centromeres includes high-copy tandem repeats (satellites), and/or specific families of transposons. The functional region of the centromere is defined by loading of a specific histone 3 variant (CENH3), which nucleates the kinetochore and shows dynamic regulation. In many plants, the centromeres are composed of satellite repeat arrays that are densely DNA methylated and invaded by centrophilic retrotransposons. In some cases, the retrotransposons become the sites of CENH3 loading. We review the structure of plant centromeres, including monocentric, holocentric, and metapolycentric architectures, which vary in the number and distribution of kinetochore attachment sites along chromosomes. We discuss how variation in CENH3 loading can drive genome elimination during early cell divisions of plant embryogenesis. We review how epigenetic state may influence centromere identity and discuss evolutionary models that seek to explain the paradoxically rapid change of centromere sequences observed across species, including the potential roles of recombination. We outline putative modes of selection that could act within the centromeres, as well as the role of repeats in driving cycles of centromere evolution. Although our primary focus is on plant genomes, we draw comparisons with animal and fungal centromeres to derive a eukaryote-wide perspective of centromere structure and function.
Affiliation(s)
- Matthew Naish
  - Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
- Ian R Henderson
  - Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
9
Olbrich M, Bartels L, Wohlers I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Front Bioinform 2024; 4:1384497. [PMID: 38567256 PMCID: PMC10985184 DOI: 10.3389/fbinf.2024.1384497] [Received: 02/09/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Affiliation(s)
- Michael Olbrich
  - Center for Biotechnology, Khalifa University for Science and Technology, Abu Dhabi, United Arab Emirates
- Lennart Bartels
  - Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
- Inken Wohlers
  - Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
  - University of Lübeck, Lübeck, Germany
10
Uppuluri L, Shi CH, Varapula D, Young E, Ehrlich RL, Wang Y, Piazza D, Mell JC, Yip KY, Xiao M. A long-read sequencing strategy with overlapping linkers on adjacent fragments (OLAF-Seq) for targeted resequencing and enrichment. Sci Rep 2024; 14:5583. [PMID: 38448490 PMCID: PMC10917763 DOI: 10.1038/s41598-024-56402-w] [Received: 12/12/2023] [Accepted: 03/06/2024] [Indexed: 03/08/2024]
Abstract
In this report, we present OLAF-Seq, a novel strategy to construct a long-read sequencing library such that adjacent fragments are linked with end-terminal duplications. We use the CRISPR-Cas9 nickase enzyme and a pool of multiple sgRNAs to perform non-random fragmentation of targeted long DNA molecules (>300 kb) into smaller library-sized fragments (about 20 kb) in a manner so as to retain physical linkage information (up to 1000 bp) between adjacent fragments. DNA molecules targeted for fragmentation are preferentially ligated with adaptors for sequencing, so this method can enrich targeted regions while taking advantage of the long-read sequencing platforms. This enables the sequencing of target regions with significantly lower total coverage, and the genome sequence within linker regions provides information for assembly and phasing. We demonstrated the validity and efficacy of the method first using phage and then by sequencing a panel of 100 full-length cancer-related genes (including both exons and introns) in the human genome. When the designed linkers contained heterozygous genetic variants, long haplotypes could be established. This sequencing strategy can be readily applied in both PacBio and Oxford Nanopore platforms for both long and short genes with an easy protocol. This economically viable approach is useful for targeted enrichment of hundreds of target genomic regions and where long no-gap contigs need deep sequencing.
Affiliation(s)
- Lahari Uppuluri
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
- Christina Huan Shi
  - Cancer Genome and Epigenetics Program, NCI-Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Dharma Varapula
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
- Eleanor Young
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
- Rachel L Ehrlich
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, 19104, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, 19104, USA
- Yilin Wang
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
- Danielle Piazza
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, 19104, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, 19104, USA
- Joshua Chang Mell
  - Department of Microbiology and Immunology, College of Medicine, Drexel University, Philadelphia, PA, 19104, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, 19104, USA
- Kevin Y Yip
  - Cancer Genome and Epigenetics Program, NCI-Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Ming Xiao
  - School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, 19104, USA
  - Center for Genomic Sciences, Institute of Molecular Medicine and Infectious Disease, Drexel University, Philadelphia, PA, 19104, USA
11
Ma T, Yan C, Zhang S, Liang D, Mao C, Zhang C. High-quality genome assembly and genetic transformation system of Lasiodiplodia theobromae strain LTTK16-3, a fungal pathogen of Chinese hickory. Microbiol Spectr 2024; 12:e0331123. [PMID: 38349153 PMCID: PMC10913528 DOI: 10.1128/spectrum.03311-23] [Received: 09/07/2023] [Accepted: 01/23/2024] [Indexed: 02/15/2024]
Abstract
Lasiodiplodia theobromae, as one of the causative agents associated with Chinese hickory trunk cankers, has caused huge economic losses to the Chinese hickory industry. Although the biological characteristics of this pathogen and the occurrence pattern of this disease have been well studied, few studies have addressed the related mechanisms due to the poor molecular and genetic study basis of this fungus. In this study, we sequenced and assembled L. theobromae strain LTTK16-3, isolated from a Chinese hickory tree (cultivar of Linan) in Linan, Zhejiang province, China. Phylogenetic analysis and comparative genomics analysis presented crucial cues in the prediction of LTTK16-3, which shared similar regulatory mechanisms of transcription, DNA replication, and DNA damage response with the other four Chinese hickory trunk canker-associated Botryosphaeria strains, including Botryosphaeria dothidea, Botryosphaeria fabicerciana, Botryosphaeria qingyuanensis, and Botryosphaeria corticis. Moreover, it contained 18 strain-specific protein clusters (not conserved in the other L. theobromae strains, AM2As and CITRA15), with potential roles in specific host-pathogen interactions during the Chinese hickory infection. Additionally, an efficient system for L. theobromae protoplast preparation and polyethylene glycol (PEG)-mediated genetic transformation was established for the first time as the foundation for its future mechanisms study. Collectively, the high-quality genome data and the efficient transformation system of L. theobromae here set up the possibility of targeted molecular improvements for Chinese hickory canker control.
IMPORTANCE
Fungi with disparate genomic features are physiologically diverse, possessing species-specific survival strategies and environmental adaptation mechanisms. The high-quality genome data and related molecular genetic studies are the basis for revealing the mechanisms behind the physiological traits that are responsible for their environmental fitness. In this study, we sequenced and assembled the LTTK16-3 strain, the genome of Lasiodiplodia theobromae first obtained from a diseased Chinese hickory tree (cultivar of Linan) in Linan, Zhejiang province, China. Further phylogenetic analysis and comparative genomics analysis provide crucial cues in the prediction of the proteins with potential roles in specific host-pathogen interactions during the Chinese hickory infection. An efficient PEG-mediated genetic transformation system of L. theobromae was established as the foundation for the future mechanisms exploration. The above genetic information and tools set up valuable clues to study L. theobromae pathogenesis and assist in Chinese hickory canker control.
Affiliation(s)
- Tianling Ma
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
- Chenyi Yan
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
- Shuya Zhang
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
- Dong Liang
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
- Chengxin Mao
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
- Chuanqing Zhang
  - Department of Plant Pathology, Zhejiang Agriculture and Forest University, Hangzhou, China
12
|
He J, Zeng C, Li M. Plant Functional Genomics Based on High-Throughput CRISPR Library Knockout Screening: A Perspective. Adv Genet (Hoboken) 2024; 5:2300203. [PMID: 38465224 PMCID: PMC10919289 DOI: 10.1002/ggn2.202300203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 10/19/2023] [Indexed: 03/12/2024]
Abstract
Plant biology studies in the post-genome era have focused on annotating the functions of genome sequences. Established plant mutant collections have greatly accelerated functional genomics research in the past few decades. However, the roles of most plant genome sequences and the underlying regulatory networks remain substantially unknown. Clustered, regularly interspaced short palindromic repeat (CRISPR)-associated systems are robust, versatile tools for manipulating plant genomes with various targeted DNA perturbations, providing an excellent opportunity for high-throughput interrogation of the roles of DNA elements. This study compares methods frequently used for plant functional genomics and then discusses different DNA multi-targeted strategies to overcome gene redundancy using the CRISPR-Cas9 system. Next, this work summarizes recent reports using CRISPR libraries for high-throughput gene knockout and function discovery in plants. Finally, this work envisions the future of optimizing and leveraging CRISPR library screening for other uncharacterized DNA sequences in plant genomes.
Affiliation(s)
- Jianjie He
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Wuhan 430074, China
- Can Zeng
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Wuhan 430074, China
- Maoteng Li
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Wuhan 430074, China

13
Genner R, Akeson S, Meredith M, Jerez PA, Malik L, Baker B, Miano-Burkhardt A, Paten B, Billingsley KJ, Blauwendraat C, Jain M. Assessing methylation detection for primary human tissue using Nanopore sequencing. bioRxiv 2024:2024.02.29.581569. [PMID: 38464144 PMCID: PMC10925257 DOI: 10.1101/2024.02.29.581569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded their Nanopore sequencing chemistry and kits from R9 to the R10 version, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large-scale sequencing efforts, such as the NIH CARD Long Read Initiative.
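To make the notion of concordance between technologies concrete, the sketch below correlates per-CpG methylation fractions from two call sets over the sites they share. It is an illustration only: the function names, the dictionary layout keyed by (chromosome, position), and the use of a plain Pearson correlation are assumptions, not the authors' pipeline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def methylation_concordance(calls_a, calls_b):
    """Correlate methylation fractions at CpG sites present in both call sets.

    calls_a, calls_b: dict mapping (chrom, pos) -> fraction methylated in [0, 1].
    This layout is hypothetical; real callers emit richer per-site records.
    Returns (correlation, number_of_shared_sites).
    """
    shared = sorted(set(calls_a) & set(calls_b))
    r = pearson([calls_a[s] for s in shared], [calls_b[s] for s in shared])
    return r, len(shared)
```

Restricting the comparison to shared sites matters in practice, because coverage differs between platforms and sites called by only one technology cannot be compared.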
Affiliation(s)
- Rylee Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Stuart Akeson
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Melissa Meredith
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Breeana Baker
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- Kimberley J Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD, USA
- Miten Jain
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA

14
Song B, Buckler ES, Stitzer MC. New whole-genome alignment tools are needed for tapping into plant diversity. Trends Plant Sci 2024; 29:355-369. [PMID: 37749022 DOI: 10.1016/j.tplants.2023.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/19/2023] [Accepted: 08/23/2023] [Indexed: 09/27/2023]
Abstract
Genome alignment is one of the most foundational methods for genome sequence studies. With rapid advances in sequencing and assembly technologies, these newly assembled genomes present challenges for alignment tools to meet the increased complexity and scale. Plant genome alignment is technologically challenging because of frequent whole-genome duplications (WGDs) as well as chromosome rearrangements and fractionation, high nucleotide diversity, widespread structural variation, and high transposable element (TE) activity causing large proportions of repeat elements. We summarize classical pairwise and multiple genome alignment (MGA) methods, and highlight techniques that are widely used or are being developed by the plant research community. We also outline the remaining challenges for precise genome alignment and the interpretation of alignment results in plants.
Affiliation(s)
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong 261325, China; Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China.
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA; Agricultural Research Service, United States Department of Agriculture, Ithaca, NY 14853, USA
- Michelle C Stitzer
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA.

15
Porubsky D, Eichler EE. A 25-year odyssey of genomic technology advances and structural variant discovery. Cell 2024; 187:1024-1037. [PMID: 38290514 DOI: 10.1016/j.cell.2024.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024]
Abstract
This perspective focuses on advances in genome technology over the last 25 years and their impact on germline variant discovery within the field of human genetics. The field has witnessed tremendous technological advances from microarrays to short-read sequencing and now long-read sequencing. Each technology has provided genome-wide access to different classes of human genetic variation. We are now on the verge of comprehensive variant detection of all forms of variation for the first time with a single assay. We predict that this transition will further transform our understanding of human health and biology and, more importantly, provide novel insights into the dynamic mutational processes shaping our genomes.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.

16
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024:10.1038/s41576-024-00691-4. [PMID: 38378816 DOI: 10.1038/s41576-024-00691-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
- Department of Biology, University of Marburg, Marburg, Germany
- Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.

17
Liu X, Zheng J, Ding J, Wu J, Zuo F, Zhang G. When Livestock Genomes Meet Third-Generation Sequencing Technology: From Opportunities to Applications. Genes (Basel) 2024; 15:245. [PMID: 38397234 PMCID: PMC10888458 DOI: 10.3390/genes15020245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 01/30/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024]
Abstract
Third-generation sequencing technology has found widespread application in the genomic, transcriptomic, and epigenetic research of both human and livestock genetics. This technology offers significant advantages in the sequencing of complex genomic regions, the identification of intricate structural variations, and the production of high-quality genomes. Its attributes, including long sequencing reads, obviation of PCR amplification, and direct determination of DNA/RNA, contribute to its efficacy. This review presents a comprehensive overview of third-generation sequencing technologies, exemplified by single-molecule real-time sequencing (SMRT) and Oxford Nanopore Technology (ONT). Emphasizing the research advancements in livestock genomics, the review delves into genome assembly, structural variation detection, transcriptome sequencing, and epigenetic investigations enabled by third-generation sequencing. A comprehensive analysis is conducted on the application and potential challenges of third-generation sequencing technology for genome detection in livestock. Beyond providing valuable insights into genome structure analysis and the identification of rare genes in livestock, the review ventures into an exploration of the genetic mechanisms underpinning exemplary traits. This review not only contributes to our understanding of the genomic landscape in livestock but also provides fresh perspectives for the advancement of research in this domain.
Affiliation(s)
- Xinyue Liu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Junyuan Zheng
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Jialan Ding
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Jiaxin Wu
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Fuyuan Zuo
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China
- Gongwei Zhang
- College of Animal Science and Technology, Southwest University, Rongchang, Chongqing 402460, China
- Beef Cattle Engineering and Technology Research Center of Chongqing, Southwest University, Rongchang, Chongqing 402460, China

18
Firdaus Z, Li X. Unraveling the Genetic Landscape of Neurological Disorders: Insights into Pathogenesis, Techniques for Variant Identification, and Therapeutic Approaches. Int J Mol Sci 2024; 25:2320. [PMID: 38396996 PMCID: PMC10889342 DOI: 10.3390/ijms25042320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024]
Abstract
Genetic abnormalities play a crucial role in the development of neurodegenerative disorders (NDDs). Genetic exploration has contributed to unraveling the molecular complexities responsible for the etiology and progression of various NDDs. The intricate nature of rare and common variants in NDDs contributes to a limited understanding of the genetic risk factors associated with them. Advancements in next-generation sequencing have made whole-genome sequencing and whole-exome sequencing possible, allowing the identification of rare variants with substantial effects, and improving the understanding of both Mendelian and complex neurological conditions. The resurgence of gene therapy holds the promise of targeting the etiology of diseases and ensuring a sustained correction. This approach is particularly enticing for neurodegenerative diseases, where traditional pharmacological methods have fallen short. In exploring the genetic epidemiology of the three most prevalent NDDs (amyotrophic lateral sclerosis, Alzheimer's disease, and Parkinson's disease), our primary goal is to underscore the progress made in the development of next-generation sequencing. This progress aims to enhance our understanding of the disease mechanisms and explore gene-based therapies for NDDs. Throughout this review, we focus on genetic variations, methodologies for their identification, the associated pathophysiology, and the promising potential of gene therapy. Ultimately, our objective is to provide a comprehensive and forward-looking perspective on the emerging research arena of NDDs.
Affiliation(s)
- Zeba Firdaus
- Department of Internal Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA
- Xiaogang Li
- Department of Internal Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905, USA

19
Li J, Ma H, Qin Y, Zhao Z, Niu Y, Lian J, Li J, Noor Z, Guo S, Yu Z, Zhang Y. Chromosome-level genome assembly and annotation of rare and endangered tropical bivalve, Tridacna crocea. Sci Data 2024; 11:186. [PMID: 38341475 PMCID: PMC10858879 DOI: 10.1038/s41597-024-03014-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Tridacna crocea is an ecologically important marine bivalve inhabiting tropical coral reef waters. High-quality, accessible genomic resources will help us understand the population structure and genetic diversity of giant clams. This study reports a high-quality chromosome-scale T. crocea genome sequence of 1.30 Gb, with a scaffold N50 and contig N50 of 56.38 Mb and 1.29 Mb, respectively, which was assembled by combining PacBio long reads and Hi-C sequencing data. Repetitive sequences cover 71.60% of the total length, and a total of 25,440 protein-coding genes were annotated. A total of 1,963 non-coding RNAs (ncRNAs) were identified in the T. crocea genome, including 62 microRNAs (miRNAs), 58 small nuclear RNAs (snRNAs), 83 ribosomal RNAs (rRNAs), and 1,760 transfer RNAs (tRNAs). Phylogenetic analysis revealed that giant clams diverged from oysters about 505.7 Mya during the evolution of bivalves. The genome assembly presented here provides valuable genomic resources to enhance our understanding of the genetic diversity and population structure of giant clams.
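The scaffold and contig N50 values quoted above follow the usual definition: the length L such that sequences of length at least L together make up at least half of the total assembly. A minimal Python sketch of that calculation follows; the function name is hypothetical and the toy lengths in the usage note are not taken from the T. crocea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the
    assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe test for "at least half"
            return length
```

For example, for lengths 80, 70, 50, 40, 30 and 20 (total 290), the two longest sequences already sum to 150, which exceeds half of the total, so the N50 is 70.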
Affiliation(s)
- Jun Li
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
- Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China
- Haitao Ma
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
- Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China
- Yanpin Qin
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
- Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China
- Zhen Zhao
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
- Jiang Li
- Biozeron Shenzhen, Inc, Shenzhen, 518000, China
- Zohaib Noor
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Shuming Guo
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Ziniu Yu
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China.
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China.
- Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China.
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China.
- Yuehuan Zhang
- Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Science, Guangzhou, 510301, China.
- Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China.
- Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China.
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China.

20
Versoza CJ, Weiss S, Johal R, La Rosa B, Jensen JD, Pfeifer SP. Novel Insights into the Landscape of Crossover and Noncrossover Events in Rhesus Macaques (Macaca mulatta). Genome Biol Evol 2024; 16:evad223. [PMID: 38051960 PMCID: PMC10773715 DOI: 10.1093/gbe/evad223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/04/2023] [Revised: 11/04/2023] [Accepted: 11/28/2023] [Indexed: 12/07/2023]
Abstract
Meiotic recombination landscapes differ greatly between distantly and closely related taxa, populations, individuals, sexes, and even within genomes; however, the factors driving this variation are yet to be well elucidated. Here, we directly estimate contemporary crossover rates and, for the first time, noncrossover rates in rhesus macaques (Macaca mulatta) from four three-generation pedigrees comprising 32 individuals. We further compare these results with historical, demography-aware, linkage disequilibrium-based recombination rate estimates. From paternal meioses in the pedigrees, 165 crossover events with a median resolution of 22.3 kb were observed, corresponding to a male autosomal map length of 2,357 cM, approximately 15% longer than an existing linkage map based on human microsatellite loci. In addition, 85 noncrossover events with a mean tract length of 155 bp were identified, similar to the tract lengths observed in the only other two primates in which noncrossovers have been studied to date, humans and baboons. Consistent with observations in other placental mammals with PRDM9-directed recombination, crossover (and to a lesser extent noncrossover) events in rhesus macaques clustered in intergenic regions and toward the chromosomal ends in males (a pattern in broad agreement with the historical, sex-averaged recombination rate estimates), and evidence of GC-biased gene conversion was observed at noncrossover sites.
Affiliation(s)
- Cyril J Versoza
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Sarah Weiss
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Ravneet Johal
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Bruno La Rosa
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D Jensen
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Susanne P Pfeifer
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA

21
Yang L, Metzger GA, Padilla Del Valle R, Delgadillo Rubalcaba D, McLaughlin RN. Evolutionary insights from profiling LINE-1 activity at allelic resolution in a single human genome. EMBO J 2024; 43:112-131. [PMID: 38177314 PMCID: PMC10883270 DOI: 10.1038/s44318-023-00007-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/18/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024]
Abstract
Transposable elements have created the majority of the sequence in many genomes. In mammals, LINE-1 retrotransposons have been expanding for more than 100 million years as distinct, consecutive lineages; however, the drivers of this recurrent lineage emergence and disappearance are unknown. Most human genome assemblies provide a record of this ancient evolution, but fail to resolve ongoing LINE-1 retrotranspositions. Utilizing the human CHM1 long-read-based haploid assembly, we identified and cloned all full-length, intact LINE-1s, and found 29 LINE-1s with measurable in vitro retrotransposition activity. Among individuals, these LINE-1s varied in their presence, their allelic sequences, and their activity. We found that recently retrotransposed LINE-1s tend to be active in vitro and polymorphic in the population relative to more ancient LINE-1s. However, some rare allelic forms of old LINE-1s retain activity, suggesting older lineages can persist longer than expected. Finally, in LINE-1s with in vitro activity and in vivo fitness, we identified mutations that may have increased replication in ancient genomes and may prove promising candidates for mechanistic investigations of the drivers of LINE-1 evolution and which LINE-1 sequences contribute to human disease.
Affiliation(s)
- Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
- Ricky Padilla Del Valle
- Pacific Northwest Research Institute, Seattle, WA, USA
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA
- Richard N McLaughlin
- Pacific Northwest Research Institute, Seattle, WA, USA.
- Molecular and Cellular Biology Graduate Program, University of Washington, Seattle, WA, USA.

22
Hoang M, Marçais G, Kingsford C. Density and Conservation Optimization of the Generalized Masked-Minimizer Sketching Scheme. J Comput Biol 2024; 31:2-20. [PMID: 37975802 PMCID: PMC10794853 DOI: 10.1089/cmb.2023.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2023]
Abstract
Minimizers and syncmers are sketching methods that sample representative k-mer seeds from a long string. The minimizer scheme guarantees a well-spread k-mer sketch (high coverage) while seeking to minimize the sketch size (low density). The syncmer scheme yields sketches that are more robust to base substitutions (high conservation) on random sequences, but do not have the coverage guarantee of minimizers. These sketching metrics are generally adversarial to one another, especially in the context of sketch optimization for a specific sequence, and thus are difficult to achieve simultaneously. The parameterized syncmer scheme was recently introduced as a generalization of syncmers with more flexible sampling rules and empirically better coverage than the original syncmer variants. However, no approach exists to optimize parameterized syncmers. To address this shortcoming, we introduce a new scheme called masked minimizers that generalizes minimizers in a manner analogous to how parameterized syncmers generalize syncmers and allows us to extend existing optimization techniques developed for minimizers. This results in a practical algorithm to optimize the masked minimizer scheme with respect to both density and conservation. We evaluate the optimization algorithm on various benchmark genomes and show that our algorithm finds sketches that are overall more compact, well-spread, and robust to substitutions than those found by previous methods. Our implementation is released at https://github.com/Kingsford-Group/maskedminimizer. This new technique will enable more efficient and robust genomic analyses in the many settings where minimizers and syncmers are used.
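For readers unfamiliar with the baseline that masked minimizers generalize, the sketch below implements the classic (w, k)-minimizer scheme and its density. It is not the masked-minimizer algorithm from the paper or its repository: the function names are hypothetical, and the plain lexicographic k-mer ordering with leftmost tie-breaking is an assumption made for readability (practical tools typically order k-mers by a hash).

```python
def minimizers(seq, k, w):
    """Return sorted start positions of the (w, k)-minimizers of seq.

    Every window of w consecutive k-mers contributes its smallest k-mer.
    Ordering is lexicographic with ties broken by leftmost position; both
    choices are illustrative assumptions.
    """
    n_kmers = len(seq) - k + 1
    if k <= 0 or w <= 0 or n_kmers < w:
        raise ValueError("sequence too short for the given k and w")
    selected = set()
    for start in range(n_kmers - w + 1):
        # Pick the position of the smallest k-mer in this window.
        best = min(range(start, start + w), key=lambda i: (seq[i:i + k], i))
        selected.add(best)
    return sorted(selected)


def density(seq, k, w):
    """Fraction of k-mer positions that are selected (lower is more compact)."""
    return len(minimizers(seq, k, w)) / (len(seq) - k + 1)
```

Because every window contributes one selected position, consecutive selected positions are never more than w apart, which is the coverage guarantee the abstract refers to. Density measures how compact the sketch is.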
Affiliation(s)
- Minh Hoang
  - Department of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Guillaume Marçais
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Carl Kingsford
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
23
|
LoTempio J, Delot E, Vilain E. Benchmarking long-read genome sequence alignment tools for human genomics applications. PeerJ 2023; 11:e16515. [PMID: 38130927 PMCID: PMC10734412 DOI: 10.7717/peerj.16515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/02/2023] [Indexed: 12/23/2023] Open
Abstract
Background: The utility of long-read genome sequencing platforms has been shown in many fields including whole genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation. Results: For this study, we leveraged publicly available data from sample NA12878 generated on Oxford Nanopore and sample NA24385 on Pacific Biosciences platforms. We employed state-of-the-art sequence alignment tools including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR took a long time and required many resources, but produced alignments each time. LRA was fast, but only worked on Pacific Biosciences data. The tools disagreed widely on which reads to leave unaligned, affecting the end genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001-100,000 base pairs) present in the Database of Genome Variants (DGV) for sample NA12878 or the truthset for NA24385. Conclusions: These results suggest a combined approach is needed for long-read sequencing (LRS) alignments for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. It should be best practice to use an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2 as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not a factor for a given case or experiment, NGMLR will provide another view, and another chance to resolve a case. LRA, while fast, did not work on the nanopore data for our cluster, but PacBio results were promising in that those computations completed faster than Minimap2. Due to its significant burden on computational resources and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
Affiliation(s)
- Jonathan LoTempio
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
- Emmanuele Delot
  - Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States of America
  - Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States of America
- Eric Vilain
  - Institute for Clinical and Translational Science, University of California, Irvine, CA, United States of America
  - International Research Laboratory (IRL2006) “Epigenetics, Data, Politics (EpiDaPo)”, Centre National de la Recherche Scientifique, Washington, DC, United States of America
|
24
|
Abdelwahab O, Belzile F, Torkamaneh D. Performance analysis of conventional and AI-based variant callers using short and long reads. BMC Bioinformatics 2023; 24:472. [PMID: 38097928 PMCID: PMC10720095 DOI: 10.1186/s12859-023-05596-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND: The accurate detection of variants is essential for genomics-based studies. Currently, there are various tools designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to use different tools. Thus far, most of the existing tools were mainly developed to work on short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently shown that they can also be used for variant calling. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. RESULTS: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones for calling SNVs and INDELs using both long and short reads in most aspects. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. CONCLUSION: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
Affiliation(s)
- Omar Abdelwahab
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada
- François Belzile
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada
|
25
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
- Brandon D. Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bomberg
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G. Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y. Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health & Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H. Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A. Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H. Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J. Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alice C. Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V. Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M. Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
|
26
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023] Open
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27 to 95%) and 20% (4 to 61%). Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
|
27
|
Jevit MJ, Castaneda C, Paria N, Das PJ, Miller D, Antczak DF, Kalbfleisch TS, Davis BW, Raudsepp T. Trio-binning of a hinny refines the comparative organization of the horse and donkey X chromosomes and reveals novel species-specific features. Sci Rep 2023; 13:20180. [PMID: 37978222 PMCID: PMC10656420 DOI: 10.1038/s41598-023-47583-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 11/14/2023] [Indexed: 11/19/2023] Open
Abstract
We generated single haplotype assemblies from a hinny hybrid which significantly improved the gapless contiguity for horse and donkey autosomal genomes and the X chromosomes. We added over 15 Mb of missing sequence to both X chromosomes, 60 Mb to donkey autosomes and corrected numerous errors in donkey and some in horse reference genomes. We resolved functionally important X-linked repeats: the DXZ4 macrosatellite and ampliconic Equine Testis Specific Transcript Y7 (ETSTY7). We pinpointed the location of the pseudoautosomal boundaries (PAB) and determined the size of the horse (1.8 Mb) and donkey (1.88 Mb) pseudoautosomal regions (PARs). We discovered distinct differences in horse and donkey PABs: a testis-expressed gene, XKR3Y, spans the horse PAB with exons 1-2 located in the Y and exon 3 in the X-Y PAR, whereas the donkey XKR3Y is Y-specific. DXZ4 had a similar ~8 kb monomer in both species with 10 copies in horse and 20 in donkey. We assigned hundreds of copies of ETSTY7, a sequence horizontally transferred from Parascaris and massively amplified in equids, to horse and donkey X chromosomes and three autosomes. The findings and products contribute to molecular studies of equid biology and advance research on X-linked conditions, sex chromosome regulation and evolution in equids.
Affiliation(s)
- Matthew J Jevit
  - School of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
- Caitlin Castaneda
  - School of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
- Nandina Paria
  - Texas Scottish Rite Hospital for Children, Dallas, TX, 75219, USA
- Pranab J Das
  - ICAR-National Research Centre on Pig, Rani, Guwahati, Assam, 781131, India
- Donald Miller
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, 14853, USA
- Douglas F Antczak
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, 14853, USA
- Theodore S Kalbfleisch
  - Maxwell H. Gluck Equine Research Center, University of Kentucky, Lexington, KY, 40546, USA
- Brian W Davis
  - School of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
- Terje Raudsepp
  - School of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
|
28
|
Gambogi CW, Pandey N, Dawicki-McKenna JM, Arora UP, Liskovykh MA, Ma J, Lamelza P, Larionov V, Lampson MA, Logsdon GA, Dumont BL, Black BE. Centromere innovations within a mouse species. Sci Adv 2023; 9:eadi5764. [PMID: 37967185 PMCID: PMC10651114 DOI: 10.1126/sciadv.adi5764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 10/13/2023] [Indexed: 11/17/2023]
Abstract
Mammalian centromeres direct faithful genetic inheritance and are typically characterized by regions of highly repetitive and rapidly evolving DNA. We focused on a mouse species, Mus pahari, that we found has evolved to house centromere-specifying centromere protein-A (CENP-A) nucleosomes at the nexus of a satellite repeat that we identified and termed π-satellite (π-sat), a small number of recruitment sites for CENP-B, and short stretches of perfect telomere repeats. One M. pahari chromosome, however, houses a radically divergent centromere harboring ~6 mega-base pairs of a homogenized π-sat-related repeat, π-satB, that contains >20,000 functional CENP-B boxes. There, CENP-B abundance promotes accumulation of microtubule-binding components of the kinetochore and a microtubule-destabilizing kinesin of the inner centromere. We propose that the balance of pro- and anti-microtubule binding by the new centromere is what permits it to segregate during cell division with high fidelity alongside the older ones whose sequence creates a markedly different molecular composition.
Affiliation(s)
- Craig W. Gambogi
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Biochemistry and Molecular Biophysics Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nootan Pandey
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jennine M. Dawicki-McKenna
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Uma P. Arora
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
- Mikhail A. Liskovykh
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Jun Ma
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Piero Lamelza
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Vladimir Larionov
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Michael A. Lampson
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Beth L. Dumont
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
  - Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
- Ben E. Black
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Biochemistry and Molecular Biophysics Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
|
29
|
Miga KH, Eichler EE. Envisioning a new era: Complete genetic information from routine, telomere-to-telomere genomes. Am J Hum Genet 2023; 110:1832-1840. [PMID: 37922882 PMCID: PMC10645551 DOI: 10.1016/j.ajhg.2023.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 11/07/2023] Open
Abstract
Advances in long-read sequencing and assembly now mean that individual labs can generate phased genomes that are more accurate and more contiguous than the original human reference genome. With declining costs and increasing democratization of technology, we suggest that complete genome assemblies, where both parental haplotypes are phased telomere to telomere, will become standard in human genetics. Soon, even in clinical settings where rigorous sample-handling standards must be met, affected individuals could have reference-grade genomes fully sequenced and assembled in just a few hours given advances in technology, computational processing, and annotation. Complete genetic variant discovery will transform how we map, catalog, and associate variation with human disease and fundamentally change our understanding of the genetic diversity of all humans.
Affiliation(s)
- Karen H Miga
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
30
|
Cotter DJ, Webster TH, Wilson MA. Genomic and demographic processes differentially influence genetic variation across the human X chromosome. PLoS One 2023; 18:e0287609. [PMID: 37910456 PMCID: PMC10619814 DOI: 10.1371/journal.pone.0287609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 06/08/2023] [Indexed: 11/03/2023] Open
Abstract
Many forces influence genetic variation across the genome including mutation, recombination, selection, and demography. Increased mutation and recombination both lead to increases in genetic diversity in a region-specific manner, while complex demographic patterns shape patterns of diversity on a more global scale. While these processes act across the entire genome, the X chromosome is particularly interesting because it contains several distinct regions that are subject to different combinations and strengths of these forces: the pseudoautosomal regions (PARs) and the X-transposed region (XTR). The X chromosome thus can serve as a unique model for studying how genetic and demographic forces act in different contexts to shape patterns of observed variation. We therefore sought to explore diversity, divergence, and linkage disequilibrium in each region of the X chromosome using genomic data from 26 human populations. Across populations, we find that both diversity and substitution rate are consistently elevated in PAR1 and the XTR compared to the rest of the X chromosome. In contrast, linkage disequilibrium is lowest in PAR1, consistent with the high recombination rate in this region, and highest in the region of the X chromosome that does not recombine in males. However, linkage disequilibrium in the XTR is intermediate between PAR1 and the autosomes, and much lower than the non-recombining X. Finally, in addition to these global patterns, we observed variation in ratios of X versus autosomal diversity consistent with population-specific evolutionary history. While our results were generally consistent with previous work, two unexpected observations emerged. First, our results suggest that the XTR does not behave like the rest of the recombining X and may need to be evaluated separately in future studies. Second, the different regions of the X chromosome appear to exhibit unique patterns of linked selection across different human populations.
Together, our results highlight profound regional differences across the X chromosome, simultaneously making it an ideal system for exploring the action of evolutionary forces as well as necessitating its careful consideration and treatment in genomic analyses.
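For readers unfamiliar with the two summary statistics named in this abstract, the following Python sketch shows one common way to compute nucleotide diversity (mean pairwise differences per site) and the r² measure of linkage disequilibrium between two biallelic sites. It is an illustrative sketch only, not the analysis pipeline used in the study; the function names and the toy inputs are assumptions, and real analyses would also handle missing data and callable-site masks.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, given 0/1 alleles per haplotype.

    Returns None when either site is monomorphic (r^2 is undefined there).
    """
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


if __name__ == "__main__":
    print(nucleotide_diversity(["ACGT", "ACGA", "TCGA"]))
    print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
    print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

In this toy input, the two perfectly correlated sites give the maximum r² and the two independent sites give zero, mirroring the contrast the abstract draws between low-recombination and high-recombination regions.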
Affiliation(s)
- Daniel J. Cotter
  - Department of Genetics, Stanford University, Stanford, CA, United States of America
- Timothy H. Webster
  - Department of Anthropology, University of Utah, Salt Lake City, UT, United States of America
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
- Melissa A. Wilson
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States of America
  - Center for Evolution and Medicine, Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
|
31
|
Guo M, Songyang Z, Xiong Y. ChArmTelo Enables Large-Scale Chromosome Arm-Level Telomere Analysis across Human Populations and in Cancer Patients. Small Methods 2023; 7:e2300385. [PMID: 37526331 DOI: 10.1002/smtd.202300385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 06/29/2023] [Indexed: 08/02/2023]
Abstract
Telomeres are structures protecting chromosome ends. However, a scalable and cost-effective method to investigate chromosome arm-level (ChArm) telomeres (Telos) in large-scale projects is still lacking, hindering intensive investigation of high-resolution telomeres across cancers and other diseases. Here, ChArmTelo, the first computational toolbox to analyze telomeres at chromosome arm level in human and other animal species, using 10X linked-read and similar technologies, is presented. ChArmTelo currently consists of two algorithms, TeloEM and TeloKnow, for arm-level telomere length (TL) analysis. The algorithms are demonstrated by comprehensive analysis of chromosome arm-level telomere lengths (chArmTLs) in nearly 400 whole-genome sequencing (WGS) samples from human populations and animals, including healthy and cancer samples. Notably, using the latest complete telomere-to-telomere reference genome (CHM13v2) instead of hg38 is shown to improve performance considerably. ChArmTelo reveals population-specific chArmTL differences and liver cancer signatures of chArmTLs and that DNA replication origin disruption may contribute to cancer by affecting TLs. Importantly, ChArmTelo can be readily applied to tens of thousands of cancer and healthy samples with published WGS data.
Affiliation(s)
- Mengbiao Guo
  - Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- Zhou Songyang
  - Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
- Yuanyan Xiong
  - Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510006, China
|
32
|
Bredemeyer KR, Hillier L, Harris AJ, Hughes GM, Foley NM, Lawless C, Carroll RA, Storer JM, Batzer MA, Rice ES, Davis BW, Raudsepp T, O'Brien SJ, Lyons LA, Warren WC, Murphy WJ. Single-haplotype comparative genomics provides insights into lineage-specific structural variation during cat evolution. Nat Genet 2023; 55:1953-1963. [PMID: 37919451 PMCID: PMC10845050 DOI: 10.1038/s41588-023-01548-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/23/2023] [Accepted: 09/20/2023] [Indexed: 11/04/2023]
Abstract
The role of structurally dynamic genomic regions in speciation is poorly understood due to challenges inherent in diploid genome assembly. Here we reconstructed the evolutionary dynamics of structural variation in five cat species by phasing the genomes of three interspecies F1 hybrids to generate near-gapless single-haplotype assemblies. We discerned that cat genomes have a paucity of segmental duplications relative to great apes, explaining their remarkable karyotypic stability. X chromosomes were hotspots of structural variation, including enrichment with inversions in a large recombination desert with characteristics of a supergene. The X-linked macrosatellite DXZ4 evolves more rapidly than 99.5% of the genome clarifying its role in felid hybrid incompatibility. Resolved sensory gene repertoires revealed functional copy number changes associated with ecomorphological adaptations, sociality and domestication. This study highlights the value of gapless genomes to reveal structural mechanisms underpinning karyotypic evolution, reproductive isolation and ecological niche adaptation.
Affiliation(s)
- Kevin R Bredemeyer
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
- LaDeana Hillier
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Andrew J Harris
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
- Graham M Hughes
  - School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
- Nicole M Foley
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Colleen Lawless
  - School of Biology & Environmental Sciences, University College Dublin, Dublin, Ireland
- Rachel A Carroll
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA
- Mark A Batzer
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Edward S Rice
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA
- Brian W Davis
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
- Terje Raudsepp
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
- Stephen J O'Brien
  - Guy Harvey Oceanographic Center, Nova Southeastern University, Fort Lauderdale, FL, USA
- Leslie A Lyons
  - Department of Veterinary Medicine & Surgery, University of Missouri, Columbia, MO, USA
- Wesley C Warren
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA
- William J Murphy
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics & Genomics, Texas A&M University, College Station, TX, USA
33
Zhang K, Du M, Zhang H, Zhang X, Cao S, Wang X, Wang W, Guan X, Zhou P, Li J, Jiang W, Tang M, Zheng Q, Cao M, Zhou Y, Chen K, Liu Z, Fang Y. The haplotype-resolved T2T genome of teinturier cultivar Yan73 reveals the genetic basis of anthocyanin biosynthesis in grapes. Hortic Res 2023; 10:uhad205. [PMID: 38046853 PMCID: PMC10689054 DOI: 10.1093/hr/uhad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 10/01/2023] [Indexed: 12/05/2023]
Abstract
Teinturier grapes are characterized by the typical accumulation of anthocyanins in grape skin, flesh, and vegetative tissues, endowing them with high utility value in red wine blending and the development of nutrient-enriched foods. However, due to the lack of genome information, the mechanism involved in regulating teinturier grape coloring has not yet been elucidated and research on their genetic utilization is still insufficient. Here, the cultivar 'Yan73' was used for assembling the telomere-to-telomere (T2T) genome of teinturier grapes by combining High Fidelity (HiFi), Hi-C and ultralong Oxford Nanopore Technologies (ONT) reads. Two haplotype genomes were assembled, with sizes of 501.68 Mb and 493.38 Mb, respectively. In the haplotype 1 genome, the transposable elements (TEs) contained 32.77% of long terminal repeats (LTRs), while in the haplotype 2 genome, 31.53% of LTRs were detected in TEs. Furthermore, obvious inversions were identified in chromosome 18 between the two haplotypes. Transcriptome profiling suggested that the gene expression patterns in 'Cabernet Sauvignon' and 'Yan73' were diverse depending on tissues, developmental stages, and varieties. The transcription program of genes in the anthocyanin biosynthesis pathway between the two cultivars exhibited high similarity in different tissues and developmental stages, whereas the expression levels of numerous genes showed significant differences. Compared with other genes, the expression levels of VvMYBA1 and VvUFGT4 in all samples, VvCHS2 except in young shoots and VvPAL9 except in the E-L23 stage of 'Yan73' were higher than those of 'Cabernet Sauvignon'. Further sequence alignments revealed potential variant gene loci and structural variations of anthocyanin biosynthesis-related genes, and an 816 bp sequence insertion was found in the VvMYBA1 promoter of the 'Yan73' haplotype 2 genome. The 'Yan73' T2T genome assembly and comparative analysis provide a valuable foundation for further revealing the coloring mechanism of teinturier grapes and for the genetic improvement of grape coloring traits.
Affiliation(s)
- Kekun Zhang
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Mengrui Du
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - College of Agriculture, Shanxi Agricultural University, Taigu 030801, China
- Hongyan Zhang
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
- Xiaoqian Zhang
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
- Shuo Cao
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xu Wang
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Wenrui Wang
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
- Xueqiang Guan
  - Shandong Grape Research Institute, Shanda South Road, Jinan 250199, China
- Penghui Zhou
  - Shandong Technology Innovation Center of Wine Grape and Wine, COFCO Great Wall Wine (Penglai) Co., Ltd., Yantai 265600, China
- Jin Li
  - Shandong Technology Innovation Center of Wine Grape and Wine, COFCO Great Wall Wine (Penglai) Co., Ltd., Yantai 265600, China
- Meiling Tang
  - Yantai Academy of Agricultural Sciences, Gangcheng West Street, Yantai 264000, China
- Qiuling Zheng
  - Yantai Academy of Agricultural Sciences, Gangcheng West Street, Yantai 264000, China
- Muming Cao
  - Viticulture and Wine Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
- Yongfeng Zhou
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
  - National Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 570100, China
- Keqin Chen
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
- Zhongjie Liu
  - National Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yulin Fang
  - College of Enology, Heyang Viti-Viniculture Station, Ningxia Helan Mountain's East Foothill Wine Experiment and Demonstration Station, Northwest A&F University, Yangling 712100, China
34
Chen J, Xu F. Application of Nanopore Sequencing in the Diagnosis and Treatment of Pulmonary Infections. Mol Diagn Ther 2023; 27:685-701. [PMID: 37563539 PMCID: PMC10590290 DOI: 10.1007/s40291-023-00669-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/18/2023] [Indexed: 08/12/2023]
Abstract
This review provides an in-depth discussion of the development, principles and utility of nanopore sequencing technology and its diverse applications in the identification of various pulmonary pathogens. We examined the emergence and advancements of nanopore sequencing as a significant player in this field. We illustrate the challenges faced in diagnosing mixed infections and further scrutinize the use of nanopore sequencing in the identification of single pathogens, including viruses (with a focus on its use in epidemiology, outbreak investigation, and viral resistance), bacteria (emphasizing 16S targeted sequencing, rare bacterial lung infections, and antimicrobial resistance studies), fungi (employing internal transcribed spacer sequencing), tuberculosis, and atypical pathogens. Furthermore, we discuss the role of nanopore sequencing in metagenomics and its potential for unbiased detection of all pathogens in a clinical setting, emphasizing its advantages in sequencing genome repeat areas and structural variant regions. We discuss the limitations in dealing with host DNA removal, the inherent high error rate of nanopore sequencing technology, along with the complexity of operation and processing, while acknowledging the possibilities provided by recent technological improvements. We compared nanopore sequencing with the BioFire system, a rapid molecular diagnostic system based on polymerase chain reaction. Although the BioFire system serves well for the rapid screening of known and common pathogens, it falls short in the identification of unknown or rare pathogens and in providing comprehensive genome analysis. As technological advancements continue, it is anticipated that the role of nanopore sequencing technology in diagnosing and treating lung infections will become increasingly significant.
Affiliation(s)
- Jie Chen
  - Department of Infectious Diseases, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310009, Zhejiang, China
- Feng Xu
  - Department of Infectious Diseases, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310009, Zhejiang, China
35
Li TT, Xia T, Wu JQ, Hong H, Sun ZL, Wang M, Ding FR, Wang J, Jiang S, Li J, Pan J, Yang G, Feng JN, Dai YP, Zhang XM, Zhou T, Li T. De novo genome assembly depicts the immune genomic characteristics of cattle. Nat Commun 2023; 14:6601. [PMID: 37857610 PMCID: PMC10587341 DOI: 10.1038/s41467-023-42161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/30/2023] [Indexed: 10/21/2023] Open
Abstract
Immunogenomic loci remain poorly understood because of their genetic complexity and size. Here, we report the de novo assembly of a cattle genome and provide a detailed annotation of the immunogenomic loci. The assembled genome contains 143 contigs (N50 ~ 74.0 Mb). In contrast to the current reference genome (ARS-UCD1.2), 156 gaps are closed and 467 scaffolds are located in our assembly. Importantly, the immunogenomic regions, including three immunoglobulin (IG) loci, four T-cell receptor (TR) loci, and the major histocompatibility complex (MHC) locus, are seamlessly assembled and precisely annotated. With the characterization of 258 IG genes and 657 TR genes distributed across seven genomic loci, we present a detailed depiction of immune gene diversity in cattle. Moreover, the MHC gene structures are integrally revealed with properly phased haplotypes. Together, our work describes a more complete cattle genome, and provides a comprehensive view of its complex immune-genome.
Affiliation(s)
- Ting-Ting Li
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Tian Xia
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Jia-Qi Wu
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Hao Hong
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Zhao-Lin Sun
  - State Key Laboratory of Toxicology and Medical Countermeasures, Beijing Institute of Pharmacology and Toxicology, Beijing, 100850, China
- Ming Wang
  - State Key Laboratories for Agrobiotechnology, College of Biological Sciences, China Agricultural University, No.2 Yuanmingyuan Xilu, Beijing, 100193, China
  - College of Animal Science and Technology, China Agricultural University, No.2 Yuanmingyuan Xilu, Beijing, 100193, China
- Fang-Rong Ding
  - State Key Laboratories for Agrobiotechnology, College of Biological Sciences, China Agricultural University, No.2 Yuanmingyuan Xilu, Beijing, 100193, China
- Jing Wang
  - State Key Laboratory of Toxicology and Medical Countermeasures, Beijing Institute of Pharmacology and Toxicology, Beijing, 100850, China
- Shuai Jiang
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Jin Li
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Jie Pan
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Guang Yang
  - State Key Laboratory of Toxicology and Medical Countermeasures, Beijing Institute of Pharmacology and Toxicology, Beijing, 100850, China
- Jian-Nan Feng
  - State Key Laboratory of Toxicology and Medical Countermeasures, Beijing Institute of Pharmacology and Toxicology, Beijing, 100850, China
- Yun-Ping Dai
  - State Key Laboratories for Agrobiotechnology, College of Biological Sciences, China Agricultural University, No.2 Yuanmingyuan Xilu, Beijing, 100193, China
- Xue-Min Zhang
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
  - School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Tao Zhou
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
- Tao Li
  - Nanhu Laboratory, National Center of Biomedical Analysis, Beijing, 100850, China
  - School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
36
Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol 2023; 41:1474-1482. [PMID: 36797493 PMCID: PMC10427740 DOI: 10.1038/s41587-023-01662-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 41.0] [Received: 06/24/2022] [Accepted: 01/03/2023] [Indexed: 02/18/2023]
Abstract
The Telomere-to-Telomere consortium recently assembled the first truly complete sequence of a human genome. To resolve the most complex repeats, this project relied on manual integration of ultra-long Oxford Nanopore sequencing reads with a high-resolution assembly graph built from long, accurate PacBio high-fidelity reads. We have improved and automated this strategy in Verkko, an iterative, graph-based pipeline for assembling complete, diploid genomes. Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers. The result is a phased, diploid assembly of both haplotypes, with many chromosomes automatically assembled from telomere to telomere. Running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes assembled without gaps at 99.9997% accuracy. The complete assembly of diploid genomes is a critical step towards the construction of comprehensive pangenome databases and chromosome-scale comparative genomics.
Affiliation(s)
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Oxford Nanopore Technologies, Oxford, UK
- Brian P Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
37
Chrisman B, He C, Jung JY, Stockham N, Paskov K, Washington P, Petereit J, Wall DP. Localizing unmapped sequences with families to validate the Telomere-to-Telomere assembly and identify new hotspots for genetic diversity. Genome Res 2023; 33:1734-1746. [PMID: 37879860 PMCID: PMC10691534 DOI: 10.1101/gr.277175.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 05/25/2023] [Indexed: 10/27/2023]
Abstract
Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.
Affiliation(s)
- Brianna Chrisman
  - Department of Bioengineering, Stanford University, Stanford, California 94305, USA
  - Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
- Chloe He
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Jae-Yoon Jung
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
- Nate Stockham
  - Department of Neuroscience, Stanford University, Stanford, California 94305, USA
- Kelley Paskov
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Peter Washington
  - Department of Bioengineering, Stanford University, Stanford, California 94305, USA
- Juli Petereit
  - Nevada Bioinformatics Center, University of Nevada, Reno, Nevada 89557, USA
- Dennis P Wall
  - Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
  - Department of Pediatrics (Systems Medicine), Stanford University, Stanford, California 94305, USA
38
Sproul JS, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley JL, Pauls SU, Frandsen PB. Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res 2023; 33:1708-1717. [PMID: 37739812 PMCID: PMC10691545 DOI: 10.1101/gr.277387.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Accepted: 09/20/2023] [Indexed: 09/24/2023]
Abstract
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%-85% of repetitive sequences were "unclassified" following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Affiliation(s)
- John S Sproul
  - Department of Biology, Brigham Young University, Provo, Utah 84602, USA
  - Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
  - Department of Biology, University of Rochester, Rochester, New York 14627, USA
- Scott Hotaling
  - School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
  - Department of Watershed Sciences, Utah State University, Logan, Utah 84322, USA
- Jacqueline Heckenhauer
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Ashlyn Powell
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
- Dez Marshall
  - Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
- Joanna L Kelley
  - School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Steffen U Pauls
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
  - Department of Insect Biotechnology, Justus-Liebig-University Gießen, 35392 Gießen, Germany
- Paul B Frandsen
  - LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
  - Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
  - Data Science Lab, Smithsonian Institution, Washington, District of Columbia 20560, USA
39
Yang C, Zhou Y, Song Y, Wu D, Zeng Y, Nie L, Liu P, Zhang S, Chen G, Xu J, Zhou H, Zhou L, Qian X, Liu C, Tan S, Zhou C, Dai W, Xu M, Qi Y, Wang X, Guo L, Fan G, Wang A, Deng Y, Zhang Y, Jin J, He Y, Guo C, Guo G, Zhou Q, Xu X, Yang H, Wang J, Xu S, Mao Y, Jin X, Ruan J, Zhang G. The complete and fully-phased diploid genome of a male Han Chinese. Cell Res 2023; 33:745-761. [PMID: 37452091 PMCID: PMC10542383 DOI: 10.1038/s41422-023-00849-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/06/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Since the release of the complete human genome, the priority of human genomic study has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haplotypes achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13 as reference respectively showed that the reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled with different reference genomes. Finally, applying CN1 as a reference, we discovered 5.80 Mb and 4.21 Mb putative introgression sequences from Neanderthal and Denisovan, respectively, including many East Asian specific ones undetected using CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic studies and paleo-genomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.
Affiliation(s)
- Chentao Yang
  - Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
  - Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Yang Zhou
  - BGI-Shenzhen, Shenzhen, Guangdong, China
  - BGI Research-Wuhan, BGI, Wuhan, Hubei, China
- Yanni Song
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Dongya Wu
  - Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
  - Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  - Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
  - Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Yan Zeng
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Lei Nie
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Shilong Zhang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Guangji Chen
  - BGI-Shenzhen, Shenzhen, Guangdong, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jinjin Xu
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Hongling Zhou
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Long Zhou
  - Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
  - Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
  - Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China
- Xiaobo Qian
  - BGI-Shenzhen, Shenzhen, Guangdong, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Chenlu Liu
  - Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
- Wei Dai
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Mengyang Xu
  - BGI-Shenzhen, Shenzhen, Guangdong, China
  - BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yanwei Qi
  - BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Xiaobo Wang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China
- Lidong Guo
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
  - BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Guangyi Fan
  - BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Aijun Wang
  - BGI-Qingdao, BGI-Shenzhen, Qingdao, Shandong, China
- Yuan Deng
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Yong Zhang
  - BGI-Shenzhen, Shenzhen, Guangdong, China
- Yunqiu He
  - Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China
  - Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Chunxue Guo
  - BGI-Shenzhen, Shenzhen, Guangdong, China
  - BGI-Hangzhou, Hangzhou, Zhejiang, China
- Guoji Guo
  - School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
| | - Qing Zhou
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, China
| | - Xun Xu
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Zhangjiang Fudan International Innovation Center, and Ministry of Education Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
- Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, International Joint Center of Genomics of Jiangsu Province School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Department of Liver Surgery and Transplantation Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Jin
- BGI-Shenzhen, Shenzhen, Guangdong, China
| | - Jue Ruan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, China.
| | - Guojie Zhang
- Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Center for Evolutionary & Organismal Biology, & Women's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, Zhejiang, China.
- Innovation Center of Yangtze River Delta, Zhejiang University, Hangzhou, Zhejiang, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
|
40
|
Abstract
Our ancestors acquired morphological, cognitive and metabolic modifications that enabled humans to colonize diverse habitats, develop extraordinary technologies and reshape the biosphere. Understanding the genetic, developmental and molecular bases for these changes will provide insights into how we became human. Connecting human-specific genetic changes to species differences has been challenging owing to an abundance of low-effect size genetic changes, limited descriptions of phenotypic differences across development at the level of cell types and lack of experimental models. Emerging approaches for single-cell sequencing, genetic manipulation and stem cell culture now support descriptive and functional studies in defined cell types with a human or ape genetic background. In this Review, we describe how the sequencing of genomes from modern and archaic hominins, great apes and other primates is revealing human-specific genetic changes and how new molecular and cellular approaches - including cell atlases and organoids - are enabling exploration of the candidate causal factors that underlie human-specific traits.
Affiliation(s)
- Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
| | - Umut Kilik
- Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA.
| | - J Gray Camp
- Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland.
- University of Basel, Basel, Switzerland.
|
41
|
Schmid-Siegert E, Qin M, Tian H, Arpat B, Chen B, Xenarios I. Reference genomes for BALB/c Nude and NOD/SCID mouse models. G3 (Bethesda) 2023; 13:jkad188. [PMID: 37594081 PMCID: PMC10542179 DOI: 10.1093/g3journal/jkad188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 08/01/2023] [Indexed: 08/19/2023]
Abstract
Mouse xenograft models play a vital role in tumor research and in drug screening for the pharmaceutical industry. In particular, models with compromised immunity, such as the NOD/SCID and BALB/c Nude strains, are favored because they increase the success of transplantation. The genomic sequence and alterations of many of these models remain elusive, which might hamper a model's further optimization or properly adapted usage. This can concern treatments (e.g. NOD/SCID sensitivity to radiation), experiments or the analysis of sequencing data derived from such models. Here we present genome assemblies for the NOD/SCID and BALB/c Nude strains to overcome this shortcoming and to improve our understanding of these models in the process. We also highlight first insights into genomic differences observed in these models compared to the C57BL/6 reference genome. Genome assemblies for both strains are close to full-chromosome representations and are provided with liftover annotations from the GRCm39 reference genome.
Affiliation(s)
- Emanuel Schmid-Siegert
- JSR Life Sciences, NGS-AI CH Division, Route de la Corniche 3, 1066 Epalinges, Switzerland
| | - Mengting Qin
- JSR Life Sciences, NGS-AI CN Division, Industrial Park, Suzhou, Jiangsu 215000, P.R. China
| | - Huan Tian
- JSR Life Sciences, NGS-AI CN Division, Industrial Park, Suzhou, Jiangsu 215000, P.R. China
| | - Bulak Arpat
- JSR Life Sciences, NGS-AI CH Division, Route de la Corniche 3, 1066 Epalinges, Switzerland
| | - Bonnie Chen
- JSR Life Sciences, NGS-AI CN Division, Industrial Park, Suzhou, Jiangsu 215000, P.R. China
| | - Ioannis Xenarios
- JSR Life Sciences, NGS-AI CH Division, Route de la Corniche 3, 1066 Epalinges, Switzerland
|
42
|
Wang J, Veldsman WP, Fang X, Huang Y, Xie X, Lyu A, Zhang L. Benchmarking multi-platform sequencing technologies for human genome assembly. Brief Bioinform 2023; 24:bbad300. [PMID: 37594299 DOI: 10.1093/bib/bbad300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023] Open
Abstract
Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies, covering contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, even though several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performances in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long-reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long-reads, but they may require further polishing to improve quality. We recommend using short-reads rather than long-reads themselves to improve the base accuracy of contigs from Oxford Nanopore long-reads. Hi-C is the best choice for chromosome-level scaffolding because it can capture the longest-range DNA connectedness compared to 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly. For human diploid genome assembly, hifiasm is the best tool when using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
Affiliation(s)
- Jingjing Wang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | - Werner Pieter Veldsman
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | | | | | | | - Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | - Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
|
43
|
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarized the definition, arrangement, and structural characteristics of repeats. In addition, we introduced the diverse biological functions of repeats and reviewed existing methods for automatic repeat detection, classification, and masking. Finally, we analyzed the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
| | - Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
|
44
|
Li J, Cullis C. Comparative Analysis of Tylosema esculentum Mitochondrial DNA Revealed Two Distinct Genome Structures. Biology (Basel) 2023; 12:1244. [PMID: 37759643 PMCID: PMC10525999 DOI: 10.3390/biology12091244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/11/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Tylosema esculentum, commonly known as the marama bean, is an underutilized legume with nutritious seeds, holding potential to enhance food security in southern Africa due to its resilience to prolonged drought and heat. To promote the selection of this agronomically valuable germplasm, this study assembled and compared the mitogenomes of 84 marama individuals, identifying variations in genome structure, single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), heteroplasmy, and horizontal transfer. Two distinct germplasms were identified, and a novel mitogenome structure consisting of three circular molecules and one long linear chromosome was discovered. The structural variation led to an increased copy number of specific genes: nad5, nad9, rrnS, rrn5, trnC, and trnfM. The two mitogenomes also exhibited differences at 230 loci, with only one notable nonsynonymous substitution in the matR gene. Heteroplasmy was concentrated at certain loci on chromosome LS1 (OK638188). Moreover, the marama mitogenome contained an insertion of more than 9 kb of cpDNA that originated from chloroplast genomes but had accumulated mutations and lost gene functionality. The evolutionary and comparative genomics analysis indicated that mitogenome divergence in marama might not be solely constrained by geographical factors. Additionally, marama, as a member of the Cercidoideae subfamily, tends to possess a more complete set of mitochondrial genes than Faboideae legumes.
Affiliation(s)
| | - Christopher Cullis
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
|
45
|
Uno N, Satofuka H, Miyamoto H, Honma K, Suzuki T, Yamazaki K, Ito R, Moriwaki T, Hamamichi S, Tomizuka K, Oshimura M, Kazuki Y. Treatment of CHO cells with Taxol and reversine improves micronucleation and microcell-mediated chromosome transfer efficiency. Mol Ther Nucleic Acids 2023; 33:391-403. [PMID: 37547291 PMCID: PMC10403731 DOI: 10.1016/j.omtn.2023.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Accepted: 07/11/2023] [Indexed: 08/08/2023]
Abstract
Microcell-mediated chromosome transfer is an attractive technique for transferring chromosomes from donor cells to recipient cells and has enabled the generation of cell lines and humanized animal models that contain megabase-sized gene(s). However, improvements in chromosomal transfer efficiency are still needed to accelerate the production of these cells and animals. The chromosomal transfer protocol consists of micronucleation, microcell formation, and fusion of donor cells with recipient cells. We found that the combination of Taxol (paclitaxel) and reversine rather than the conventional reagent colcemid resulted in highly efficient micronucleation and substantially improved chromosomal transfer efficiency from Chinese hamster ovary donor cells to HT1080 and NIH3T3 recipient cells by up to 18.3- and 4.9-fold, respectively. Furthermore, chromosome transfer efficiency to human induced pluripotent stem cells, which rarely occurred with colcemid, was also clearly improved after Taxol and reversine treatment. These results might be related to Taxol increasing the number of spindle poles, leading to multinucleation and delaying mitosis, and reversine inducing mitotic slippage and decreasing the duration of mitosis. Here, we demonstrated that an alternative optimized protocol improved chromosome transfer efficiency into various cell lines. These data advance chromosomal engineering technology and the use of human artificial chromosomes in genetic and regenerative medical research.
Affiliation(s)
- Narumi Uno
- Laboratory of Bioengineering, Faculty of Life Sciences, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Hiroyuki Satofuka
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Hitomaru Miyamoto
- Department of Chromosome Biomedical Engineering, School of Life Science, Faculty of Medicine, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Kazuhisa Honma
- Department of Chromosome Biomedical Engineering, School of Life Science, Faculty of Medicine, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Teruhiko Suzuki
- Stem Cell Project, Tokyo Metropolitan Institute of Medical Science, Kamikitazawa, Setagaya-ku, Tokyo 156-8506, Japan
| | - Kyotaro Yamazaki
- Department of Chromosome Biomedical Engineering, School of Life Science, Faculty of Medicine, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
- Chromosome Engineering Research Group, The Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8787, Japan
| | - Ryota Ito
- Laboratory of Bioengineering, Faculty of Life Sciences, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan
| | - Takashi Moriwaki
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
- Department of Chromosome Biomedical Engineering, School of Life Science, Faculty of Medicine, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Shusei Hamamichi
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Kazuma Tomizuka
- Laboratory of Bioengineering, Faculty of Life Sciences, Tokyo University of Pharmacy and Life Sciences, 1432-1 Horinouchi, Hachioji, Tokyo 192-0392, Japan
| | - Mitsuo Oshimura
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
| | - Yasuhiro Kazuki
- Chromosome Engineering Research Center, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
- Department of Chromosome Biomedical Engineering, School of Life Science, Faculty of Medicine, Tottori University, 86 Nishi-cho, Yonago, Tottori 683-8503, Japan
- Chromosome Engineering Research Group, The Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences, 5-1 Higashiyama, Myodaiji, Okazaki, Aichi 444-8787, Japan
|
46
|
Ahmadi E, Sadeghi A, Chakraborty S. Slip-Coupled Electroosmosis and Electrophoresis Dictate DNA Translocation Speed in Solid-State Nanopores. Langmuir 2023; 39:12292-12301. [PMID: 37603825 DOI: 10.1021/acs.langmuir.3c01230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/23/2023]
Abstract
Controlling the DNA translocation speed is critical in nanopore sequencing but remains challenging in practice, owing to a complex, dynamically evolving coupling between nanoscale fluidics and the electrically mediated migration of DNA. One important factor influencing the translocation speed is the DNA-liquid slippage stemming from the hydrophobic nature of the oligonucleotide, an aspect that has been widely ignored in the reported literature. To address this conceptual deficit, we first develop an analytical model that brings out the slip-mediated coupling between electroosmosis and DNA electrophoresis in a solid-state nanopore at the low surface charge limit, ignoring end effects. Subsequently, we compare these results with numerical simulation data on electrokinetically modulated DNA translocation in such a nanopore, this time of finite length and connecting two end reservoirs so that end effects are accommodated, obtained with a fully coupled Poisson-Nernst-Planck-Stokes flow model. Both the numerical and analytical results indicate that the DNA translocation speed is a linearly increasing function of the slip length, with a more than four-fold increase observed for a slip length as small as 0.5 nm compared to the no-slip scenario. Given the demand for strategies that arrest high translocation speeds for accurate DNA sequencing, these results establish a theoretical basis for them, premised on an analytical expression for the DNA-hydrophobicity-modulated enhancement in translocation speed, for designing a nanopore-based sequencing platform, a paradigm that has remained underemphasized thus far.
Affiliation(s)
- Elham Ahmadi
- Department of Mechanical Engineering, University of Kurdistan, Sanandaj 66177-15175, Iran
| | - Arman Sadeghi
- Department of Mechanical Engineering, University of Kurdistan, Sanandaj 66177-15175, Iran
| | - Suman Chakraborty
- Department of Mechanical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
|
47
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered from technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy, and bioinformatics tools have improved substantially. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies, focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
| | - Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
| | - Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
| | - Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
48
|
Huang W, Qu S, Qin Q, Yang X, Han W, Lai Y, Chen J, Zhou S, Yang X, Zhou W. Nanopore Third-Generation Sequencing for Comprehensive Analysis of Hemoglobinopathy Variants. Clin Chem 2023; 69:1062-1071. [PMID: 37311260 DOI: 10.1093/clinchem/hvad073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 05/03/2023] [Indexed: 06/15/2023]
Abstract
BACKGROUND: Oxford Nanopore Technology (ONT) third-generation sequencing (TGS) is a versatile genetic diagnostic platform. However, it is challenging to prepare long-template libraries for long-read TGS, particularly for the ONT method when analyzing hemoglobinopathy variants that involve complex structures and occur in GC-rich and/or homologous regions. METHODS: A multiplex long PCR was designed to prepare library templates, including the whole-gene amplicons for HBA2/1, HBG2/1, HBD, and HBB, as well as the allelic amplicons for targeted deletions and special structural variations. Library construction was performed using long-PCR products, and sequencing was conducted on an Oxford Nanopore MinION instrument. Genotypes were identified based on Integrative Genomics Viewer (IGV) plots. RESULTS: This novel long-read TGS method distinguished all single nucleotide variants and structural variants within HBA2/1, HBG2/1, HBD, and HBB based on the whole-gene sequence reads. Targeted deletions and special structural variations were also identified according to the specific allelic reads. The results for 158 α-/β-thalassemia samples showed 100% concordance with previously known genotypes. CONCLUSIONS: This ONT TGS method is high-throughput and can be used for molecular screening and genetic diagnosis of hemoglobinopathies. Multiplex long PCR is an efficient strategy for library preparation, providing a practical reference for TGS assay development.
Affiliation(s)
- Weilun Huang
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Shoufang Qu
- Division of In Vitro Diagnostics for Non-infectious diseases, National Institutes for Food and Drug Control, Beijing, China
| | - Qiongzhen Qin
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Xu Yang
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
| | - Wanqing Han
- Guangzhou Darui Biotechnology Co., Ltd., Guangzhou, China
| | - Yongli Lai
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Jiaqi Chen
- Department of Pediatrics, Southern Medical University Nanfang Hospital, Guangzhou, China
| | - Shihao Zhou
- Department of Genetics, Changsha Hospital for Maternal and Child Health Care, Changsha, China
| | - Xuexi Yang
- Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, Guangzhou, China
| | - Wanjun Zhou
- Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Department of Laboratory Medicine, Southern Medical University Nanfang Hospital, Guangzhou, China
|
49
|
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though recent advances in 'complete genomics' have revealed previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge, since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, a parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres and to quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
|
50
|
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
|