1
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Monfort Anez G, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Rocha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O'Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. Nature 2025; 641:401-418. [PMID: 40205052 PMCID: PMC12058530 DOI: 10.1038/s41586-025-08816-3] [Received: 07/31/2024] [Accepted: 02/19/2025] [Indexed: 04/11/2025]
Abstract
The most dynamic and repetitive regions of great ape genomes have traditionally been excluded from comparative studies [1-3]. Consequently, our understanding of the evolution of our species is incomplete. Here we present haplotype-resolved reference genomes and comparative analyses of six ape species: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan and siamang. We achieve chromosome-level contiguity with substantial sequence accuracy (<1 error in 2.7 megabases) and completely sequence 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, to provide in-depth evolutionary insights. Comparative analyses enabled investigations of the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference genome. Such regions include newly minted gene families in lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes and subterminal heterochromatin. This resource serves as a comprehensive baseline for future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Steven J Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Dmitry Antipov
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
- Francesco Montinaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Yanting Luo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
- Joanna Malukiewicz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- German Primate Center, Primate Genetics Laboratory, Goettingen, Germany
- Jessica M Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Riley J Mangan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genetics Training Program, Harvard Medical School, Boston, MA, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Anton Bankevich
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
- Christine R Beck
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Emry Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Shelise Y Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
- Department of Medicine, KCVI, Oregon Health Sciences University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
- Laura Carrel
- PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
- Agnes P Chan
- The Translational Genomics Research Institute, City of Hope National Medical Center, Phoenix, AZ, USA
- Juyun Crawford
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Cedric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Luciana de Gennaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
- David Gilbert
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ishaan Gupta
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
- Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Junmin Han
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Robert S Harris
- Department of Biology, Penn State University, University Park, PA, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute, Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Frankfurt, Germany
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Chul Lee
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Youngho Lee
- Laboratory of Bioinformatics and Population Genetics, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- William Lees
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Mark Loftus
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Yong Hwee Eddie Loh
- Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
- Juan F I Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
- Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Barbara McGrath
- Department of Biology, Penn State University, University Park, PA, USA
- Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Britta S Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Saswat K Mohanty
- Department of Biology, Penn State University, University Park, PA, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Karol Pal
- Department of Biology, Penn State University, University Park, PA, USA
- Matt Pennell
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
- Francisca R Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Samuel Sacco
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Nicholas J Schork
- The Translational Genomics Research Institute, City of Hope National Medical Center, Phoenix, AZ, USA
- Cole Shanks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA, USA
- Dongmin R Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Alexander P Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Michael G Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
- Sarah A Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yixin Zhu
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Zachary A Szpiech
- Department of Biology, Penn State University, University Park, PA, USA
- Christian D Huber
- Department of Biology, Penn State University, University Park, PA, USA
- Tobias L Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Miriam K Konkel
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Soojin V Yi
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
- Corey T Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
- Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
- Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA, USA
- Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
2
Porubsky D, Dashnow H, Sasani TA, Logsdon GA, Hallast P, Noyes MD, Kronenberg ZN, Mokveld T, Koundinya N, Nolan C, Steely CJ, Guarracino A, Dolzhenko E, Harvey WT, Rowell WJ, Grigorev K, Nicholas TJ, Goldberg ME, Oshima KK, Lin J, Ebert P, Watkins WS, Leung TY, Hanlon VCT, McGee S, Pedersen BS, Happ HC, Jeong H, Munson KM, Hoekzema K, Chan DD, Wang Y, Knuth J, Garcia GH, Fanslow C, Lambert C, Lee C, Smith JD, Levy S, Mason CE, Garrison E, Lansdorp PM, Neklason DW, Jorde LB, Quinlan AR, Eberle MA, Eichler EE. Human de novo mutation rates from a four-generation pedigree reference. Nature 2025:10.1038/s41586-025-08922-2. [PMID: 40269156 DOI: 10.1038/s41586-025-08922-2] [Received: 08/05/2024] [Accepted: 03/20/2025] [Indexed: 04/25/2025]
Abstract
Understanding the human de novo mutation (DNM) rate requires complete sequence information [1]. Here using five complementary short-read and long-read sequencing technologies, we phased and assembled more than 95% of each diploid human genome in a four-generation, twenty-eight-member family (CEPH 1463). We estimate 98-206 DNMs per transmission, including 74.5 de novo single-nucleotide variants, 7.4 non-tandem repeat indels, 65.3 de novo indels or structural variants originating from tandem repeats, and 4.4 centromeric DNMs. Among male individuals, we find 12.4 de novo Y chromosome events per generation. Short tandem repeats and variable-number tandem repeats are the most mutable, with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 16% of de novo single-nucleotide variants are postzygotic in origin with no paternal bias, including early germline mosaic mutations. We place all this variation in the context of a high-resolution recombination map (~3.4 kb breakpoint resolution) and find no correlation between meiotic crossover and de novo structural variants. These near-telomere-to-telomere familial genomes provide a truth set to understand the most fundamental processes underlying human genetic variation.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Thomas A Sasani
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Michelle D Noyes
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Nidhi Koundinya
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Cody J Steely
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
- Andrea Guarracino
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kirill Grigorev
- Space Biosciences Research Branch, NASA Ames Research Center, Moffett Field, CA, USA
- Blue Marble Space Institute of Science, Seattle, WA, USA
- Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Michael E Goldberg
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Keisuke K Oshima
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Tiffany Y Leung
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Vincent C T Hanlon
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Sean McGee
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Hannah C Happ
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Altos Labs, San Diego, CA, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Daniel D Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joshua D Smith
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- Erik Garrison
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Deborah W Neklason
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
3
Li Q, Keskus AG, Wagner J, Izydorczyk MB, Timp W, Sedlazeck FJ, Klein AP, Zook JM, Kolmogorov M, Schatz MC. Unraveling the hidden complexity of cancer through long-read sequencing. Genome Res 2025; 35:599-620. [PMID: 40113261 PMCID: PMC12047254 DOI: 10.1101/gr.280041.124] [Indexed: 03/22/2025]
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies predominantly use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were previously inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
Affiliation(s)
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Ayse G Keskus
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Michal B Izydorczyk
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Texas 77030, USA
- Department of Computer Science, Rice University, Houston, Texas 77251, USA
- Alison P Klein
- Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Mikhail Kolmogorov
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
4
Groza C, Ge B, Cheung WA, Pastinen T, Bourque G. Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA. Genome Res 2025; 35:644-652. [PMID: 40113263 PMCID: PMC12047246 DOI: 10.1101/gr.279240.124] [Received: 03/17/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025]
Abstract
Structural variants (SVs) are omnipresent in human DNA, yet their genotype and methylation statuses are rarely characterized due to previous limitations in genome assembly and detection of modified nucleotides. Also, the extent to which SVs act as methylation quantitative trait loci (SV-mQTLs) is largely unknown. Here, we generated a pangenome graph summarizing SVs in 782 de novo assemblies obtained from Genomic Answers for Kids, capturing 14.6 million CpG dinucleotides that are absent from the CHM13v2 reference (SV-CpGs), thus expanding their number by 43.6%. Using 435 methylomes, we genotyped 4.06 million SV-CpGs, of which 3.93 million (96.8%) are methylated at least once. Nonrepeat sequences contribute 1.59 × 10⁶ novel SV-CpGs, followed by centromeric satellites (6.57 × 10⁵), simple repeats (5.40 × 10⁵), Alu elements (5.07 × 10⁵), satellites (2.17 × 10⁵), LINE-1s (1.83 × 10⁵), and SVA (SINE-VNTR-Alu) elements (1.50 × 10⁵). Centromeric satellites, simple repeats, and SVAs are overrepresented in SV-CpGs versus reference CpGs. Similarly, methylation levels in SV-CpGs are more variable than in reference CpGs. To explore if SVs are potentially causal for functional variation, we measured SV-mQTLs. This revealed over 230,464 methylation bins where the methylation is associated with common SVs within 100 kbp. Finally, we identified 65,659 methylation bins (28.5%) where the leading QTL variant is an SV. In conclusion, we demonstrate that graph pangenomes provide full SV structures, the associated methylation variation, and reveal tens of thousands of SV-mQTLs, underscoring the importance of assembly-based analyses of human traits.
Affiliation(s)
- Cristian Groza
- Université de Montréal, Montréal Heart Institute, Montréal, Québec H1T 1C8, Canada
| | - Bing Ge
- McGill University, McGill University and Genome Quebec Innovation Centre, Montréal, Québec H3A 2T8, Canada
| | - Warren A Cheung
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
| | - Tomi Pastinen
- Children's Mercy Hospital and Research Institute, Genomic Medicine Center, Kansas City, Missouri 64108, USA
| | - Guillaume Bourque
- McGill University, Human Genetics, Montréal, Québec H3A 0C7, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec H3A 2R7, Canada
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, Québec H3A 0G1, Canada
| |
|
5
|
Montano C, Timp W. Evolution of genome-wide methylation profiling technologies. Genome Res 2025; 35:572-582. [PMID: 40228903 PMCID: PMC12047278 DOI: 10.1101/gr.278407.123] [Indexed: 04/16/2025]
Abstract
In this mini-review, we explore the advancements in genome-wide DNA methylation profiling, tracing the evolution from traditional methods such as methylation arrays and whole-genome bisulfite sequencing to the cutting-edge single-molecule profiling enabled by long-read sequencing (LRS) technologies. We highlight how LRS is transforming clinical and translational research, particularly by its ability to simultaneously measure genetic and epigenetic information, providing a more comprehensive understanding of complex disease mechanisms. We discuss current challenges and future directions in the field, emphasizing the need for innovative computational tools and robust, reproducible approaches to fully harness the capabilities of LRS in molecular diagnostics.
Affiliation(s)
- Carolina Montano
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Division of Human Genetics, Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
| |
|
6
|
Mahmoud M, Agustinho DP, Sedlazeck FJ. A Hitchhiker's Guide to long-read genomic analysis. Genome Res 2025; 35:545-558. [PMID: 40228901 PMCID: PMC12047252 DOI: 10.1101/gr.279975.124] [Indexed: 04/16/2025]
Abstract
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Affiliation(s)
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Daniel P Agustinho
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
| |
|
7
|
Rausch T, Marschall T, Korbel JO. The impact of long-read sequencing on human population-scale genomics. Genome Res 2025; 35:593-598. [PMID: 40228902 PMCID: PMC12047236 DOI: 10.1101/gr.280120.124] [Indexed: 04/16/2025]
Abstract
Long-read sequencing technologies, particularly those from Pacific Biosciences and Oxford Nanopore Technologies, are revolutionizing genome research by providing high-resolution insights into complex and repetitive regions of the human genome that were previously inaccessible. These advances have been particularly enabling for the comprehensive detection of genomic structural variants (SVs), which is critical for linking genotype to phenotype in population-scale and rare disease studies, as well as in cancer. Recent developments in sequencing throughput and computational methods, such as pangenome graphs and haplotype-resolved assemblies, are paving the way for the future inclusion of long-read sequencing in clinical cohort studies and disease diagnostics. DNA methylation signals directly obtained from long reads enhance the utility of single-molecule long-read sequencing technologies by enabling molecular phenotypes to be interpreted, and by allowing the identification of the parent of origin of de novo mutations. Despite this recent progress, challenges remain in scaling long-read technologies to large populations due to cost, computational complexity, and the lack of tools to facilitate the efficient interpretation of SVs in graphs. This perspective provides a succinct review on the current state of long-read sequencing in genomics by highlighting its transformative potential and key hurdles, and emphasizing future opportunities for advancing the understanding of human genetic diversity and diseases through population-scale long-read analysis.
Affiliation(s)
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, 40225 Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, 40225 Düsseldorf, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
| |
|
8
|
Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. Nucleic Acids Res 2025; 53:gkaf298. [PMID: 40226919 PMCID: PMC11995269 DOI: 10.1093/nar/gkaf298] [Received: 12/13/2024] [Revised: 02/28/2025] [Accepted: 04/07/2025] [Indexed: 04/15/2025]
Abstract
Non-canonical (non-B) DNA structures, such as bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA, which form at certain sequence motifs (e.g. A-phased repeats and inverted repeats), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies and occupy 9%-15%, 9%-11%, and 12%-38% of autosomes and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, United States
| | - Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, United States
| | - Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
| | - Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
| | - Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, United States
- Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
- L’EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
| | - Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, United States
- Center for Medical Genomics, Penn State University, University Park, PA 16802, United States
| |
|
9
|
Dubocanin D, Hartley GA, Sedeño Cortés AE, Mao Y, Hedouin S, Ranchalis J, Agarwal A, Logsdon GA, Munson KM, Real T, Mallory BJ, Eichler EE, Biggins S, O'Neill RJ, Stergachis AB. Conservation of dichromatin organization along regional centromeres. Cell Genomics 2025; 5:100819. [PMID: 40147439 PMCID: PMC12008808 DOI: 10.1016/j.xgen.2025.100819] [Received: 09/21/2024] [Revised: 12/20/2024] [Accepted: 02/27/2025] [Indexed: 03/29/2025]
Abstract
The attachment of the kinetochore to the centromere is essential for genome maintenance, yet the highly repetitive nature of satellite regional centromeres limits our understanding of their chromatin organization. We demonstrate that single-molecule chromatin fiber sequencing (Fiber-seq) can uniquely co-resolve kinetochore and surrounding chromatin architectures along point centromeres, revealing largely homogeneous single-molecule kinetochore occupancy. In contrast, the application of Fiber-seq to regional centromeres exposed marked per-molecule heterogeneity in their chromatin organization. Regional centromere cores uniquely contain a dichotomous chromatin organization (dichromatin) composed of compacted nucleosome arrays punctuated with highly accessible chromatin patches. CENP-B occupancy phases dichromatin to the underlying alpha-satellite repeat within centromere cores but is not necessary for dichromatin formation. Centromere core dichromatin is conserved between humans and primates, including along regional centromeres lacking satellite repeats. Overall, the chromatin organization of regional centromeres is defined by marked per-molecule heterogeneity, buffering kinetochore attachment against sequence and structural variability within regional centromeres.
Affiliation(s)
- Danilo Dubocanin
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Gabrielle A Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - Adriana E Sedeño Cortés
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Yizi Mao
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Sabrine Hedouin
- Fred Hutchinson Cancer Center, Basic Sciences Division, Seattle, WA 98109, USA
| | - Jane Ranchalis
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Aman Agarwal
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Taylor Real
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Benjamin J Mallory
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Sue Biggins
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA; Department of Genetics and Genome Sciences, UConn Health, Farmington, CT 06269, USA
| | - Andrew B Stergachis
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
| |
|
10
|
Hartley GA, Okhovat M, Hoyt SJ, Fuller E, Pauloski N, Alexandre N, Alexandrov I, Drennan R, Dubocanin D, Gilbert DM, Mao Y, McCann C, Neph S, Ryabov F, Sasaki T, Storer JM, Svendsen D, Troy W, Wells J, Core L, Stergachis A, Carbone L, O'Neill RJ. Centromeric transposable elements and epigenetic status drive karyotypic variation in the eastern hoolock gibbon. Cell Genomics 2025; 5:100808. [PMID: 40088887 PMCID: PMC12008813 DOI: 10.1016/j.xgen.2025.100808] [Received: 09/16/2024] [Revised: 12/10/2024] [Accepted: 02/12/2025] [Indexed: 03/17/2025]
Abstract
Great apes have maintained a stable karyotype with few large-scale rearrangements; in contrast, gibbons have undergone a high rate of chromosomal rearrangements coincident with rapid centromere turnover. Here, we characterize fully assembled centromeres in the eastern hoolock gibbon, Hoolock leuconedys (HLE), finding a diverse group of transposable elements (TEs) that differ from the canonical alpha-satellites found across centromeres of other apes. We find that HLE centromeres contain a CpG methylation centromere dip region, providing evidence that this epigenetic feature is conserved in the absence of satellite arrays. We uncovered a variety of atypical centromeric features, including protein-coding genes and mismatched replication timing. Further, we identify duplications and deletions in HLE centromeres that distinguish them from other gibbons. Finally, we observed differentially methylated TEs, topologically associated domain boundaries, and segmental duplications at chromosomal breakpoints, and thus propose that a combination of multiple genomic attributes with propensities for chromosome instability shaped gibbon centromere evolution.
Affiliation(s)
- Gabrielle A Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Mariam Okhovat
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
| | - Savannah J Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Emily Fuller
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Nicole Pauloski
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Nicolas Alexandre
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ivan Alexandrov
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Ryan Drennan
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Danilo Dubocanin
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - David M Gilbert
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
| | - Yizi Mao
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Christine McCann
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Shane Neph
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Fedor Ryabov
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
| | - Jessica M Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Derek Svendsen
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | | | - Jackson Wells
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
| | - Leighton Core
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrew Stergachis
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA; Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA; Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA; Division of Genetics, Oregon National Primate Research Center, Portland, OR, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA; Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA; Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| |
|
11
|
Mahlke MA, Lumerman L, Nath P, Chittenden C, Hoyt S, Koeppel J, Xu Y, Raphael R, Zaffina K, Hook PW, Timp W, Miga KH, Campbell PJ, O'Neill RJ, Altemose N, Nechemia-Arbely Y. Evolution and instability of human centromeres are accelerated by heterochromatin boundary loss and CENP-A overexpression. bioRxiv 2025:2025.02.03.636285. [PMID: 39975122 PMCID: PMC11838504 DOI: 10.1101/2025.02.03.636285] [Indexed: 02/21/2025]
Abstract
Centromere location is specified by CENP-A, a centromere-specific histone that epigenetically defines centromere identity. How CENP-A is maintained at one location in rapidly evolving centromeric DNA is unknown. Using single-cell-derived clones of human cell lines, we demonstrate single-cell heterogeneity in CENP-A position within cell populations at neocentromeres and a native centromere. CENP-A heterogeneity is accompanied by unique DNA methylation and H3K9me3 patterns, with DNA methylation shifting according to CENP-A position. We further demonstrate centromere epigenetic evolution over prolonged proliferation, with native centromeres maintaining stable heterochromatin boundaries, but neocentromeres exhibiting DNA methylation instability, H3K9me3 gain, boundary loss and fragility. Lastly, prolonged CENP-A and HJURP overexpression leads to centromere and neocentromere expansion, gradual CENP-A depletion, neocentromere destabilization and CENP-A re-localization that is accompanied by local heterochromatin remodeling. This study reveals the naturally evolving epigenetic plasticity of human centromeres and neocentromeres and highlights the importance of repressive chromatin boundaries in maintaining centromere stability.
|
12
|
Hu G, Wang Z, Tian Z, Wang K, Ji G, Wang X, Zhang X, Yang Z, Liu X, Niu R, Zhu D, Zhang Y, Duan L, Ma X, Xiong X, Kong J, Zhao X, Zhang Y, Zhao J, He S, Grover CE, Su J, Feng K, Yu G, Han J, Zang X, Wu Z, Pan W, Wendel JF, Ma X. A telomere-to-telomere genome assembly of cotton provides insights into centromere evolution and short-season adaptation. Nat Genet 2025; 57:1031-1043. [PMID: 40097785 DOI: 10.1038/s41588-025-02130-4] [Received: 11/23/2023] [Accepted: 02/14/2025] [Indexed: 03/19/2025]
Abstract
Cotton (Gossypium hirsutum L.) is a key allopolyploid crop with global economic importance. Here we present a telomere-to-telomere assembly of the elite variety Zhongmian 113. Leveraging technologies including PacBio HiFi, Oxford Nanopore Technology (ONT) ultralong-read sequencing and Hi-C, our assembly surpasses previous genomes in contiguity and completeness, resolving 26 centromeric and 52 telomeric regions, 5S rDNA clusters and nucleolar organizer regions. A phylogenetically recent centromere repositioning on chromosome D08 was discovered specific to G. hirsutum, involving deactivation of an ancestral centromere and the formation of a unique, satellite repeat-based centromere. Genomic analyses evaluated favorable allele aggregation for key agronomic traits and uncovered an early-maturing haplotype derived from an 11 Mb pericentric inversion that evolved early during G. hirsutum domestication. Our study sheds light on the genomic origins of short-season adaptation, potentially involving introgression of an inversion from primitively domesticated forms, followed by subsequent haplotype differentiation in modern breeding programs.
Affiliation(s)
- Guanjing Hu
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Zhenyu Wang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Zunzhe Tian
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Kai Wang
- School of Life Sciences, Nantong University, Nantong, China
| | - Gaoxiang Ji
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Xingxing Wang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Xianliang Zhang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
- Western Research Institute, Chinese Academy of Agricultural Sciences, Changji, China
| | - Zhaoen Yang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Xuan Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Ruoyu Niu
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - De Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Yuzhi Zhang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
| | - Lian Duan
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xueyuan Ma
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xianpeng Xiong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Jiali Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Xianjia Zhao
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Ya Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Junjie Zhao
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Shoupu He
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Corrinne E Grover
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa, USA
| | - Junji Su
- State Key Laboratory of Aridland Crop Science, College of Life Science and Technology, Gansu Agricultural University, Lanzhou, China
| | - Keyun Feng
- Crop Research Institute, Gansu Academy of Agricultural Sciences, Lanzhou, China
| | - Guangrun Yu
- School of Life Sciences, Nantong University, Nantong, China
| | - Jinlei Han
- School of Life Sciences, Nantong University, Nantong, China
| | - Xinshan Zang
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China
| | - Zhiqiang Wu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Weihua Pan
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Jonathan F Wendel
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, Iowa, USA
| | - Xiongfeng Ma
- National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, China.
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, China.
| |
|
13
|
Kovaka S, Hook PW, Jenike KM, Shivakumar V, Morina LB, Razaghi R, Timp W, Schatz MC. Uncalled4 improves nanopore DNA and RNA modification detection via fast and accurate signal alignment. Nat Methods 2025; 22:681-691. [PMID: 40155722 PMCID: PMC11978507 DOI: 10.1038/s41592-025-02631-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 02/16/2025] [Indexed: 04/01/2025]
Abstract
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies' state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4 .
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Vikram Shivakumar
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Luke B Morina
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Roham Razaghi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| |
|
14
|
Golub Y, Wulff A, Plösch T. From haze to horizon: epigenetic research and artificial intelligence in child and adolescent psychiatry. Eur Child Adolesc Psychiatry 2025; 34:1245-1248. [PMID: 40111558 PMCID: PMC12000212 DOI: 10.1007/s00787-025-02686-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2025] [Accepted: 02/18/2025] [Indexed: 03/22/2025]
Affiliation(s)
- Yulia Golub
- Department of Child and Adolescent Psychiatry, Psychosomatic and Psychotherapy, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany.
| | - Antje Wulff
- Big Data in Medicine, Department of Health Services Research, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| | - Torsten Plösch
- Department of Human Medicine, Division of Perinatal Neurobiology, School of Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
- Department of Obstetrics and Gynaecology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| |
|
15
|
Fu Y, Timp W, Sedlazeck FJ. Computational analysis of DNA methylation from long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00822-5. [PMID: 40155770 DOI: 10.1038/s41576-025-00822-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/30/2025] [Indexed: 04/01/2025]
Abstract
DNA methylation is a critical epigenetic mechanism in numerous biological processes, including gene regulation, development, ageing and the onset of various diseases such as cancer. Studies of methylation are increasingly using single-molecule long-read sequencing technologies to simultaneously measure epigenetic states such as DNA methylation with genomic variation. These long-read data sets have spurred the continuous development of advanced computational methods to gain insights into the roles of methylation in regulating chromatin structure and gene regulation. In this Review, we discuss the computational methods for calling methylation signals, contrasting methylation between samples, analysing cell-type diversity and gaining additional genomic insights, and then further discuss the challenges and future perspectives of tool development for DNA methylation research.
Affiliation(s)
- Yilei Fu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
| |
|
16
|
Jayakrishnan M, Havlová M, Veverka V, Regnard C, Becker PB. Genomic context-dependent histone H3K36 methylation by three Drosophila methyltransferases and implications for dedicated chromatin readers. Nucleic Acids Res 2025; 53:gkaf202. [PMID: 40164442 DOI: 10.1093/nar/gkaf202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Accepted: 03/06/2025] [Indexed: 04/02/2025] Open
Abstract
Methylation of histone H3 at lysine 36 (H3K36me3) marks active chromatin. The mark is interpreted by epigenetic readers that assist transcription and safeguard chromatin fiber integrity. In Drosophila, the chromodomain protein MSL3 binds H3K36me3 at X-chromosomal genes to implement dosage compensation. The PWWP-domain protein JASPer recruits the JIL1 kinase to active chromatin on all chromosomes. Because depletion of K36me3 had variable, locus-specific effects on the interactions of those readers, we systematically studied K36 methylation in a defined cellular model. Contrasting prevailing models, we found that K36me1, K36me2, and K36me3 each contribute to distinct chromatin states. Monitoring the changing K36 methylation landscape upon depletion of the three methyltransferases Set2, NSD, and Ash1 revealed local, context-specific methylation signatures. Each methyltransferase governs K36 methylation in dedicated genomic regions, with minor overlaps. Set2 catalyzes K36me3 predominantly at transcriptionally active euchromatin. NSD places K36me2/3 at defined loci within pericentric heterochromatin and on weakly transcribed euchromatic genes. Ash1 deposits K36me1 at putative enhancers. The mapping of MSL3 and JASPer suggested that they bind K36me2 in addition to K36me3, which was confirmed by direct affinity measurement. This dual specificity attracts the readers to a broader range of chromosomal locations and increases the robustness of their actions.
Affiliation(s)
- Muhunden Jayakrishnan
- Molecular Biology Division, Biomedical Center, Ludwig-Maximilians-Universität, 82152 Munich, Germany
| | - Magdalena Havlová
- Institute of Organic Chemistry and Biochemistry (IOCB) of the Czech Academy of Sciences, 166 10 Prague, Czech Republic
| | - Václav Veverka
- Institute of Organic Chemistry and Biochemistry (IOCB) of the Czech Academy of Sciences, 166 10 Prague, Czech Republic
- Department of Cell Biology, Faculty of Science, Charles University, 128 44 Prague, Czech Republic
| | - Catherine Regnard
- Molecular Biology Division, Biomedical Center, Ludwig-Maximilians-Universität, 82152 Munich, Germany
| | - Peter B Becker
- Molecular Biology Division, Biomedical Center, Ludwig-Maximilians-Universität, 82152 Munich, Germany
| |
|
17
|
Koutsi M, Pouliou M, Chatzopoulos D, Champezou L, Zagkas K, Vasilogianni M, Kouroukli A, Agelopoulos M. An evolutionarily conserved constellation of functional cis-elements programs the virus-responsive fate of the human (epi)genome. Nucleic Acids Res 2025; 53:gkaf207. [PMID: 40131776 PMCID: PMC11934927 DOI: 10.1093/nar/gkaf207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 02/11/2025] [Accepted: 03/04/2025] [Indexed: 03/27/2025] Open
Abstract
Human health depends on perplexing defensive cellular responses against microbial pathogens like Viruses. Despite the major effort undertaken, the (epi)genomic mechanisms that human cells utilize to tailor defensive gene expression programs against microbial attacks have remained inadequately understood, mainly due to a significant lack of recording of the in vivo functional cis-regulatory modules (CRMs) of the human genome. Here, we introduce the virus-responsive fate of the human (epi)genome as characterized in naïve and infected cells by functional genomics, computational biology, DNA evolution, and DNA Grammar and Syntax investigations. We discovered that multitudes of novel functional virus-responsive CRMs (vrCRMs) compose typical enhancers (tEs), super-enhancers (SEs), repetitive-DNA enhancers (rDEs), and stand-alone functional genomic stretches that grant human cells regulatory underpinnings for layering basal immunity and eliminating illogical/harmful defensive responses under homeostasis, yet stimulating virus-responsive genes and transposable elements (TEs) upon infection. Moreover, extensive epigenomic reprogramming of previously unknown SE landscapes marks the transition from naïve to antiviral human cell states and involves the functions of the antimicrobial transcription factors (TFs), including interferon response factor 3 (IRF3) and nuclear factor-κB (NF-κB), as well as coactivators and transcriptional apparatus, along with intensive modifications/alterations in histone marks and chromatin accessibility. 
Considering the polyphyletic evolutionary fingerprints of the composite DNA sequences of the vrCRMs assessed by TFs-STARR-seq, ranging from the animal to microbial kingdoms, the conserved features of antimicrobial TFs and chromatin complexes, and their pluripotent stimulus-induced activation, these findings shed light on how mammalian (epi)genomes evolved their functions to interpret the exogenous stress inflicted and program defensive transcriptional responses against microbial agents. Crucially, many known human short variants, e.g. single-nucleotide polymorphisms (SNPs), insertions, deletions etc., and quantitative trait loci (QTLs) linked to autoimmune diseases, such as multiple sclerosis (MS), systemic lupus erythematosus (SLE), Crohn's disease (CD) etc., were mapped within or vastly proximal (±2.5 kb) to the novel in vivo functional SEs and vrCRMs discovered, thus underscoring the impact of their (mal)functions on human physiology and disease development. Hence, we delved into the virus-responsive fate of the human (epi)genome and illuminated its architecture, function, evolutionary origins, and its significance for cellular homeostasis. These results allow us to chart the "Human hyper-Atlas of virus-infection", an integrated "molecular in silico" encyclopedia situated in the UCSC Genome Browser that benefits our mechanistic understanding of human infectious/(auto)immune diseases development and can facilitate the generation of in vivo preclinical animal models, drug design, and evolution of therapeutic applications.
Affiliation(s)
- Marianna A Koutsi
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Marialena Pouliou
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Dimitris Chatzopoulos
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Lydia Champezou
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Konstantinos Zagkas
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Marili Vasilogianni
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Alexandra G Kouroukli
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| | - Marios Agelopoulos
- Center of Basic Research, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
| |
|
18
|
Smeds L, Kamali K, Kejnovská I, Kejnovský E, Chiaromonte F, Makova KD. Non-canonical DNA in human and other ape telomere-to-telomere genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.09.02.610891. [PMID: 39713403 PMCID: PMC11661062 DOI: 10.1101/2024.09.02.610891] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/24/2024]
Abstract
Non-canonical (non-B) DNA structures, e.g., bent DNA, hairpins, G-quadruplexes (G4s), Z-DNA, etc., which form at certain sequence motifs (e.g., A-phased repeats, inverted repeats, etc.), have emerged as important regulators of cellular processes and drivers of genome evolution. Yet, they have been understudied due to their repetitive nature and potentially inaccurate sequences generated with short-read technologies. Here we comprehensively characterize such motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched at the genomic regions added to T2T assemblies, and occupy 9-15%, 9-11%, and 12-38% of autosomes, and chromosomes X and Y, respectively. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest their novel functions in previously inaccessible genomic regions.
Affiliation(s)
- Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Iva Kejnovská
- Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
| | - Eduard Kejnovský
- Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic
| | - Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
- L'EMbeDS, Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
| | - Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
| |
|
19
|
Kixmoeller K, Tarasovetc EV, Mer E, Chang YW, Black BE. Centromeric chromatin clearings demarcate the site of kinetochore formation. Cell 2025; 188:1280-1296.e19. [PMID: 39855195 PMCID: PMC11890969 DOI: 10.1016/j.cell.2024.12.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 11/24/2024] [Accepted: 12/18/2024] [Indexed: 01/27/2025]
Abstract
The centromere is the chromosomal locus that recruits the kinetochore, directing faithful propagation of the genome during cell division. Using cryo-ET on human mitotic chromosomes, we reveal a distinctive architecture at the centromere: clustered 20- to 25-nm nucleosome-associated complexes within chromatin clearings that delineate them from surrounding chromatin. Centromere components CENP-C and CENP-N are each required for the integrity of the complexes, while CENP-C is also required to maintain the chromatin clearing. We find that CENP-C is required in mitosis, not just for kinetochore assembly, likely reflecting its role in organizing the inner kinetochore during chromosome segregation. We further visualize the scaffold of the fibrous corona, a structure amplified at unattached kinetochores, revealing crescent-shaped parallel arrays of fibrils extending >1 μm. Thus, we reveal how the organization of centromeric chromatin creates a clearing at the site of kinetochore formation as well as the nature of kinetochore amplification mediated by corona fibrils.
Affiliation(s)
- Kathryn Kixmoeller
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Ekaterina V Tarasovetc
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Elie Mer
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yi-Wei Chang
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| | - Ben E Black
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Biochemistry, Biophysics, Chemical Biology Graduate Group, University of Pennsylvania, Philadelphia, PA, USA; Institute of Structural Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Penn Center for Genome Integrity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
| |
|
20
|
Chen Y, Lin ZB, Wang SK, Wu B, Niu L, Zhong JY, Sun YM, Zheng Z, Bai X, Liu LR, Xie W, Chi W, Ye T, Luo R, Hou C, Luo F, Xiao CL. Reconstruction of diploid higher-order human 3D genome interactions from noisy Pore-C data using Dip3D. Nat Struct Mol Biol 2025:10.1038/s41594-025-01512-w. [PMID: 40038455 DOI: 10.1038/s41594-025-01512-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 02/05/2025] [Indexed: 03/06/2025]
Abstract
Differential high-order chromatin interactions between homologous chromosomes affect many biological processes. Traditional chromatin conformation capture genome analysis methods mainly identify two-way interactions and cannot provide comprehensive haplotype information, especially for low-heterozygosity organisms such as human. Here, we present a pipeline of methods to delineate diploid high-order chromatin interactions from noisy Pore-C outputs. We trained a previously published single-nucleotide variant (SNV)-calling deep learning model, Clair3, on Pore-C data to achieve superior SNV calling, applied a filtering strategy to tag reads for haplotypes and established a haplotype imputation strategy for high-order concatemers. Learning the haplotype characteristics of high-order concatemers from high-heterozygosity mouse allowed us to devise a progressive haplotype imputation strategy, which improved the haplotype-informative Pore-C contact rate 14.1-fold to 76% in the HG001 cell line. Overall, the diploid three-dimensional (3D) genome interactions we derived using Dip3D surpassed conventional methods in noise reduction and contact distribution uniformity, with better haplotype-informative contact density and genomic coverage rates. Dip3D identified previously unresolved haplotype high-order interactions, in addition to an understanding of their relationship with allele-specific expression, such as in X-chromosome inactivation. These results lead us to conclude that Dip3D is a robust pipeline for the high-quality reconstruction of diploid high-order 3D genome interactions.
Affiliation(s)
- Ying Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China
| | - Zhuo-Bin Lin
- Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
| | - Shao-Kai Wang
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
| | - Bo Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Longjian Niu
- Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China
| | - Jia-Yong Zhong
- Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China
| | - Yi-Meng Sun
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Zhenxian Zheng
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| | - Xin Bai
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Luo-Ran Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China
| | - Wei Xie
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
| | - Wei Chi
- Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China
| | | | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, China.
| | - Chunhui Hou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
| | - Feng Luo
- School of Computing, Clemson University, Clemson, SC, USA.
| | - Chuan-Le Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, China.
| |
|
21
|
Ramirez P, Sun W, Dehkordi SK, Zare H, Pascarella G, Carninci P, Fongang B, Bieniek KF, Frost B. Nanopore Long-Read Sequencing Unveils Genomic Disruptions in Alzheimer's Disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.02.01.578450. [PMID: 38370753 PMCID: PMC10871260 DOI: 10.1101/2024.02.01.578450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Studies in laboratory models and postmortem human brain tissue from patients with Alzheimer's disease have revealed disruption of basic cellular processes such as DNA repair and epigenetic control as drivers of neurodegeneration. While genomic alterations in regions of the genome that are rich in repetitive sequences, often termed "dark regions," are difficult to resolve using traditional sequencing approaches, long-read technologies offer promising new avenues to explore previously inaccessible regions of the genome. In the current study, we leverage nanopore-based long-read whole-genome sequencing of DNA extracted from postmortem human frontal cortex at early and late stages of Alzheimer's disease, as well as age-matched controls, to analyze retrotransposon insertion events, non-allelic homologous recombination (NAHR), structural variants and DNA methylation within retrotransposon loci and other repetitive/dark regions of the human genome. Interestingly, we find that retrotransposon insertion events and repetitive element-associated NAHR are particularly enriched within centromeric and pericentromeric regions of DNA in the aged human brain, and that ribosomal DNA (rDNA) is subject to a high degree of NAHR compared to other regions of the genome. We detect a trending increase in potential somatic retrotransposition events of the short interspersed nuclear element (SINE) AluY in late-stage Alzheimer's disease, and differential changes in methylation within repetitive elements and retrotransposons according to disease stage. Taken together, our analysis provides the first long-read DNA sequencing-based analysis of retrotransposon sequences, NAHR, structural variants, and DNA methylation in the aged brain, and points toward transposable elements, centromeric/pericentromeric regions and rDNA as hotspots for genomic variation.
Affiliation(s)
- Paulino Ramirez
- Barshop Institute for Longevity and Aging Studies
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
- Brown University, Providence, Rhode Island
| | - Wenyan Sun
- Barshop Institute for Longevity and Aging Studies
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
- Clinical Neuroscience Research Center, Department of Neurosurgery, School of Medicine, Tulane University, New Orleans, Louisiana
| | - Shiva Kazempour Dehkordi
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| | - Habil Zare
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
| | | | - Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Bernard Fongang
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Biochemistry & Structural Biology, University of Texas Health San Antonio, San Antonio, Texas
| | - Kevin F Bieniek
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Pathology, University of Texas Health San Antonio, San Antonio, Texas
| | - Bess Frost
- Barshop Institute for Longevity and Aging Studies
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases
- Department of Cell Systems and Anatomy, University of Texas Health San Antonio, San Antonio, Texas
- Brown University, Providence, Rhode Island
| |
|
22
|
Brändl B, Steiger M, Kubelt C, Rohrandt C, Zhu Z, Evers M, Wang G, Schuldt B, Afflerbach AK, Wong D, Lum A, Halldorsson S, Djirackor L, Leske H, Magadeeva S, Smičius R, Quedenau C, Schmidt NO, Schüller U, Vik-Mo EO, Proescholdt M, Riemenschneider MJ, Zadeh G, Ammerpohl O, Yip S, Synowitz M, van Bömmel A, Kretzmer H, Müller FJ. Rapid brain tumor classification from sparse epigenomic data. Nat Med 2025; 31:840-848. [PMID: 40021833 PMCID: PMC11922770 DOI: 10.1038/s41591-024-03435-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 11/27/2024] [Indexed: 03/03/2025]
Abstract
Although the intraoperative molecular diagnosis of the approximately 100 known brain tumor entities described to date has been a goal of neuropathology for the past decade, achieving this within a clinically relevant timeframe of under 1 h after biopsy collection remains elusive. Advances in third-generation sequencing have brought this goal closer, but established machine learning techniques rely on computationally intensive methods, making them impractical for live diagnostic workflows in clinical applications. Here we present MethyLYZR, a naive Bayesian framework enabling fully tractable, live classification of cancer epigenomes. For evaluation, we used nanopore sequencing to classify over 200 brain tumor samples, including 10 sequenced in a clinical setting next to the operating room, achieving highly accurate results within 15 min of sequencing. MethyLYZR can be run in parallel with an ongoing nanopore experiment with negligible computational overhead. Therefore, the only limiting factors for even faster time to results are DNA extraction time and the nanopore sequencer's maximum parallel throughput. Although more evidence from prospective studies is needed, our study suggests the potential applicability of MethyLYZR for live molecular classification of nervous system malignancies using nanopore sequencing not only for the neurosurgical intraoperative use case but also for other oncologic indications and the classification of tumors from cell-free DNA in liquid biopsies.
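The abstract above describes a naive Bayesian classifier that can be updated while sequencing is still running. As a rough illustration of why such an update is cheap, the following minimal sketch adds each read's CpG methylation calls to a running per-class log-score. It is not the MethyLYZR implementation: the class names, CpG identifiers, reference methylation probabilities, uniform prior, and the function names `update_log_posterior` and `normalize` are all hypothetical choices made for this example.

```python
import math

def update_log_posterior(log_post, read_calls, centroids, eps=1e-3):
    """Add one read's CpG calls to the running per-class log-scores.

    log_post   -- dict: class name -> running log-score (modified in place)
    read_calls -- list of (cpg_id, is_methylated) pairs from one read
    centroids  -- dict: class name -> {cpg_id: P(methylated | class)}
                  (hypothetical reference profiles)
    eps        -- clamp so that a probability of 0 or 1 cannot produce log(0)
    """
    for cls, profile in centroids.items():
        for cpg, methylated in read_calls:
            p = profile.get(cpg)
            if p is None:
                continue  # CpG not in the reference profile: skip it
            p = min(max(p, eps), 1.0 - eps)
            # Naive Bayes: sites are treated as independent given the class,
            # so each observation contributes one additive log-likelihood term.
            log_post[cls] += math.log(p if methylated else 1.0 - p)
    return log_post

def normalize(log_post):
    """Convert log-scores to posterior probabilities (log-sum-exp trick)."""
    m = max(log_post.values())
    weights = {c: math.exp(v - m) for c, v in log_post.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Hypothetical two-class reference and a uniform prior.
centroids = {
    "tumor_A": {"cg1": 0.9, "cg2": 0.1, "cg3": 0.8},
    "tumor_B": {"cg1": 0.2, "cg2": 0.7, "cg3": 0.3},
}
log_post = {cls: math.log(0.5) for cls in centroids}

# Reads arrive one at a time; each covers only a few CpG sites.
stream = [
    [("cg1", True)],
    [("cg2", False), ("cg3", True)],
]
for read in stream:
    update_log_posterior(log_post, read, centroids)

posterior = normalize(log_post)
print(posterior)
```

Because every read only adds terms to a running sum, the cost per read is proportional to the number of classes times the number of CpG sites on that read, which is consistent with the abstract's point that classification can proceed alongside an ongoing sequencing run.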
Affiliation(s)
- Björn Brändl
  - Department of Psychiatry and Psychotherapy, Christian-Albrecht University of Kiel, Kiel, Germany
- Mara Steiger
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Digital Health Cluster, Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Department of Mathematics and Computer Science, Free University Berlin, Berlin, Germany
- Carolin Kubelt
  - Department of Neurosurgery, University Medical Center Schleswig-Holstein (UKSH), Campus Kiel, Kiel, Germany
- Christian Rohrandt
  - Department of Psychiatry and Psychotherapy, Christian-Albrecht University of Kiel, Kiel, Germany
- Zhihan Zhu
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Maximilian Evers
  - Altona Diagnostics GmbH, Hamburg, Germany
  - Institute for Biology and Biotechnology of Plants, University of Münster, Münster, Germany
- Gaojianyong Wang
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Bernhard Schuldt
  - Mathematische Modellierung, Entwicklung und Beratung, Düsseldorf, Germany
- Ann-Kristin Afflerbach
  - Institute for Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Derek Wong
  - Molecular Oncology, BC Cancer, Vancouver, British Columbia, Canada
  - Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Amy Lum
  - Molecular Oncology, BC Cancer, Vancouver, British Columbia, Canada
- Skarphedinn Halldorsson
  - Vilhelm Magnus Laboratory for Neurosurgical Research, Institute for Surgical Research/Department of Neurosurgery, Oslo University Hospital, Oslo, Norway
- Luna Djirackor
  - Vilhelm Magnus Laboratory for Neurosurgical Research, Institute for Surgical Research/Department of Neurosurgery, Oslo University Hospital, Oslo, Norway
- Henning Leske
  - Vilhelm Magnus Laboratory for Neurosurgical Research, Institute for Surgical Research/Department of Neurosurgery, Oslo University Hospital, Oslo, Norway
  - Section of Neuropathology, Department of Pathology, Oslo University Hospital, Oslo, Norway
- Svetlana Magadeeva
  - Department of Psychiatry and Psychotherapy, Christian-Albrecht University of Kiel, Kiel, Germany
- Romualdas Smičius
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Claudia Quedenau
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Nils O Schmidt
  - Department of Neurosurgery, University Medical Center Regensburg, Regensburg, Germany
  - Brain Tumor Center, University Medical Center Regensburg, Regensburg, Germany
- Ulrich Schüller
  - Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
  - Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Einar O Vik-Mo
  - Vilhelm Magnus Laboratory for Neurosurgical Research, Institute for Surgical Research/Department of Neurosurgery, Oslo University Hospital, Oslo, Norway
  - Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Martin Proescholdt
  - Department of Neurosurgery, University Medical Center Regensburg, Regensburg, Germany
  - Brain Tumor Center, University Medical Center Regensburg, Regensburg, Germany
- Markus J Riemenschneider
  - Brain Tumor Center, University Medical Center Regensburg, Regensburg, Germany
  - Department of Neuropathology, Regensburg University Hospital, Regensburg, Germany
- Gelareh Zadeh
  - MacFeeters Hamilton Neuro-Oncology Program, Princess Margaret Cancer Centre, University Health Network and University of Toronto, Toronto, Ontario, Canada
  - Division of Neurosurgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Ole Ammerpohl
  - Institute for Human Genetics, Ulm University and Ulm University Medical Center, Ulm, Germany
- Stephen Yip
  - Molecular Oncology, BC Cancer, Vancouver, British Columbia, Canada
  - Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Michael Synowitz
  - Department of Neurosurgery, University Medical Center Schleswig-Holstein (UKSH), Campus Kiel, Kiel, Germany
- Alena van Bömmel
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Hoffmann Group, Leibniz Institute on Aging - Fritz Lipmann Institute (FLI), Jena, Germany
- Helene Kretzmer
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Digital Health Cluster, Hasso Plattner Institute for Digital Engineering, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Franz-Josef Müller
  - Department of Psychiatry and Psychotherapy, Christian-Albrecht University of Kiel, Kiel, Germany
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany
23
Liang SA, Ren T, Zhang J, He J, Wang X, Jiang X, He Y, McCoy RC, Fu Q, Akey JM, Mao Y, Chen L. A refined analysis of Neanderthal-introgressed sequences in modern humans with a complete reference genome. Genome Biol 2025; 26:32. [PMID: 39962554 PMCID: PMC11834205 DOI: 10.1186/s13059-025-03502-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 02/11/2025] [Indexed: 02/20/2025] Open
Abstract
BACKGROUND: Leveraging long-read sequencing technologies, the first complete human reference genome, T2T-CHM13, corrects assembly errors in previous references and resolves the remaining 8% of the genome. While studies on archaic admixture in modern humans have so far relied on the GRCh37 reference due to the availability of archaic genome data, the impact of T2T-CHM13 in this field remains unexplored. RESULTS: We remap the sequencing reads of the high-quality Altai Neanderthal and Denisovan genomes onto GRCh38 and T2T-CHM13. Compared to GRCh37, we find that T2T-CHM13 significantly improves read mapping quality in archaic samples. We then apply IBDmix to identify Neanderthal-introgressed sequences in 2504 individuals from 26 geographically diverse populations using different reference genomes. We observe that commonly used pre-phasing filtering strategies in public datasets substantially influence archaic ancestry determination, underscoring the need for careful filter selection. Our analysis identifies approximately 51 Mb of Neanderthal sequences unique to T2T-CHM13, predominantly in genomic regions where GRCh38 and T2T-CHM13 assemblies diverge. Additionally, we uncover novel instances of population-specific archaic introgression in diverse populations, spanning genes involved in metabolism, olfaction, and ion-channel function. Finally, to facilitate the exploration of archaic alleles and adaptive signals in human genomics and evolutionary research, we integrate these introgressed sequences and adaptive signals across all reference genomes into a visualization database, ASH (www.arcseqhub.com). CONCLUSIONS: Our study enhances the detection of archaic variations in modern humans, highlights the importance of utilizing the T2T-CHM13 reference, and provides novel insights into the functional consequences of archaic hominin admixture.
Affiliation(s)
- Shen-Ao Liang
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, 200438, China
- Tianxin Ren
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200030, China
- Jiayu Zhang
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, 200438, China
- Jiahui He
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, 200438, China
- Xuankai Wang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200030, China
- Xinrui Jiang
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200030, China
- Yuan He
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, 200438, China
- Rajiv C McCoy
  - Department of Biology, Johns Hopkins University, Baltimore, MD, 21212, USA
- Qiaomei Fu
  - Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, 100044, China
  - University of the Chinese Academy of Sciences, Beijing, 100049, China
- Joshua M Akey
  - The Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08540, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, 200030, China
  - Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University, Yiwu, 322000, China
- Lu Chen
  - State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, 200438, China
24
Zhang F, Li C, Yang D, Liu B, Zhou Y, Zhou Z, Zhong H, Wang Z, Chen D. Label-Free and Sequence-Independent Isothermal Amplification Strategy for the Simultaneous Detection of Genomic 5-Methylcytosine and 5-Hydroxymethylcytosine. Anal Chem 2025; 97:3063-3073. [PMID: 39869504 DOI: 10.1021/acs.analchem.4c06200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2025]
Abstract
5-Methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are crucial epigenetic modifications in eukaryotic genomic DNA that regulate gene expression and are associated with the occurrence of various cancers. Here, we combined bisulfite conversion with 4-acetamido-2,2,6,6-tetramethyl-1-oxopiperidinium tetrafluoroborate (ACT⁺BF₄⁻, TCI) oxidation to develop a label-free and sequence-independent isothermal amplification (BTIA) assay for a genome-wide 5mC and 5hmC analysis. The BTIA strategy can distinguish 5mC and 5hmC signatures from other bases with high sensitivity and good specificity, avoiding sophisticated chemical modifications and expensive protein labeling. Moreover, the utilization of terminal deoxynucleotidyl transferase (TdT) enables the proposed strategy to detect global 5mC and 5hmC without sequence dependence. With only 78 ng of input of genomic DNA, global 5mC and 5hmC levels were accurately quantified in cells (including cancer cells of A549, T47D, and K562 and normal cells of HEK-293T, CHO, and NRK-52E) and clinical whole blood samples (including healthy control, precancerous cervical cancer, and confirmed cervical cancer) within 18 h. The detection results suggested that 5mC was highly expressed in cancer cells. More importantly, a significant increase in 5mC was observed in precancerous cervical cancer and further upregulation in confirmed cervical cancer, suggesting a correlation between 5mC and cancer occurrence and development. However, 5hmC showed the reverse result in these tested cells and clinical samples. Collectively, the BTIA strategy can be easily performed on the ordinary heating apparatus in almost all research and medical laboratories, showing a significant application in the early screening of cervical cancer in the clinic.
Affiliation(s)
- Feng Zhang
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Chengpeng Li
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Di Yang
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Bingqian Liu
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Yue Zhou
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Zhixu Zhou
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Hang Zhong
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Zhenchao Wang
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
- Danping Chen
  - School of Pharmaceutical Sciences, Guizhou University, Guiyang 550025, China
25
Tsukamoto S, Mofrad MRK. Bridging scales in chromatin organization: Computational models of loop formation and their implications for genome function. J Chem Phys 2025; 162:054122. [PMID: 39918128 DOI: 10.1063/5.0232328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 11/18/2024] [Indexed: 05/08/2025] Open
Abstract
Chromatin loop formation plays a crucial role in 3D genome interactions, with misfolding potentially leading to irregular gene expression and various diseases. While experimental tools such as Hi-C have advanced our understanding of genome interactions, the biophysical principles underlying chromatin loop formation remain elusive. This review examines computational approaches to chromatin folding, focusing on polymer models that elucidate chromatin loop mechanics. We discuss three key models: (1) the multi-loop-subcompartment model, which investigates the structural effects of loops on chromatin conformation; (2) the strings and binders switch model, capturing thermodynamic chromatin aggregation; and (3) the loop extrusion model, revealing the role of structural maintenance of chromosome complexes. In addition, we explore advanced models that address chromatin clustering heterogeneity in biological processes and disease progression. The review concludes with an outlook on open questions and current trends in chromatin loop formation and genome interactions, emphasizing the physical and computational challenges in the field.
Affiliation(s)
- Shingo Tsukamoto
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, 208A Stanley Hall, Berkeley, California 94720-1762, USA
- Mohammad R K Mofrad
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, 208A Stanley Hall, Berkeley, California 94720-1762, USA
  - Molecular Biophysics and Integrative BioImaging Division, Lawrence Berkeley National Lab, Berkeley, California 94720, USA
26
Castellano KR, Neitzey ML, Starovoitov A, Barrett GA, Reid NM, Vuruputoor VS, Webster CN, Storer JM, Pauloski NR, Ameral NJ, McEvoy SL, McManus MC, Puritz JB, Wegrzyn JL, O’Neill RJ. Genome Assembly of a Living Fossil, the Atlantic Horseshoe Crab Limulus polyphemus, Reveals Lineage-Specific Whole-Genome Duplications, Transposable Element-Based Centromeres, and a ZW Sex Chromosome System. Mol Biol Evol 2025; 42:msaf021. [PMID: 39907027 PMCID: PMC11836539 DOI: 10.1093/molbev/msaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 12/16/2024] [Accepted: 01/10/2025] [Indexed: 02/06/2025] Open
Abstract
Horseshoe crabs, considered living fossils with a stable morphotype spanning ∼445 million years, are evolutionarily, ecologically, and biomedically important species experiencing rapid population decline. Of the four extant species of horseshoe crabs, the Atlantic horseshoe crab, Limulus polyphemus, has become an essential component of the modern medicine toolkit. Here, we present the first chromosome-level genome assembly, and the most contiguous and complete assembly to date, for L. polyphemus using nanopore long-read sequencing and chromatin conformation analysis. We find support for three horseshoe crab-specific whole-genome duplications, but none shared with Arachnopulmonata (spiders and scorpions). Moreover, we discovered tandem duplicates of endotoxin detection pathway components Factors C and G, identify candidate centromeres consisting of Gypsy retroelements, and classify the ZW sex chromosome system for this species and a sister taxon, Carcinoscorpius rotundicauda. Finally, we revealed this species has been experiencing a steep population decline over the last 5 million years, highlighting the need for international conservation interventions and fisheries-based management for this critical species.
Affiliation(s)
- Kate R Castellano
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Michelle L Neitzey
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Andrew Starovoitov
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Gabriel A Barrett
  - Biological and Environmental Sciences, University of Rhode Island, Kingston, RI 02881, USA
- Noah M Reid
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Vidya S Vuruputoor
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Cynthia N Webster
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Jessica M Storer
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Nicole R Pauloski
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Natalie J Ameral
  - Biological and Environmental Sciences, University of Rhode Island, Kingston, RI 02881, USA
  - Division of Marine Fisheries, Rhode Island Department of Environmental Management, Providence, RI 02908, USA
- Susan L McEvoy
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- M Conor McManus
  - Division of Marine Fisheries, Rhode Island Department of Environmental Management, Providence, RI 02908, USA
- Jonathan B Puritz
  - Department of Biological Sciences, University of Rhode Island, Kingston, RI 02881, USA
- Jill L Wegrzyn
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Rachel J O’Neill
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
  - Department of Genetics and Genome Sciences, UConn Health, Farmington, CT 06030, USA
27
Wang M, Duan S, Sun Q, Liu K, Liu Y, Wang Z, Li X, Wei L, Liu Y, Nie S, Zhou K, Ma Y, Yuan H, Liu B, Hu L, Liu C, He G. YHSeqY3000 panel captures all founding lineages in the Chinese paternal genomic diversity database. BMC Biol 2025; 23:18. [PMID: 39838386 PMCID: PMC11752814 DOI: 10.1186/s12915-025-02122-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 01/07/2025] [Indexed: 01/23/2025] Open
Abstract
BACKGROUND: Advancements in second-/third-generation sequencing technologies, alongside computational innovations, have significantly enhanced our understanding of the genomic structure of Y-chromosomes and their unique phylogenetic characteristics. This research, despite the challenges posed by the lack of population-scale genomic databases, has the potential to revolutionize our approach to high-resolution, population-specific Y-chromosome panels and databases for anthropological and forensic applications. OBJECTIVES: This study aimed to develop the highest-resolution Y-targeted sequencing panel, utilizing time-stamped, core phylogenetic informative mutations identified from high-coverage sequences in the YanHuang cohort. This panel is intended to provide a new tool for forensic complex pedigree search and paternal biogeographical ancestry inference, as well as to explore the general patterns of the fine-scale paternal evolutionary history of ethnolinguistically diverse Chinese populations. RESULTS: The sequencing performance of the East Asian-specific Y-chromosomal panel, including 2999 core SNP variants, was found to be robust and reliable. The YHSeqY3000 panel was designed to capture the genetic diversity of Chinese paternal lineages from 3500 years ago, identifying 408 terminal lineages in 2097 individuals across 41 genetically and geographically distinct populations. We identified a fine-scale paternal substructure that correlated with ancient population migrations and expansions. New evidence was provided for extensive gene flow events between minority ethnic groups and Han Chinese people, based on the integrative Chinese Paternal Genomic Diversity Database. CONCLUSIONS: This work successfully integrated Y-chromosome-related basic genomic science with forensic and anthropological translational applications, emphasizing the necessity of comprehensively characterizing Y-chromosome genomic diversity from genomically under-represented populations. This is particularly important in the second phase of our population-specific medical or anthropological genomic cohorts, where dense sampling strategies are employed.
Affiliation(s)
- Mengge Wang
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
  - Department of Oto-Rhino-Laryngology, West China Hospital of Sichuan University, Chengdu, 610000, China
- Shuhan Duan
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - School of Basic Medical Sciences, North Sichuan Medical College, Nanchong, 637100, China
  - Department of Oto-Rhino-Laryngology, West China Hospital of Sichuan University, Chengdu, 610000, China
- Qiuxia Sun
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Kaijun Liu
  - School of International Tourism and Culture, Guizhou Normal University, Guiyang, 550025, China
  - MoFang Human Genome Research Institute, Tianfu Software Park, Chengdu, 610042, Sichuan, China
- Yan Liu
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - School of Basic Medical Sciences, North Sichuan Medical College, Nanchong, 637100, China
- Zhiyong Wang
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Xiangping Li
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Lanhai Wei
  - School of Ethnology and Anthropology, Inner Mongolia Normal University, Hohhot, 010028, Inner Mongolia, China
- Yunhui Liu
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China
- Shengjie Nie
  - School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China
- Kun Zhou
  - MoFang Human Genome Research Institute, Tianfu Software Park, Chengdu, 610042, Sichuan, China
- Yongxin Ma
  - Department of Medical Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Huijun Yuan
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
- Bing Liu
  - Institute of Forensic Science, Ministry of Public Security, Beijing, 100038, China
- Lan Hu
  - Institute of Forensic Science, Ministry of Public Security, Beijing, 100038, China
- Chao Liu
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, 510515, China
- Guanglin He
  - Institute of Rare Diseases, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610000, Sichuan, China
  - Center for Archaeological Science, Sichuan University, Chengdu, 610000, China
  - Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510230, China
28
Sobral AF, Dinis-Oliveira RJ, Barbosa DJ. CRISPR-Cas technology in forensic investigations: Principles, applications, and ethical considerations. Forensic Sci Int Genet 2025; 74:103163. [PMID: 39437497 DOI: 10.1016/j.fsigen.2024.103163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/08/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins) systems are adaptive immune systems originally present in bacteria, where they are essential to protect against external genetic elements, including viruses and plasmids. Taking advantage of this system, CRISPR-Cas-based technologies have emerged as incredible tools for precise genome editing, thus significantly advancing several research fields. Forensic sciences represent a multidisciplinary field that explores scientific methods to investigate and resolve legal issues, particularly criminal investigations and subject identification. Consequently, it plays a critical role in the justice system, providing scientific evidence to support judicial investigations. Although less explored, CRISPR-Cas-based methodologies demonstrate strong potential in the field of forensic sciences due to their high accuracy and sensitivity, including DNA profiling and identification, interpretation of crime scene investigations, detection of food contamination or fraud, and other aspects related to environmental forensics. However, using CRISPR-Cas-based methodologies in human samples raises several ethical issues and concerns regarding the potential misuse of individual genetic information. In this manuscript, we provide an overview of potential applications of CRISPR-Cas-based methodologies in several areas of forensic sciences and discuss the legal implications that challenge their routine implementation in this research field.
Affiliation(s)
- Ana Filipa Sobral
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Toxicologic Pathology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal
- Ricardo Jorge Dinis-Oliveira
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal; Department of Public Health and Forensic Sciences and Medical Education, Faculty of Medicine, University of Porto, Porto 4200-319, Portugal; FOREN - Forensic Science Experts, Dr. Mário Moutinho Avenue, No. 33-A, Lisbon 1400-136, Portugal
- Daniel José Barbosa
  - Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal
29
Ferreira MR, Carratto TMT, Frontanilla TS, Bonadio RS, Jain M, de Oliveira SF, Castelli EC, Mendes-Junior CT. Advances in forensic genetics: Exploring the potential of long read sequencing. Forensic Sci Int Genet 2025; 74:103156. [PMID: 39427416 DOI: 10.1016/j.fsigen.2024.103156] [Received: 05/03/2024] [Revised: 10/04/2024] [Accepted: 10/06/2024] [Indexed: 10/22/2024]
Abstract
DNA-based technologies have been used in forensic practice since the mid-1980s. While PCR-based STR genotyping using Capillary Electrophoresis remains the gold standard for generating DNA profiles in routine casework worldwide, the research community is continually seeking alternative methods capable of providing additional information to enhance discrimination power or contribute with new investigative leads. Oxford Nanopore Technologies (ONT) and PacBio third-generation sequencing have revolutionized the field, offering real-time capabilities, single-molecule resolution, and long-read sequencing (LRS). ONT, the pioneer of nanopore sequencing, uses biological nanopores to analyze nucleic acids in real-time. Its devices have revolutionized sequencing and may represent an interesting alternative for forensic research and routine casework, given that it offers unparalleled flexibility in a portable size: it enables sequencing approaches that range widely from PCR-amplified short target regions (e.g., CODIS STRs) to PCR-free whole transcriptome or even ultra-long whole genome sequencing. Despite its higher error rate compared to Illumina sequencing, it can significantly improve accuracy in read alignment against a reference genome or de novo genome assembly. This is achieved by generating long contiguous sequences that correctly assemble repetitive sections and regions with structural variation. Moreover, it allows real-time determination of DNA methylation status from native DNA without the need for bisulfite conversion. LRS enables the analysis of thousands of markers at once, providing phasing information and eliminating the need for multiple assays. This maximizes the information retrieved from a single invaluable sample. In this review, we explore the potential use of LRS in different forensic genetics approaches.
Affiliation(s)
- Marcel Rodrigues Ferreira
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Thássia Mayra Telles Carratto
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil
- Tamara Soledad Frontanilla
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14049-900, Brazil
- Raphael Severino Bonadio
- Depto Genética e Morfologia, Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, DF, Brazil
- Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, United States
- Erick C Castelli
- Molecular Genetics and Bioinformatics Laboratory, Experimental Research Unit - Unipex, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil; Pathology Department, School of Medicine, São Paulo State University - Unesp, Botucatu, São Paulo, Brazil
- Celso Teixeira Mendes-Junior
- Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP 14040-901, Brazil.
30
Luo LY, Wu H, Zhao LM, Zhang YH, Huang JH, Liu QY, Wang HT, Mo DX, EEr HH, Zhang LQ, Chen HL, Jia SG, Wang WM, Li MH. Telomere-to-telomere sheep genome assembly identifies variants associated with wool fineness. Nat Genet 2025; 57:218-230. [PMID: 39779954 DOI: 10.1038/s41588-024-02037-6] [Received: 03/14/2024] [Accepted: 11/19/2024] [Indexed: 01/11/2025]
Abstract
Ongoing efforts to improve sheep reference genome assemblies still leave many gaps and incomplete regions, resulting in a few common failures and errors in genomic studies. Here, we report a 2.85-Gb gap-free telomere-to-telomere genome of a ram (T2T-sheep1.0), including all autosomes and the X and Y chromosomes. This genome adds 220.05 Mb of previously unresolved regions and 754 new genes to the most updated reference assembly ARS-UI_Ramb_v3.0; it contains four types of repeat units (SatI, SatII, SatIII and CenY) in centromeric regions. T2T-sheep1.0 has a base accuracy of more than 99.999%, corrects several structural errors in previous reference assemblies and improves structural variant detection in repetitive sequences. Alignment of whole-genome short-read sequences of global domestic and wild sheep against T2T-sheep1.0 identifies 2,664,979 new single-nucleotide polymorphisms in previously unresolved regions, which improves the population genetic analyses and detection of selective signals for domestication (for example, ABCC4) and wool fineness (for example, FOXQ1).
Affiliation(s)
- Ling-Yun Luo
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hui Wu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Li-Ming Zhao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Ya-Hui Zhang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Jia-Hui Huang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qiu-Yue Liu
- Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Hai-Tao Wang
- Institute of Genetics and Developmental Biology, The Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Dong-Xin Mo
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China
- He-Hua EEr
- Institute of Animal Science, Ningxia Academy of Agriculture and Forestry Sciences, Yinchuan, China
- Lian-Quan Zhang
- Ningxia Shuomuyanchi Tan Sheep Breeding Co. Ltd., Wuzhong, China
- Shan-Gang Jia
- College of Grassland Science and Technology, China Agricultural University, Beijing, China.
- Wei-Min Wang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems; Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs; Engineering Research Center of Grassland Industry, Ministry of Education; College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China.
- Meng-Hua Li
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, China.
31
Mastrorosa FK, Oshima KK, Rozanski AN, Harvey WT, Eichler EE, Logsdon GA. Identification and annotation of centromeric hypomethylated regions with CDR-Finder. Bioinformatics 2024; 40:btae733. [PMID: 39657946 PMCID: PMC11663805 DOI: 10.1093/bioinformatics/btae733] [Received: 06/07/2024] [Revised: 11/26/2024] [Accepted: 12/06/2024] [Indexed: 12/12/2024] [Open access]
Abstract
MOTIVATION: Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allow for the investigation of complex regions of the genome at the sequence and epigenetic levels. RESULTS: Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies. These regions are typically associated with a unique type of chromatin containing the histone H3 variant CENP-A, which marks the location of the kinetochore. CDR-Finder identifies the CDRs in large and short centromeres and generates a BED file indicating the location of the CDRs within the centromere. It also outputs a plot for visualization, validation, and downstream analysis. AVAILABILITY AND IMPLEMENTATION: CDR-Finder is available at https://github.com/EichlerLab/CDR-Finder.
Affiliation(s)
- Francesco Kumara Mastrorosa
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Keisuke K Oshima
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
- Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, United States
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, United States
32
Volarić M, Meštrović N, Despot-Slade E. SatXplor-a comprehensive pipeline for satellite DNA analyses in complex genome assemblies. Brief Bioinform 2024; 26:bbae660. [PMID: 39708839 DOI: 10.1093/bib/bbae660] [Received: 08/12/2024] [Revised: 10/31/2024] [Accepted: 12/04/2024] [Indexed: 12/23/2024] [Open access]
Abstract
Satellite DNAs (satDNAs) are tandemly repeated sequences that make up a significant portion of almost all eukaryotic genomes. Although satDNAs have been shown to play an important role in genome organization and evolution, they are relatively poorly analyzed, even in model organisms. One of the main reasons for the current lack of in-depth studies on satDNAs is their underrepresentation in genome assemblies. Due to the complexity, abundance, and highly repetitive nature of satDNAs, their analysis is challenging, requiring efficient tools that ensure accurate annotation and comprehensive genome-wide analysis. We present a novel pipeline, named satellite DNA Exploration (SatXplor), designed to robustly characterize satDNA elements and analyze their arrays and flanking regions. SatXplor is benchmarked against other tools, and curated satDNA datasets from diverse species, including mice and humans, showcase its versatility across genomes with varying complexities and satDNA profiles. Component algorithms excel in the identification of tandemly repeated sequences and, for the first time, enable evaluation of satDNA variation and array annotation with the addition of information about the surrounding genomic landscape. SatXplor is an innovative pipeline for satDNA analysis that can be paired with any tool used for satDNA detection, offering insights into the structural characteristics, array determination, and genomic context of satDNA elements. By integrating various computational techniques, from sequence analysis and homology investigation to advanced clustering and graph-based methods, it provides a versatile and comprehensive approach to explore the complexity of satDNA organization and understand the underlying mechanisms and evolutionary aspects. It is open-source and freely accessible at https://github.com/mvolar/SatXplor.
Affiliation(s)
- Marin Volarić
- Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
33
Iyer SV, Goodwin S, McCombie WR. Leveraging the power of long reads for targeted sequencing. Genome Res 2024; 34:1701-1718. [PMID: 39567237 PMCID: PMC11610587 DOI: 10.1101/gr.279168.124] [Received: 03/15/2024] [Accepted: 10/01/2024] [Indexed: 11/22/2024]
Abstract
Long-read sequencing technologies have improved the contiguity and, as a result, the quality of genome assemblies by generating reads long enough to span and resolve complex or repetitive regions of the genome. Several groups have shown the power of long reads in detecting thousands of genomic and epigenomic features that were previously missed by short-read sequencing approaches. While these studies demonstrate how long reads can help resolve repetitive and complex regions of the genome, they also highlight the throughput and coverage requirements needed to accurately resolve variant alleles across large populations using these platforms. At the time of this review, whole-genome long-read sequencing is more expensive than short-read sequencing on the highest throughput short-read instruments; thus, achieving sufficient coverage to detect low-frequency variants (such as somatic variation) in heterogenous samples remains challenging. Targeted sequencing, on the other hand, provides the depth necessary to detect these low-frequency variants in heterogeneous populations. Here, we review currently used and recently developed targeted sequencing strategies that leverage existing long-read technologies to increase the resolution with which we can look at nucleic acids in a variety of biological contexts.
Affiliation(s)
- Shruti V Iyer
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
34
Koren S, Bao Z, Guarracino A, Ou S, Goodwin S, Jenike KM, Lucas J, McNulty B, Park J, Rautiainen M, Rhie A, Roelofs D, Schneiders H, Vrijenhoek I, Nijbroek K, Nordesjo O, Nurk S, Vella M, Lawrence KR, Ware D, Schatz MC, Garrison E, Huang S, McCombie WR, Miga KH, Wittenberg AHJ, Phillippy AM. Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. Genome Res 2024; 34:1919-1930. [PMID: 39505490 PMCID: PMC11610574 DOI: 10.1101/gr.279334.124] [Received: 03/15/2024] [Accepted: 10/08/2024] [Indexed: 11/08/2024]
Abstract
The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Bioscience (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for the reconstruction of complete genomes.
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Zhigui Bao
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Baden-Württemberg, Germany
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Shujun Ou
- Department of Molecular Genetics, Ohio State University, Columbus, Ohio 43210, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Katharine M Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Julian Lucas
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Brandy McNulty
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Jimin Park
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Mikko Rautiainen
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Olle Nordesjo
- Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Sergey Nurk
- Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Mike Vella
- Oxford Nanopore Technologies, Oxford OX4 4DQ, United Kingdom
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- USDA ARS NEA Plant, Soil and Nutrition Laboratory Research Unit, Ithaca, New York 14853, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, USA
- Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- State Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan 571101, China
- Karen H Miga
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California 95060, USA
- Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
35
Kukla-Bartoszek M, Głombik K. Train and Reprogram Your Brain: Effects of Physical Exercise at Different Stages of Life on Brain Functions Saved in Epigenetic Modifications. Int J Mol Sci 2024; 25:12043. [PMID: 39596111 PMCID: PMC11593723 DOI: 10.3390/ijms252212043] [Received: 10/14/2024] [Revised: 11/05/2024] [Accepted: 11/07/2024] [Indexed: 11/28/2024] [Open access]
Abstract
Multiple studies have demonstrated the significant effects of physical exercise on brain plasticity, the enhancement of memory and cognition, and mood improvement. Although the beneficial impact of exercise on brain functions and mental health is well established, the exact mechanisms underlying this phenomenon are currently under thorough investigation. Several hypotheses have emerged suggesting various possible mechanisms, including the effects of hormones, neurotrophins, neurotransmitters, and more recently also other compounds such as lactate or irisin, which are released under the exercise circumstances and act both locally or/and on distant tissues, triggering systemic body reactions. Nevertheless, none of these actually explain the long-lasting effect of exercise, which can persist for years or even be passed on to subsequent generations. It is believed that these long-lasting effects are mediated through epigenetic modifications, influencing the expression of particular genes and the translation and modification of specific proteins. This review explores the impact of regular physical exercise on brain function and brain plasticity and the associated occurrence of epigenetic modifications. It examines how these changes contribute to the prevention and treatment of neuropsychiatric and neurological disorders, as well as their influence on the natural aging process and mental health.
Affiliation(s)
- Katarzyna Głombik
- Laboratory of Immunoendocrinology, Department of Experimental Neuroendocrinology, Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343 Kraków, Poland
36
Mohanty SK, Chiaromonte F, Makova KD. Evolutionary Dynamics of G-Quadruplexes in Human and Other Great Ape Telomere-to-Telomere Genomes. bioRxiv 2024:2024.11.05.621973. [PMID: 39574740 PMCID: PMC11580976 DOI: 10.1101/2024.11.05.621973] [Indexed: 12/02/2024]
Abstract
G-quadruplexes (G4s) are non-canonical DNA structures that can form at approximately 1% of the human genome. G4s contribute to point mutations and structural variation and thus facilitate genomic instability. They play important roles in regulating replication, transcription, and telomere maintenance, and some of them evolve under purifying selection. Nevertheless, the evolutionary dynamics of G4s has remained underexplored. Here we conducted a comprehensive analysis of predicted G4s (pG4s) in the recently released telomere-to-telomere (T2T) genomes of human and other great apes: bonobo, chimpanzee, gorilla, Bornean orangutan, and Sumatran orangutan. We annotated tens of thousands of new pG4s in T2T compared to previous ape genome assemblies, including 41,236 in the human genome. Analyzing species alignments, we found approximately one-third of pG4s shared by all apes studied and identified thousands of species- and genus-specific pG4s. pG4s accumulated and diverged at rates consistent with divergence times between the studied species. We observed a significant enrichment and hypomethylation of pG4s shared across species at regulatory regions, including promoters, 5' and 3' UTRs, and origins of replication, strongly suggesting their formation and functional role in these regions. pG4s shared among great apes displayed lower methylation levels compared to species-specific pG4s, suggesting evolutionary conservation of functional roles of the former. Many species-specific pG4s were located in the repetitive and satellite regions deciphered in the T2T genomes. Our findings illuminate the evolutionary dynamics of G4s, their role in gene regulation, and their potential contribution to species-specific adaptations in great apes, emphasizing the utility of high-resolution T2T genomes in uncovering previously elusive genomic features.
Affiliation(s)
- Saswat K. Mohanty
- Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Francesca Chiaromonte
- Department of Statistics, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
- EMbeDS, Sant’Anna School of Advanced Studies, 56127 Pisa, Italy
- Kateryna D. Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
- Center for Medical Genomics, Penn State University, University Park and Hershey, PA, USA
37
Kumara Mastrorosa F, Oshima KK, Rozanski AN, Harvey WT, Eichler EE, Logsdon GA. Identification and annotation of centromeric hypomethylated regions with Centromere Dip Region (CDR)-Finder. bioRxiv 2024:2024.11.01.621587. [PMID: 39574726 PMCID: PMC11580854 DOI: 10.1101/2024.11.01.621587] [Indexed: 11/30/2024]
Abstract
Centromeres are chromosomal regions historically understudied with sequencing technologies due to their repetitive nature and short-read mapping limitations. However, recent improvements in long-read sequencing allowed for the investigation of complex regions of the genome at the sequence and epigenetic levels. Here, we present Centromere Dip Region (CDR)-Finder: a tool to identify regions of hypomethylation within the centromeres of high-quality, contiguous genome assemblies. These regions are typically associated with a unique type of chromatin containing the histone H3 variant CENP-A, which marks the location of the kinetochore. CDR-Finder identifies the CDRs in large and short centromeres and generates a BED file indicating the location of the CDRs within the centromere. It also outputs a plot for visualization, validation, and downstream analysis. CDR-Finder is available at https://github.com/EichlerLab/CDR-Finder.
Affiliation(s)
- F. Kumara Mastrorosa
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Keisuke K. Oshima
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N. Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
38
Xia Y, Li D, Chen T, Pan S, Huang H, Zhang W, Liang Y, Fu Y, Peng Z, Zhang H, Zhang L, Peng S, Shi R, He X, Zhou S, Jiao W, Zhao X, Wu X, Zhou L, Zhou J, Ouyang Q, Tian Y, Jiang X, Zhou Y, Tang S, Shen J, Ohshima K, Tan Z. Microsatellite density landscapes illustrate short tandem repeats aggregation in the complete reference human genome. BMC Genomics 2024; 25:960. [PMID: 39402450 PMCID: PMC11477012 DOI: 10.1186/s12864-024-10843-9] [Received: 05/18/2024] [Accepted: 09/26/2024] [Indexed: 10/19/2024] [Open access]
Abstract
BACKGROUND: Over the past decades, microsatellites have increasingly been recognized as biologically significant for the human genome and human health. The complete human reference sequence T2T-CHM13 greatly facilitates a comprehensive study of short tandem repeats (STRs) in the human genome. RESULTS: Microsatellite density landscapes of all 24 chromosomes were built for T2T-CHM13, the first complete reference sequence of the human genome. These landscapes showed that STRs tend to aggregate, forming a large number of STR density peaks. We classified 8,823 High Microsatellites Density Peaks (HMDPs), 35,257 Middle Microsatellites Density Peaks (MMDPs) and 199,649 Low Microsatellites Density Peaks (LMDPs) on the 24 chromosomes, and also classified the motif type of every microsatellite density peak. These STR density peaks are mainly composed of a single motif; AT is the most dominant motif, followed by AATGG and CCATT. In addition, 514 genomic regions were characterized by their microsatellite density features in the full T2T-CHM13 genome. CONCLUSIONS: These landscape maps show that microsatellites aggregate at many genomic positions to form a large number of microsatellite density peaks, each composed mainly of a single motif type, indicating that local microsatellite density varies enormously along every chromosome of T2T-CHM13.
Affiliation(s)
- Yun Xia
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Douyue Li
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Tingyi Chen
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Saichao Pan
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Hanrou Huang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
- Wenxiang Zhang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Yulin Liang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Yongzhuo Fu
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Zhuli Peng
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Hongxi Zhang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Liang Zhang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Shan Peng
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Ruixue Shi
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Xingxin He
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Siqian Zhou
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Weili Jiao
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Xiangyan Zhao
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Xiaolong Wu
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Lan Zhou
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Jingyu Zhou
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Qingjian Ouyang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - You Tian
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Xiaoping Jiang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Yi Zhou
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Shiying Tang
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | - Junxiong Shen
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China
| | | | - Zhongyang Tan
- Bioinformatic Center, College of Biology, Hunan University, Lushan Road (S), Yuelu District, Changsha, 410082, China.
| |
|
39
|
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Rocha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O’Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.31.605654. [PMID: 39131277 PMCID: PMC11312596 DOI: 10.1101/2024.07.31.605654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19103, USA
| | - Steven J. Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Dmitry Antipov
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Brandon D. Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Francesco Montinaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Yanting Luo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
| | - Joanna Malukiewicz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Jessica M. Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Abigail N. Sequeira
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Riley J. Mangan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | | | | | - Anton Bankevich
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Christine R. Beck
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
| | - Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Matthew Borchers
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Emry Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Lucia Carbone
- Department of Medicine, KCVI, Oregon Health Sciences University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
| | - Laura Carrel
- PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
| | - Agnes P. Chan
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
| | - Juyun Crawford
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Cedric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10021, USA
| | - Gage H. Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Luciana de Gennaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - David Gilbert
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | | | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Ishaan Gupta
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Junmin Han
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Robert S. Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Research Institute, Goethe University, Frankfurt, Germany
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marlys L. Houck
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Bryce Kille
- Department of Computer Science, Rice University, Houston, TX 77005, USA
| | - Chul Lee
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Youngho Lee
- Laboratory of bioinformatics and population genetics, Interdisciplinary program in bioinformatics, Seoul National University, Republic of Korea
| | - William Lees
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Yong Hwee Eddie Loh
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
| | - Juan F. I. Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Barbara McGrath
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Britta S. Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Saswat K. Mohanty
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karol Pal
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Francisca R. Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Joana L. Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
| | - Oliver A. Ryder
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Nicholas J. Schork
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
| | - Cole Shanks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Dongmin R. Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Cynthia Steiner
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Alexander P. Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Michael G. Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou 310024, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, 430070, Wuhan, China
| | - Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - Sarah A. Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Yixin Zhu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Zachary A. Szpiech
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Christian D. Huber
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Tobias L. Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Miriam K. Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Soojin V. Yi
- Department of Ecology, Evolution and Marine Biology, Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Corey T. Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Peter H. Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, USA
| | - Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Craig B. Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
| | - Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - Rachel J. O’Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Departments of Molecular and Cell Biology, UConn Storrs, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Kateryna D. Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
40
|
Gafurov A, Vinař T, Medvedev P, Brejová B. Fast Context-Aware Analysis of Genome Annotation Colocalization. J Comput Biol 2024; 31:946-964. [PMID: 39381845 PMCID: PMC11698669 DOI: 10.1089/cmb.2024.0667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, sequence repeats, regions with a particular epigenetic state, and copy number variants. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain that differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistic and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.
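The last step described in the abstract above, turning an expectation and a variance of the test statistic into a p-value through a normal approximation, can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function name `normal_approx_p_value`, its parameters, and the `tail` option are hypothetical, and the mean and variance are taken as given inputs rather than computed under the Markov chain null model.

```python
import math


def normal_approx_p_value(observed, mean, variance, tail="greater"):
    """Approximate a p-value by treating the test statistic as normal.

    Hypothetical helper: `mean` and `variance` are assumed to be the
    expectation and variance of the statistic under some null model.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    # Standardize the observed statistic.
    z = (observed - mean) / math.sqrt(variance)
    if tail == "greater":      # enrichment: P(Z >= z)
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "less":         # depletion: P(Z <= z)
        return 0.5 * math.erfc(-z / math.sqrt(2))
    if tail == "two-sided":    # P(|Z| >= |z|)
        return math.erfc(abs(z) / math.sqrt(2))
    raise ValueError("tail must be 'greater', 'less', or 'two-sided'")
```

A call such as `normal_approx_p_value(observed=120, mean=100, variance=64, tail="greater")` would ask how surprising an overlap of 120 is when the null model expects 100 with a standard deviation of 8; the numbers here are made up for illustration.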
Affiliation(s)
- Askar Gafurov
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
- LIRMM, University of Montpellier, Montpellier, France
| | - Tomáš Vinař
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Broňa Brejová
- Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
| |
|
41
|
Karageorgiou C, Gokcumen O, Dennis MY. Deciphering the role of structural variation in human evolution: a functional perspective. Curr Opin Genet Dev 2024; 88:102240. [PMID: 39121701 PMCID: PMC11485010 DOI: 10.1016/j.gde.2024.102240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 08/12/2024]
Abstract
Advances in sequencing technologies have enabled the comparison of high-quality genomes of diverse primate species, revealing vast amounts of divergence due to structural variation. Given their large size, structural variants (SVs) can simultaneously alter the function and regulation of multiple genes. Studies estimate that collectively more than 3.5% of the genome is divergent in humans versus other great apes, impacting thousands of genes. Functional genomics and gene-editing tools in various model systems recently emerged as an exciting frontier - investigating the wide-ranging impacts of SVs on molecular, cellular, and systems-level phenotypes. This review examines existing research and identifies future directions to broaden our understanding of the functional roles of SVs on phenotypic innovations and diversity impacting uniquely human features, ranging from cognition to metabolic adaptations.
Affiliation(s)
- Charikleia Karageorgiou
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA. https://twitter.com/@evobioclio
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, 109 Cooke Hall, Buffalo, NY 14260, USA
| | - Megan Y Dennis
- Department of Biochemistry & Molecular Medicine, Genome Center, and MIND Institute, University of California, Davis, CA 95616, USA.
| |
|
42
|
Olagunju TA, Rosen BD, Neibergs HL, Becker GM, Davenport KM, Elsik CG, Hadfield TS, Koren S, Kuhn KL, Rhie A, Shira KA, Skibiel AL, Stegemiller MR, Thorne JW, Villamediana P, Cockett NE, Murdoch BM, Smith TPL. Telomere-to-telomere assemblies of cattle and sheep Y-chromosomes uncover divergent structure and gene content. Nat Commun 2024; 15:8277. [PMID: 39333471 PMCID: PMC11436988 DOI: 10.1038/s41467-024-52384-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Accepted: 09/05/2024] [Indexed: 09/29/2024]
Abstract
Reference genomes of cattle and sheep have lacked contiguous assemblies of the sex-determining Y chromosome. Here, we assemble complete and gapless telomere-to-telomere (T2T) Y chromosomes for these species. We find that the pseudo-autosomal regions are similar in length, but the total chromosome size is substantially different, with the cattle Y more than twice the length of the sheep Y. The length disparity is accounted for by an expanded ampliconic region in cattle. The genic amplification in cattle contrasts with pseudogenization in sheep, suggesting opposite evolutionary mechanisms since their divergence 19 MYA. The centromeres also differ dramatically despite the close relationship between these species at the overall genome sequence level. These Y chromosomes have been added to the current reference assemblies in GenBank, opening new opportunities for the study of evolution and variation while supporting efforts to improve sustainability in these important livestock species, which generally use sire-driven genetic improvement strategies.
Affiliation(s)
- Temitayo A Olagunju
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Benjamin D Rosen
- Animal Genomics and Improvement Laboratory (AGIL), ARS, USDA, Beltsville, MD, USA
| | - Holly L Neibergs
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Gabrielle M Becker
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | | | - Christine G Elsik
- Divisions of Animal Sciences and Plant Science & Technology, University of Missouri, Columbia, MO, USA
| | - Tracy S Hadfield
- Animal, Dairy and Veterinary Sciences (ADVS), Utah State University, Logan, UT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kristen L Kuhn
- U.S. Meat Animal Research Center (USMARC), ARS, USDA, Clay Center, NE, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Katie A Shira
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Amy L Skibiel
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | - Morgan R Stegemiller
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA
| | | | - Patricia Villamediana
- Department of Dairy and Food Science, South Dakota State University, Brookings, SD, USA
| | - Noelle E Cockett
- Animal, Dairy and Veterinary Sciences (ADVS), Utah State University, Logan, UT, USA
| | - Brenda M Murdoch
- Department of Animal, Veterinary and Food Sciences (AVFS), University of Idaho, Moscow, ID, USA.
| | - Timothy P L Smith
- U.S. Meat Animal Research Center (USMARC), ARS, USDA, Clay Center, NE, USA.
|
43
|
de Lima LG, Guarracino A, Koren S, Potapova T, McKinney S, Rhie A, Solar SJ, Seidel C, Fagen B, Walenz BP, Bouffard GG, Brooks SY, Peterson M, Hall K, Crawford J, Young AC, Pickett BD, Garrison E, Phillippy AM, Gerton JL. The formation and propagation of human Robertsonian chromosomes. bioRxiv 2024:2024.09.24.614821. [PMID: 39386535 PMCID: PMC11463614 DOI: 10.1101/2024.09.24.614821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Robertsonian chromosomes are a type of variant chromosome found commonly in nature. Present in one in 800 humans, these chromosomes can underlie infertility, trisomies, and increased cancer incidence. Recognized cytogenetically for more than a century, their origins have remained mysterious. Recent advances in genomics allowed us to assemble three human Robertsonian chromosomes completely. We identify a common breakpoint and epigenetic changes in centromeres that provide insight into the formation and propagation of common Robertsonian translocations. Further investigation of the assembled genomes of chimpanzee and bonobo highlights the structural features of the human genome that uniquely enable the specific crossover event that creates these chromosomes. Resolving the structure and epigenetic features of human Robertsonian chromosomes at a molecular level paves the way to understanding how chromosomal structural variation occurs more generally, and how chromosomes evolve.
Affiliation(s)
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Sergey Koren
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Sean McKinney
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Arang Rhie
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Steven J Solar
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Chris Seidel
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Brandon Fagen
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Brian P Walenz
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gerard G Bouffard
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Kate Hall
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Juyun Crawford
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice C Young
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Brandon D Pickett
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Adam M Phillippy
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
|
44
|
Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D, Paisie CA, Harvey WT, Zhao X, Martino GV, Henglin M, Munson KM, Rabbani K, Chin CS, Gu B, Ashraf H, Austine-Orimoloye O, Balachandran P, Bonder MJ, Cheng H, Chong Z, Crabtree J, Gerstein M, Guethlein LA, Hasenfeld P, Hickey G, Hoekzema K, Hunt SE, Jensen M, Jiang Y, Koren S, Kwon Y, Li C, Li H, Li J, Norman PJ, Oshima KK, Paten B, Phillippy AM, Pollock NR, Rausch T, Rautiainen M, Scholz S, Song Y, Söylev A, Sulovari A, Surapaneni L, Tsapalou V, Zhou W, Zhou Y, Zhu Q, Zody MC, Mills RE, Devine SE, Shi X, Talkowski ME, Chaisson MJP, Dilthey AT, Konkel MK, Korbel JO, Lee C, Beck CR, Eichler EE, Marschall T. Complex genetic variation in nearly complete human genomes. bioRxiv 2024:2024.09.24.614721. [PMID: 39372794 PMCID: PMC11451754 DOI: 10.1101/2024.09.24.614721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
Affiliation(s)
- Glennis A Logsdon
  - Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
  - Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Peter A Audano
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Mark Loftus
  - Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Feyza Yilmaz
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Timofey Prodanov
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Carolyn A Paisie
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Xuefang Zhao
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Gianni V Martino
  - Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Medical University of South Carolina, College of Graduate Studies, Charleston, SC, USA
- Mir Henglin
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Keon Rabbani
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Bida Gu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Hufsah Ashraf
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Olanrewaju Austine-Orimoloye
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Marc Jan Bonder
  - Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands; Oncode Institute, Utrecht, The Netherlands
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center, Heidelberg, Germany
- Haoyu Cheng
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Zechen Chong
  - Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
- Jonathan Crabtree
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Lisbeth A Guethlein
  - Department of Structural Biology, School of Medicine, Stanford University, Stanford, CA, USA
- Patrick Hasenfeld
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Matthew Jensen
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Yunzhe Jiang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Sergey Koren
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Youngjun Kwon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Chong Li
  - Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
  - Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Jiaqi Li
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Paul J Norman
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
  - Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO, USA
- Keisuke K Oshima
  - Perelman School of Medicine, University of Pennsylvania, Department of Genetics, Epigenetics Institute, Philadelphia, PA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Adam M Phillippy
  - Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Nicholas R Pollock
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Tobias Rausch
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Mikko Rautiainen
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Stephan Scholz
  - Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Yuwei Song
  - Department of Biomedical Informatics and Data Science, Heersink School of Medicine, University of Alabama, Birmingham, AL, USA
- Arda Söylev
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Likhitha Surapaneni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Vasiliki Tsapalou
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Weichen Zhou
  - Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
- Ying Zhou
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Qihui Zhu
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Stanford Health Care, Palo Alto, CA, USA
- Ryan E Mills
  - Department of Computational Medicine & Bioinformatics, University of Michigan, MI, USA
- Scott E Devine
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Xinghua Shi
  - Temple University, Department of Computer and Information Sciences, College of Science and Technology, Philadelphia, PA, USA
  - Temple University, Institute for Genomics and Evolutionary Medicine, Philadelphia, PA, USA
- Mike E Talkowski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Alexander T Dilthey
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
  - Institute of Medical Microbiology and Hospital Hygiene, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
  - Clemson University, Department of Genetics & Biochemistry, Clemson, SC, USA
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Jan O Korbel
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christine R Beck
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - The University of Connecticut Health Center, Farmington, CT, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
|
45
|
Wen W, Zhong J, Zhang Z, Jia L, Chu T, Wang N, Danko CG, Wang Z. dHICA: a deep transformer-based model enables accurate histone imputation from chromatin accessibility. Brief Bioinform 2024; 25:bbae459. [PMID: 39316943 PMCID: PMC11421843 DOI: 10.1093/bib/bbae459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/13/2024] [Accepted: 09/04/2024] [Indexed: 09/26/2024] Open
Abstract
Histone modifications (HMs) are pivotal in various biological processes, including transcription, replication, and DNA repair, significantly impacting chromatin structure. These modifications underpin the molecular mechanisms of cell-type-specific gene expression and complex diseases. However, annotating HMs across different cell types solely using experimental approaches is impractical due to cost and time constraints. Herein, we present dHICA (deep histone imputation using chromatin accessibility), a novel deep learning framework that integrates DNA sequences and chromatin accessibility data to predict multiple HM tracks. Employing the transformer architecture alongside dilated convolutions, dHICA boasts an extensive receptive field and captures more cell-type-specific information. dHICA outperforms state-of-the-art baselines and achieves superior performance in cell-type-specific loci and gene elements, aligning with biological expectations. Furthermore, dHICA's imputations hold significant potential for downstream applications, including chromatin state segmentation and elucidating the functional implications of SNPs (Single Nucleotide Polymorphisms). In conclusion, dHICA serves as a valuable tool for advancing the understanding of chromatin dynamics, offering enhanced predictive capabilities and interpretability.
Affiliation(s)
- Wen Wen
  - School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Jiaxin Zhong
  - School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Zhaoxi Zhang
  - School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Lijuan Jia
  - School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
- Tinyi Chu
  - Meinig School of Biomedical Engineering, Cornell University, Weill Hall, Ithaca, NY 14853, United States
- Nating Wang
  - Department of Molecular Biology and Genetics, Cornell University, Biotechnology Building, Ithaca, NY 14853, United States
- Charles G Danko
  - Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Hungerford Hill Rd, Ithaca, NY 14853, United States
  - Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, Tower Rd, Ithaca, NY 14853, United States
- Zhong Wang
  - School of Software Technology, Dalian University of Technology, Linggong Rd, Liaoning 116024, China
|
46
|
Ma Z, Zuo T, Frey N, Rangrez AY. A systematic framework for understanding the microbiome in human health and disease: from basic principles to clinical translation. Signal Transduct Target Ther 2024; 9:237. [PMID: 39307902 PMCID: PMC11418828 DOI: 10.1038/s41392-024-01946-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 03/05/2024] [Revised: 07/03/2024] [Accepted: 08/01/2024] [Indexed: 09/26/2024] Open
Abstract
The human microbiome is a complex and dynamic system that plays important roles in human health and disease. However, there remain limitations and theoretical gaps in our current understanding of the intricate relationship between microbes and humans. In this narrative review, we integrate the knowledge and insights from various fields, including anatomy, physiology, immunology, histology, genetics, and evolution, to propose a systematic framework. The framework introduces key concepts such as the 'innate and adaptive genomes', which enhance genetic and evolutionary comprehension of the human genome. The 'germ-free syndrome' challenges the traditional 'microbes as pathogens' view, advocating for the necessity of microbes for health. The 'slave tissue' concept underscores the symbiotic intricacies between human tissues and their microbial counterparts, highlighting the dynamic health implications of microbial interactions. 'Acquired microbial immunity' positions the microbiome as an adjunct to human immune systems, providing a rationale for probiotic therapies and prudent antibiotic use. The 'homeostatic reprogramming hypothesis' integrates the microbiome into the internal environment theory, potentially explaining the change in homeostatic indicators post-industrialization. The 'cell-microbe co-ecology model' elucidates the symbiotic regulation affecting cellular balance, while the 'meta-host model' broadens the host definition to include symbiotic microbes. The 'health-illness conversion model' encapsulates the innate and adaptive genomes' interplay and dysbiosis patterns. The aim here is to provide a more focused and coherent understanding of the microbiome and to highlight future research avenues that could lead to a more effective and efficient healthcare system.
Affiliation(s)
- Ziqi Ma
  - Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
  - DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
- Tao Zuo
  - Key Laboratory of Human Microbiome and Chronic Diseases (Sun Yat-sen University), Ministry of Education, Guangzhou, China
  - Guangdong Institute of Gastroenterology, The Sixth Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Norbert Frey
  - Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
  - DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
- Ashraf Yusuf Rangrez
  - Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg, Germany.
  - DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
|
47
|
Engelbrecht E, Rodriguez OL, Watson CT. Addressing Technical Pitfalls in Pursuit of Molecular Factors That Mediate Immunoglobulin Gene Regulation. J Immunol 2024; 213:651-662. [PMID: 39007649 PMCID: PMC11333172 DOI: 10.4049/jimmunol.2400131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 06/13/2024] [Indexed: 07/16/2024]
Abstract
The expressed Ab repertoire is a critical determinant of immune-related phenotypes. Ab-encoding transcripts are distinct from other expressed genes because they are transcribed from somatically rearranged gene segments. Human Abs are composed of two identical H and L chain polypeptides derived from genes in the IGH locus and one of two L chain loci. The combinatorial diversity that results from Ab gene rearrangement and the pairing of different H and L chains contributes to the immense diversity of the baseline Ab repertoire. During rearrangement, Ab gene selection is mediated by factors that influence chromatin architecture, promoter/enhancer activity, and V(D)J recombination. Interindividual variation in the composition of the Ab repertoire associates with germline variation in IGH, implicating polymorphism in Ab gene regulation. Determining how IGH variants directly mediate gene regulation will require integration of these variants with other functional genomic datasets. In this study, we argue that standard approaches using short reads have limited utility for characterizing regulatory regions in IGH at haplotype resolution. Using simulated and chromatin immunoprecipitation sequencing reads, we define features of IGH that limit the use of short reads and a single reference genome, namely 1) the highly duplicated nature of the DNA sequence in IGH and 2) structural polymorphisms that are frequent in the population. We demonstrate that personalized diploid references enhance the performance of short-read data for characterizing mappable portions of the locus, while also showing that long-read profiling tools will ultimately be needed to fully resolve the functional impacts of IGH germline variation on expressed Ab repertoires.
Affiliation(s)
- Eric Engelbrecht
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
- Oscar L Rodriguez
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
- Corey T Watson
  - Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY
|
48
|
Pandiloski N, Horváth V, Karlsson O, Koutounidou S, Dorazehi F, Christoforidou G, Matas-Fuentes J, Gerdes P, Garza R, Jönsson ME, Adami A, Atacho DAM, Johansson JG, Englund E, Kokaia Z, Jakobsson J, Douse CH. DNA methylation governs the sensitivity of repeats to restriction by the HUSH-MORC2 corepressor. Nat Commun 2024; 15:7534. [PMID: 39214989 PMCID: PMC11364546 DOI: 10.1038/s41467-024-50765-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 07/18/2024] [Indexed: 09/04/2024] Open
Abstract
The human silencing hub (HUSH) complex binds to transcripts of LINE-1 retrotransposons (L1s) and other genomic repeats, recruiting MORC2 and other effectors to remodel chromatin. How HUSH and MORC2 operate alongside DNA methylation, a central epigenetic regulator of repeat transcription, remains largely unknown. Here we interrogate this relationship in human neural progenitor cells (hNPCs), a somatic model of brain development that tolerates removal of DNA methyltransferase DNMT1. Upon loss of MORC2 or HUSH subunit TASOR in hNPCs, L1s remain silenced by robust promoter methylation. However, genome demethylation and activation of evolutionarily young L1s attract MORC2 binding, and simultaneous depletion of DNMT1 and MORC2 causes massive accumulation of L1 transcripts. We identify the same mechanistic hierarchy at pericentromeric α-satellites and clustered protocadherin genes, repetitive elements important for chromosome structure and neurodevelopment, respectively. Our data delineate the epigenetic control of repeats in somatic cells, with implications for understanding the vital functions of HUSH-MORC2 in hypomethylated contexts throughout human development.
Affiliation(s)
- Ninoslav Pandiloski
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Vivien Horváth
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Ofelia Karlsson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Symela Koutounidou
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Fereshteh Dorazehi
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Georgia Christoforidou
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Jon Matas-Fuentes
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden
- Patricia Gerdes
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Raquel Garza
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Anita Adami
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Diahann A M Atacho
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Jenny G Johansson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
- Elisabet Englund
  - Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Zaal Kokaia
  - Lund Stem Cell Center, Lund University, Lund, Sweden
  - Laboratory of Stem Cells and Restorative Neurology, Department of Clinical Sciences, BMC B10, Lund University, Lund, Sweden
- Johan Jakobsson
  - Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC A11, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
- Christopher H Douse
  - Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Wallenberg Neuroscience Center, BMC B11, Lund University, Lund, Sweden.
  - Lund Stem Cell Center, Lund University, Lund, Sweden.
49
Hartley GA, Okhovat M, Hoyt SJ, Fuller E, Pauloski N, Alexandre N, Alexandrov I, Drennan R, Dubocanin D, Gilbert DM, Mao Y, McCann C, Neph S, Ryabov F, Sasaki T, Storer JM, Svendsen D, Troy W, Wells J, Core L, Stergachis A, Carbone L, O’Neill RJ. Centromeric transposable elements and epigenetic status drive karyotypic variation in the eastern hoolock gibbon. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.29.610280. [PMID: 39257810; PMCID: PMC11384015; DOI: 10.1101/2024.08.29.610280] [Indexed: 09/12/2024]
Abstract
Great apes have maintained a stable karyotype with few large-scale rearrangements; in contrast, gibbons have undergone a high rate of chromosomal rearrangements coincident with rapid centromere turnover. Here we characterize assembled centromeres in the Eastern hoolock gibbon, Hoolock leuconedys (HLE), finding a diverse group of transposable elements (TEs) that differ from the canonical alpha satellites found across centromeres of other apes. We find that HLE centromeres contain a CpG methylation centromere dip region, providing evidence this epigenetic feature is conserved in the absence of satellite arrays; nevertheless, we report a variety of atypical centromeric features, including protein-coding genes and mismatched replication timing. Further, large structural variations define HLE centromeres and distinguish them from other gibbons. Combined with differentially methylated TEs, topologically associated domain boundaries, and segmental duplications at chromosomal breakpoints, we propose that a "perfect storm" of multiple genomic attributes with propensities for chromosome instability shaped gibbon centromere evolution.
Affiliation(s)
- Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Mariam Okhovat
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Savannah J. Hoyt
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Emily Fuller
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Nicole Pauloski
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Nicolas Alexandre
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
- Ivan Alexandrov
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Israel
- Ryan Drennan
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Danilo Dubocanin
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- David M. Gilbert
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
- Yizi Mao
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Christine McCann
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Shane Neph
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Fedor Ryabov
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA 92121, USA
- Jessica M. Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Derek Svendsen
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Jackson Wells
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Leighton Core
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Andrew Stergachis
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
- Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Portland, OR, USA
- Rachel J. O’Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
50
Hardikar S, Ren R, Ying Z, Zhou J, Horton JR, Bramble MD, Liu B, Lu Y, Liu B, Coletta LD, Shen J, Dan J, Zhang X, Cheng X, Chen T. The ICF syndrome protein CDCA7 harbors a unique DNA binding domain that recognizes a CpG dyad in the context of a non-B DNA. SCIENCE ADVANCES 2024; 10:eadr0036. [PMID: 39178265; PMCID: PMC11343032; DOI: 10.1126/sciadv.adr0036] [Received: 06/10/2024] [Accepted: 07/18/2024] [Indexed: 08/25/2024]
Abstract
CDCA7, encoding a protein with a carboxyl-terminal cysteine-rich domain (CRD), is mutated in immunodeficiency, centromeric instability, and facial anomalies (ICF) syndrome, a disease related to hypomethylation of juxtacentromeric satellite DNA. How CDCA7 directs DNA methylation to juxtacentromeric regions is unknown. Here, we show that the CDCA7 CRD adopts a unique zinc-binding structure that recognizes a CpG dyad in a non-B DNA formed by two sequence motifs. CDCA7, but not ICF mutants, preferentially binds the non-B DNA with strand-specific CpG hemi-methylation. The unmethylated sequence motif is highly enriched at centromeres of human chromosomes, whereas the methylated motif is distributed throughout the genome. At S phase, CDCA7, but not ICF mutants, is concentrated in constitutive heterochromatin foci, and the formation of such foci can be inhibited by exogenous hemi-methylated non-B DNA bound by the CRD. Binding of the non-B DNA formed in juxtacentromeric regions during DNA replication provides a mechanism by which CDCA7 controls the specificity of DNA methylation.
Affiliation(s)
- Swanand Hardikar
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ren Ren
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Zhengzhou Ying
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jujun Zhou
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- John R. Horton
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Matthew D. Bramble
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bin Liu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Yue Lu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bigang Liu
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Luis Della Coletta
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jianjun Shen
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jiameng Dan
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Xing Zhang
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Xiaodong Cheng
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA
- Taiping Chen
- Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Program in Genetics and Epigenetics, The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA