1
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Monfort Anez G, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Rocha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O'Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. Nature 2025; 641:401-418. [PMID: 40205052 PMCID: PMC12058530 DOI: 10.1038/s41586-025-08816-3] [Received: 07/31/2024] [Accepted: 02/19/2025] [Indexed: 04/11/2025]
Abstract
The most dynamic and repetitive regions of great ape genomes have traditionally been excluded from comparative studies [1-3]. Consequently, our understanding of the evolution of our species is incomplete. Here we present haplotype-resolved reference genomes and comparative analyses of six ape species: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan and siamang. We achieve chromosome-level contiguity with substantial sequence accuracy (<1 error in 2.7 megabases) and completely sequence 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, to provide in-depth evolutionary insights. Comparative analyses enabled investigations of the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference genome. Such regions include newly minted gene families in lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes and subterminal heterochromatin. This resource serves as a comprehensive baseline for future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Steven J Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Dmitry Antipov
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Brandon D Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
| | - Francesco Montinaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Yanting Luo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
| | - Joanna Malukiewicz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- German Primate Center, Primate Genetics Laboratory, Goettingen, Germany
| | - Jessica M Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Riley J Mangan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Genetics Training Program, Harvard Medical School, Boston, MA, USA
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Anton Bankevich
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
| | - Christine R Beck
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
| | - Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | | | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Emry Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Shelise Y Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Department of Medicine, KCVI, Oregon Health Sciences University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
| | - Laura Carrel
- PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
| | - Agnes P Chan
- The Translational Genomics Research Institute, City of Hope National Medical Center, Phoenix, AZ, USA
| | - Juyun Crawford
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Cedric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Luciana de Gennaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
| | - David Gilbert
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ishaan Gupta
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
| | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Junmin Han
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Robert S Harris
- Department of Biology, Penn State University, University Park, PA, USA
| | | | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute, Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Frankfurt, Germany
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Chul Lee
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Youngho Lee
- Laboratory of Bioinformatics and Population Genetics, Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - William Lees
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Mark Loftus
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Yong Hwee Eddie Loh
- Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
| | - Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
| | - Juan F I Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Barbara McGrath
- Department of Biology, Penn State University, University Park, PA, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Britta S Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Saswat K Mohanty
- Department of Biology, Penn State University, University Park, PA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karol Pal
- Department of Biology, Penn State University, University Park, PA, USA
| | - Matt Pennell
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Francisca R Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
| | - Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
| | | | - Samuel Sacco
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Nicholas J Schork
- The Translational Genomics Research Institute, City of Hope National Medical Center, Phoenix, AZ, USA
| | - Cole Shanks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA, USA
| | - Dongmin R Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
| | | | - Alexander P Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Michael G Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | - Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, USA
| | - Sarah A Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yixin Zhu
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
| | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Zachary A Szpiech
- Department of Biology, Penn State University, University Park, PA, USA
| | - Christian D Huber
- Department of Biology, Penn State University, University Park, PA, USA
| | - Tobias L Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
| | - Miriam K Konkel
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Soojin V Yi
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
| | - Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, Regensburg, Germany
| | - Corey T Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA
| | - Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, Italy
| | - Rachel J O'Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kateryna D Makova
- Department of Biology, Penn State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
2
McCluskey BM, Batzel P, Postlethwait JH. The hybrid history of zebrafish. G3 (Bethesda, Md.) 2025; 15:jkae299. [PMID: 39698833 PMCID: PMC11797037 DOI: 10.1093/g3journal/jkae299] [Received: 08/19/2024] [Revised: 11/29/2024] [Accepted: 12/03/2024] [Indexed: 12/20/2024]
Abstract
Since the description of zebrafish (Danio rerio) in 1822, the identity of its closest living relative has been unclear. To address this problem, we sequenced the exomes of 10 species in genus Danio, using the closely related Devario aequipinnatus as outgroup, to infer relationships across the 25 chromosomes of the zebrafish genome. The majority of relationships within Danio were remarkably consistent across all chromosomes. Relationships of chromosome segments, however, depended systematically upon their genomic location within zebrafish chromosomes. Regions near chromosome centers identified Danio kyathit and/or Danio aesculapii as the closest relative of zebrafish, while segments near chromosome ends supported only D. aesculapii as the zebrafish sister species. Genome-wide comparisons of derived character states revealed that danio relationships are inconsistent with a simple bifurcating species history but support an ancient hybrid origin of the D. rerio lineage by homoploid hybrid speciation. We also found evidence of more recent gene flow limited to the high recombination ends of chromosomes and several megabases of chromosome 20 with a history distinct from the rest of the genome. Additional insights gained from incorporating genome structure into a phylogenomic study demonstrate the utility of such an approach for future studies in other taxa. The multiple genomic histories of species in the genus Danio have important implications for comparative studies in these morphologically varied and beautiful species and for our understanding of the hybrid evolutionary history of zebrafish.
Affiliation(s)
- Braedan M McCluskey
- Minnesota Supercomputing Institute, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403, USA
| | - Peter Batzel
- Institute of Neuroscience, University of Oregon, Eugene, OR 97403, USA
3
Bilgrav Saether K, Eisfeldt J, Bengtsson JD, Lun MY, Grochowski CM, Mahmoud M, Chao HT, Rosenfeld JA, Liu P, Ek M, Schuy J, Ameur A, Dai H, Hwang JP, Sedlazeck FJ, Bi W, Marom R, Wincent J, Nordgren A, Carvalho CMB, Lindstrand A. Leveraging the T2T assembly to resolve rare and pathogenic inversions in reference genome gaps. Genome Res 2024; 34:1785-1797. [PMID: 39486878 DOI: 10.1101/gr.279346.124] [Received: 03/15/2024] [Accepted: 09/12/2024] [Indexed: 11/04/2024]
Abstract
Chromosomal inversions (INVs) are particularly challenging to detect due to their copy-number neutral state and association with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by gene disruption or altering regulatory regions of dosage-sensitive genes in cis. Short-read genome sequencing (srGS) can only resolve ∼70% of cytogenetically visible inversions referred to clinical diagnostic laboratories, likely due to breakpoints in repetitive regions. Here, we study 12 inversions by long-read genome sequencing (lrGS) (n = 9) or srGS (n = 3) and resolve nine of them. In four cases, the inversion breakpoint region was missing from at least one of the human reference genomes (GRCh37, GRCh38, T2T-CHM13) and a reference agnostic analysis was needed. One of these cases, an INV9 mappable only in de novo assembled lrGS data using T2T-CHM13, disrupts EHMT1 consistent with a Mendelian diagnosis (Kleefstra syndrome 1; MIM#610253). Next, by pairwise comparison between T2T-CHM13, GRCh37, and GRCh38, as well as the chimpanzee and bonobo, we show that hundreds of megabases of sequence are missing from at least one human reference, highlighting that primate genomes contribute to genomic diversity. Aligning population genomic data to these regions indicated that these regions are variable between individuals. Our analysis emphasizes that T2T-CHM13 is necessary to maximize the value of lrGS for optimal inversion detection in clinical diagnostics. These results highlight the importance of leveraging diverse and comprehensive reference genomes to resolve unsolved molecular cases in rare diseases.
Affiliation(s)
- Kristine Bilgrav Saether
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet, 171 65 Solna, Sweden
| | - Jesper Eisfeldt
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Science for Life Laboratory, Karolinska Institutet, 171 65 Solna, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Jesse D Bengtsson
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Ming Yin Lun
- Pacific Northwest Research Institute, Seattle, Washington 98122, USA
| | - Christopher M Grochowski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Medhat Mahmoud
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Center for Precision Health, McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
| | - Hsiao-Tuan Chao
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Texas Children's Hospital, Houston, Texas 77030, USA
- Cain Pediatric Neurology Research Laboratories, Jan and Dan Duncan Neurological Research Institute, Houston, Texas 77030, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA
- McNair Medical Institute, The Robert and Janice McNair Foundation, Houston, Texas 77024, USA
| | - Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Baylor Genetics Laboratory, Baylor College of Medicine, Houston, Texas 77021, USA
| | - Marlene Ek
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Jakob Schuy
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
| | - Adam Ameur
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 751 85 Uppsala, Sweden
| | - Hongzheng Dai
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Baylor Genetics Laboratory, Baylor College of Medicine, Houston, Texas 77021, USA
| | - James Paul Hwang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Fritz J Sedlazeck
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Computer Science, Rice University, Houston, Texas 77251, USA
| | - Weimin Bi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Baylor Genetics Laboratory, Baylor College of Medicine, Houston, Texas 77021, USA
| | - Ronit Marom
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Texas Children's Hospital, Houston, Texas 77030, USA
| | - Josephine Wincent
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
| | - Ann Nordgren
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
- Department of Laboratory Medicine, University of Gothenburg, 413 45 Gothenburg, Sweden
- Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45 Gothenburg, Sweden
| | | | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Karolinska Institute, 171 76 Stockholm, Sweden
- Department of Clinical Genetics and Genomics, Karolinska University Hospital, 171 76 Stockholm, Sweden
4
Debbagh C, Folch G, Jabado-Michaloud J, Giudicelli V, Kossida S. Deciphering Gorilla gorilla gorilla immunoglobulin loci in multiple genome assemblies and enrichment of IMGT resources. Front Immunol 2024; 15:1475003. [PMID: 39450182 PMCID: PMC11499206 DOI: 10.3389/fimmu.2024.1475003] [Received: 08/02/2024] [Accepted: 09/19/2024] [Indexed: 10/26/2024]
Abstract
Through the analysis of immunoglobulin genes at the IGH, IGK, and IGL loci from four Gorilla gorilla gorilla genome assemblies, IMGT® provides an in-depth overview of these loci and their individual variations in a species closely related to humans. The similarity between gorilla and human IG gene organization allowed the assignment of gorilla IG gene names based on their human counterparts. This study revealed significant findings, including variability in the IGH locus, the presence of known and new copy number variations (CNVs), and the accurate estimation of IGHG genes. The IGK locus displayed remarkable homogeneity and lacked the gene duplication seen in humans, while the IGL locus showed a previously unconfirmed CNV in the J-C cluster. The curated data from these analyses, available on the IMGT website, enhance our understanding of gorilla immunogenetics and provide valuable insights into primate evolution.
Affiliation(s)
- Chahrazed Debbagh
- The International ImMunoGeneTics Information System (IMGT), Institute of Human Genetics (IGH), National Center for Scientific Research (CNRS), University of Montpellier (UM), Montpellier, France
| | - Géraldine Folch
- The International ImMunoGeneTics Information System (IMGT), Institute of Human Genetics (IGH), National Center for Scientific Research (CNRS), University of Montpellier (UM), Montpellier, France
| | - Joumana Jabado-Michaloud
- The International ImMunoGeneTics Information System (IMGT), Institute of Human Genetics (IGH), National Center for Scientific Research (CNRS), University of Montpellier (UM), Montpellier, France
| | - Véronique Giudicelli
- The International ImMunoGeneTics Information System (IMGT), Institute of Human Genetics (IGH), National Center for Scientific Research (CNRS), University of Montpellier (UM), Montpellier, France
| | - Sofia Kossida
- The International ImMunoGeneTics Information System (IMGT), Institute of Human Genetics (IGH), National Center for Scientific Research (CNRS), University of Montpellier (UM), Montpellier, France
- Institut Universitaire de France (IUF), Paris, France
| |
|
5
|
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Rocha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O’Neill RJ, Koren S, Makova KD, Phillippy AM, Eichler EE. Complete sequencing of ape genomes. bioRxiv 2024:2024.07.31.605654. [PMID: 39131277 PMCID: PMC11312596 DOI: 10.1101/2024.07.31.605654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives.
Affiliation(s)
- DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Prajna Hebbar
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Francesca Antonacci
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19103, USA
| | - Steven J. Solar
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Dmitry Antipov
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Brandon D. Pickett
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Yana Safonova
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Francesco Montinaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
- Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Yanting Luo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
| | - Joanna Malukiewicz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Jessica M. Storer
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Abigail N. Sequeira
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Riley J. Mangan
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Genetics Training Program, Harvard Medical School, Boston, MA 02115, USA
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | | | | | - Anton Bankevich
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Christine R. Beck
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
| | - Arjun Biddanda
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Matthew Borchers
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Emry Brannan
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Lucia Carbone
- Department of Medicine, KCVI, Oregon Health Sciences University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
| | - Laura Carrel
- PSU Medical School, Penn State University School of Medicine, Hershey, PA, USA
| | - Agnes P. Chan
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
| | - Juyun Crawford
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Eric Engelbrecht
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Cedric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10021, USA
| | - Gage H. Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Luciana de Gennaro
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - David Gilbert
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | | | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Ishaan Gupta
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Junmin Han
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Robert S. Harris
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
| | - William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Research Institute, Goethe University, Frankfurt, Germany
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marlys L. Houck
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Kaivan Kamali
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Bryce Kille
- Department of Computer Science, Rice University, Houston, TX 77005, USA
| | - Chul Lee
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Youngho Lee
- Laboratory of bioinformatics and population genetics, Interdisciplinary program in bioinformatics, Seoul National University, Republic of Korea
| | - William Lees
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
- Bioengineering Program, Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
| | - Alexandra P. Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Yong Hwee Eddie Loh
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Hailey Loucks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Yafei Mao
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Center for Genomic Research, International Institutes of Medicine, Fourth Affiliated Hospital, Zhejiang University, Yiwu, Zhejiang, China
- Shanghai Jiao Tong University Chongqing Research Institute, Chongqing, China
| | - Juan F. I. Martinez
- Computer Science and Engineering Department, Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Barbara McGrath
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Sean McKinney
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Britta S. Meyer
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Saswat K. Mohanty
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karol Pal
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Matt Pennell
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Francisca R. Ringeling
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Joana L. Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
| | - Oliver A. Ryder
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Swati Saha
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Nicholas J. Schork
- The Translational Genomics Research Institute, a part of the City of Hope National Medical Center, Phoenix, AZ, USA
| | - Cole Shanks
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Linnéa Smeds
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Dongmin R. Son
- Department of Ecology, Evolution and Marine Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Cynthia Steiner
- San Diego Zoo Wildlife Alliance, Escondido, CA, 92027-7000, USA
| | - Alexander P. Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Michael G. Tassia
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Mihir Trivedi
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Wenjie Wei
- School of Life Sciences, Westlake University, Hangzhou 310024, China
- National Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, 430070, Wuhan, China
| | - Julie Wertz
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Muyu Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Panpan Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, PA, USA
| | - Zhenmiao Zhang
- Department of Computer Science and Engineering, University of California San Diego, CA, USA
| | - Sarah A. Zhao
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Yixin Zhu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Iker Rivas-González
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Zachary A. Szpiech
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Christian D. Huber
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Tobias L. Lenz
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, 20146 Hamburg, Germany
| | - Miriam K. Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
| | - Soojin V. Yi
- Department of Ecology, Evolution and Marine Biology, Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Stefan Canzar
- Faculty of Informatics and Data Science, University of Regensburg, 93053 Regensburg, Germany
| | - Corey T. Watson
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Louisville, Louisville, KY, USA
| | - Peter H. Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, USA
| | - Erin Molloy
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Craig B. Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA
| | - Mario Ventura
- Department of Biosciences, Biotechnology and Environment, University of Bari, Bari, 70124, Italy
| | - Rachel J. O’Neill
- Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Departments of Molecular and Cell Biology, UConn Storrs, CT, USA
| | - Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Kateryna D. Makova
- Department of Biology, Penn State University, University Park, PA 16802, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
6
|
Rocha JL, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 PMCID: PMC11695101 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Affiliation(s)
- Joana L Rocha
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
| | - Runyang N Lou
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
| | - Peter H Sudmant
- Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA.
| |
|
7
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024; 630:401-411. [PMID: 38811727 PMCID: PMC11168930 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
| | - Brandon D Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bornberg-Bauer
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health and Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Yong-Hwee E Loh
- University of California Santa Barbara, Santa Barbara, CA, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Kelly M McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Joana L Rocha
- University of California Berkeley, Berkeley, CA, USA
| | - Fedor Ryabov
- Masters Program in National Research, University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Mario Ventura
- Università degli Studi di Bari Aldo Moro, Bari, Italy
| | | | - Alice C Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan E Eichler
- University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| |
|
8
|
Mao Y, Harvey WT, Porubsky D, Munson KM, Hoekzema K, Lewis AP, Audano PA, Rozanski A, Yang X, Zhang S, Yoo D, Gordon DS, Fair T, Wei X, Logsdon GA, Haukness M, Dishuck PC, Jeong H, Del Rosario R, Bauer VL, Fattor WT, Wilkerson GK, Mao Y, Shi Y, Sun Q, Lu Q, Paten B, Bakken TE, Pollen AA, Feng G, Sawyer SL, Warren WC, Carbone L, Eichler EE. Structurally divergent and recurrently mutated regions of primate genomes. Cell 2024; 187:1547-1562.e13. [PMID: 38428424 PMCID: PMC10947866 DOI: 10.1016/j.cell.2024.01.052] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 03/28/2023] [Revised: 11/26/2023] [Accepted: 01/31/2024] [Indexed: 03/03/2024]
Abstract
We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
| | - Xiaoxi Wei
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ricardo Del Rosario
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vanessa L Bauer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Will T Fattor
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Gregory K Wilkerson
- Department of Veterinary Sciences, Michale E. Keeling Center for Comparative Medicine and Research, The University of Texas MD Anderson Cancer Center, Bastrop, TX, USA; Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Yuxiang Mao
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qiang Sun
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Guoping Feng
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara L Sawyer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA; Department of Surgery, School of Medicine, University of Missouri, Columbia, MO, USA; Institute of Data Science and Informatics, University of Missouri, Columbia, MO, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA; Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA; Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA; Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
|
9
|
Bukhman YV, Morin PA, Meyer S, Chu LF, Jacobsen JK, Antosiewicz-Bourget J, Mamott D, Gonzales M, Argus C, Bolin J, Berres ME, Fedrigo O, Steill J, Swanson SA, Jiang P, Rhie A, Formenti G, Phillippy AM, Harris RS, Wood JMD, Howe K, Kirilenko BM, Munegowda C, Hiller M, Jain A, Kihara D, Johnston JS, Ionkov A, Raja K, Toh H, Lang A, Wolf M, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. A High-Quality Blue Whale Genome, Segmental Duplications, and Historical Demography. Mol Biol Evol 2024; 41:msae036. [PMID: 38376487 PMCID: PMC10919930 DOI: 10.1093/molbev/msae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 01/11/2024] [Accepted: 01/22/2024] [Indexed: 02/21/2024] Open
Abstract
The blue whale, Balaenoptera musculus, is the largest animal known to have ever existed, making it an important case study in longevity and resistance to cancer. To further this and other blue whale-related research, we report a reference-quality, long-read-based genome assembly of this fascinating species. We assembled the genome from PacBio long reads and utilized Illumina/10×, optical maps, and Hi-C data for scaffolding, polishing, and manual curation. We also provided long read RNA-seq data to facilitate the annotation of the assembly by NCBI and Ensembl. Additionally, we annotated both haplotypes using TOGA and measured the genome size by flow cytometry. We then compared the blue whale genome with other cetaceans and artiodactyls, including vaquita (Phocoena sinus), the world's smallest cetacean, to investigate blue whale's unique biological traits. We found a dramatic amplification of several genes in the blue whale genome resulting from a recent burst in segmental duplications, though the possible connection between this amplification and giant body size requires further study. We also discovered sites in the insulin-like growth factor-1 gene correlated with body size in cetaceans. Finally, using our assembly to examine the heterozygosity and historical demography of Pacific and Atlantic blue whale populations, we found that the genomes of both populations are highly heterozygous and that their genetic isolation dates to the last interglacial period. Taken together, these results indicate how a high-quality, annotated blue whale genome will serve as an important resource for biology, evolution, and conservation research.
Affiliation(s)
- Yury V Bukhman
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Phillip A Morin
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
| | - Susanne Meyer
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Li-Fang Chu
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Department of Comparative Biology and Experimental Medicine, University of Calgary, Calgary, Canada
| | | | | | - Daniel Mamott
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Maylie Gonzales
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Cara Argus
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Jennifer Bolin
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Mark E Berres
- University of Wisconsin Biotechnology Center, Bioinformatics Resource Center, University of Wisconsin - Madison, Madison, WI 53706, USA
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
| | - John Steill
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Scott A Swanson
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Peng Jiang
- Center for Gene Regulation in Health and Disease (GRHD), Cleveland State University, Cleveland, OH, USA
- Department of Biological, Geological and Environmental Sciences, Cleveland State University, Cleveland, OH, USA
- Center for RNA Science and Therapeutics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Arang Rhie
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Giulio Formenti
- Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
| | - Adam M Phillippy
- Genome Informatics Section, National Human Genome Research Institute, Bethesda, MD 20892, USA
| | - Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
| | | | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
| | - Bogdan M Kirilenko
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Chetan Munegowda
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, 60438 Frankfurt, Germany
| | - Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - J Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA
| | - Alexander Ionkov
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Kalpana Raja
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| | - Huishi Toh
- Neuroscience Research Institute, University of California, Santa Barbara, CA, USA
| | - Aimee Lang
- Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration (NOAA), La Jolla, CA 92037, USA
| | - Magnus Wolf
- Institute for Evolution and Biodiversity (IEB), University of Muenster, 48149, Muenster, Germany
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Frankfurt am Main, Germany
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, New York, NY 10065, USA
| | - James A Thomson
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
- Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA
- Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53726, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, Los Angeles, CA 90089, USA
| | - Ron Stewart
- Regenerative Biology, Morgridge Institute for Research, Madison, WI 53715, USA
| |
|
10
|
Wang Z, Liu C, Liu W, Lv X, Hu T, Yang F, Yang W, He L, Huang X. Long-read sequencing reveals the structural complexity of genomic integration of HPV DNA in cervical cancer cell lines. BMC Genomics 2024; 25:198. [PMID: 38378450 PMCID: PMC10877919 DOI: 10.1186/s12864-024-10101-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 02/08/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Cervical cancer (CC) causes more than 311,000 deaths annually worldwide. The integration of human papillomavirus (HPV) is a crucial genetic event that contributes to cervical carcinogenesis. Although HPV DNA integration is known to disrupt the genomic architecture of both the host and viral genomes in CC, the complexity of this process remains largely unexplored. RESULTS In this study, we conducted whole-genome sequencing (WGS) at 55-65X coverage utilizing the PacBio long-read sequencing platform in SiHa and HeLa cells, followed by comprehensive analyses of the sequence data to elucidate the complexity of HPV integration. Firstly, our results demonstrated that PacBio long-read sequencing effectively identifies HPV integration breakpoints with comparable accuracy to targeted-capture next-generation sequencing (NGS) methods. Secondly, we constructed detailed models of complex integrated genome structures that included both the HPV genome and nearby regions of the human genome by utilizing PacBio long-read WGS. Thirdly, our sequencing results revealed the occurrence of a wide variety of genome-wide structural variations (SVs) in SiHa and HeLa cells. Additionally, our analysis further revealed a potential correlation between changes in gene expression levels and SVs on chromosome 13 in the genome of SiHa cells. CONCLUSIONS Using PacBio long-read sequencing, we have successfully constructed complex models illustrating HPV integrated genome structures in SiHa and HeLa cells. This accomplishment serves as a compelling demonstration of the valuable capabilities of long-read sequencing in detecting and characterizing HPV genomic integration structures within human cells. Furthermore, these findings offer critical insights into the complex process of HPV16 and HPV18 integration and their potential contribution to the development of cervical cancer.
Affiliation(s)
- Zhijie Wang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Chen Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Wanxin Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Xinyi Lv
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Ting Hu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
| | - Fan Yang
- Wuhan Kandwise Biotechnology, Inc., Wuhan, Hubei, China
| | - Wenhui Yang
- Wuhan Kandwise Biotechnology, Inc., Wuhan, Hubei, China
| | - Liang He
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
| | - Xiaoyuan Huang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
| |
|
11
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
|
12
|
Ling X, Wang C, Li L, Pan L, Huang C, Zhang C, Huang Y, Qiu Y, Lin F, Huang Y. Third-generation sequencing for genetic disease. Clin Chim Acta 2023; 551:117624. [PMID: 37923104 DOI: 10.1016/j.cca.2023.117624] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/07/2023] [Revised: 10/31/2023] [Accepted: 10/31/2023] [Indexed: 11/07/2023]
Abstract
Third-generation sequencing (TGS) has led to a brave new revolution in detecting genetic diseases over the last few years. TGS has been rapidly developed for genetic disease applications owing to its significant advantages such as long read length, rapid detection, and precise detection of complex and rare structural variants. This approach greatly improves the efficiency of disease diagnosis and compensates for the shortcomings of short-read sequencing. In this paper, we first briefly introduce the working mechanism of one of the most important representatives of TGS, single-molecule real-time (SMRT) sequencing by Pacific Biosciences (PacBio), followed by a review and comparison of the advantages and disadvantages of different sequencing technologies. Finally, we focus on the progress of SMRT sequencing applications in genetic disease detection. Future perspectives on the applications of TGS in other fields are also presented. With the continuous innovation of SMRT technologies and the expansion of their fields of application, SMRT sequencing has broad clinical application prospects in genetic disease detection, and is expected to become an important tool for the molecular diagnosis of other diseases.
Affiliation(s)
- Xiaoting Ling
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Chenghan Wang
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Linlin Li
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Liqiu Pan
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Chaoyu Huang
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Caixia Zhang
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Yunhua Huang
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China
| | - Yuling Qiu
- NHC Key Laboratory of Thalassemia Medicine, Guangxi Medical University, Nanning 530021, China; Guangxi Key Laboratory of Thalassemia Research, Guangxi Medical University, Nanning 530021, China
| | - Faquan Lin
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China.
| | - Yifang Huang
- Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Key Laboratory of Clinical Laboratory Medicine of Guangxi Department of Education, Guangxi Medical University, Nanning 530021, China.
| |
|
13
|
Pollen AA, Kilik U, Lowe CB, Camp JG. Human-specific genetics: new tools to explore the molecular and cellular basis of human evolution. Nat Rev Genet 2023; 24:687-711. [PMID: 36737647 PMCID: PMC9897628 DOI: 10.1038/s41576-022-00568-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Accepted: 12/08/2022] [Indexed: 02/05/2023]
Abstract
Our ancestors acquired morphological, cognitive and metabolic modifications that enabled humans to colonize diverse habitats, develop extraordinary technologies and reshape the biosphere. Understanding the genetic, developmental and molecular bases for these changes will provide insights into how we became human. Connecting human-specific genetic changes to species differences has been challenging owing to an abundance of low-effect size genetic changes, limited descriptions of phenotypic differences across development at the level of cell types and lack of experimental models. Emerging approaches for single-cell sequencing, genetic manipulation and stem cell culture now support descriptive and functional studies in defined cell types with a human or ape genetic background. In this Review, we describe how the sequencing of genomes from modern and archaic hominins, great apes and other primates is revealing human-specific genetic changes and how new molecular and cellular approaches - including cell atlases and organoids - are enabling exploration of the candidate causal factors that underlie human-specific traits.
Affiliation(s)
- Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
| | - Umut Kilik
- Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
- University of Basel, Basel, Switzerland
| | - Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA.
| | - J Gray Camp
- Institute of Human Biology (IHB), Roche Pharma Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland.
- University of Basel, Basel, Switzerland.
| |
|
14
|
Budassi J, Cho N, Del Valle A, Sokolov J. Microfluidic delivery of cutting enzymes for fragmentation of surface-adsorbed DNA molecules. PLoS One 2023; 18:e0250054. [PMID: 37672538 PMCID: PMC10482287 DOI: 10.1371/journal.pone.0250054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Accepted: 07/24/2023] [Indexed: 09/08/2023] [Open Access]
Abstract
We describe a method for fragmenting, in situ, surface-adsorbed and immobilized DNAs on polymethylmethacrylate (PMMA)-coated silicon substrates using microfluidic delivery of the cutting enzyme DNase I. Soft lithography is used to produce silicone elastomer (Sylgard 184) gratings which form microfluidic channels for delivery of the enzyme. Bovine serum albumin (BSA) is used to reduce DNase I adsorption to the walls of the microchannels and enable diffusion of the cutting enzyme to a distance of 10 mm. Because the DNAs are immobilized, the fragment order is maintained on the surface. Possible methods of preserving the order for application to sequencing are discussed.
Affiliation(s)
- Julia Budassi
- Department of Materials Science and Chemical Engineering, Stony Brook University, Stony Brook, New York, United States of America
| | - NaHyun Cho
- Department of Materials Science and Chemical Engineering, Stony Brook University, Stony Brook, New York, United States of America
| | - Anthony Del Valle
- Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York, United States of America
| | - Jonathan Sokolov
- Department of Materials Science and Chemical Engineering, Stony Brook University, Stony Brook, New York, United States of America
| |
|
15
|
Rosenbaum S, Kuzawa CW. The promise of great apes as model organisms for understanding the downstream consequences of early life experiences. Neurosci Biobehav Rev 2023; 152:105240. [PMID: 37211151 DOI: 10.1016/j.neubiorev.2023.105240] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/11/2023] [Revised: 05/08/2023] [Accepted: 05/10/2023] [Indexed: 05/23/2023]
Abstract
Early life experiences have a significant influence on adult health and aging processes in humans. Despite widespread interest in the evolutionary roots of this phenomenon, very little research on this topic has been conducted in humans' closest living relatives, the great apes. The longitudinal data sets that are now available on wild and captive great ape populations hold great promise to clarify the nature, evolutionary function, and mechanisms underlying these connections in species which share key human life history characteristics. Here, we explain features of great ape life history and socioecologies that make them of particular interest for this topic, as well as those that may limit their utility as comparative models; outline the ways in which available data are complementary to and extend the kinds of data that are available for humans; and review what is currently known about the connections among early life experiences, social behavior, and adult physiology and biological fitness in our closest living relatives. We conclude by highlighting key next steps for this emerging area of research.
Affiliation(s)
| | - Christopher W Kuzawa
- Department of Anthropology, Northwestern University, USA; Institute for Policy Research, Northwestern University, USA
| |
|
16
|
Zhou B, He Y, Chen Y, Su B. Comparative Genomic Analysis Identifies Great-Ape-Specific Structural Variants and Their Evolutionary Relevance. Mol Biol Evol 2023; 40:msad184. [PMID: 37565562 PMCID: PMC10461412 DOI: 10.1093/molbev/msad184] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/16/2023] [Revised: 08/01/2023] [Accepted: 08/10/2023] [Indexed: 08/12/2023] [Open Access]
Abstract
During the origin of great apes about 14 million years ago, a series of phenotypic innovations emerged, such as increased body size, enlarged brain volume, improved cognitive skill, and a diversified diet. Yet, the genomic basis of these evolutionary changes remains unclear. Utilizing the high-quality genome assemblies of great apes (including human), gibbon, and macaque, we conducted comparative genome analyses and identified 15,885 great ape-specific structural variants (GSSVs), including eight coding GSSVs resulting in the creation of novel proteins (e.g., ACAN and CMYA5). Functional annotations of the GSSV-related genes revealed the enrichment of genes involved in development and morphogenesis, especially neurogenesis and neural network formation, suggesting a potential role of GSSVs in shaping the traits shared by great apes. Further dissection of the brain-related GSSVs shows great ape-specific changes in enhancer activity and gene expression in the brain, involving a group of GSSV-regulated genes (such as NOL3) that potentially contribute to the altered brain development and function in great apes. The presented data highlight the evolutionary role of structural variants in the phenotypic innovations during the origin of the great ape lineage.
Affiliation(s)
- Bin Zhou
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Yaoxi He
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Yongjie Chen
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- National Resource Center for Non-Human Primates, Kunming Primate Research Center, and National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, China
| |
|
17
|
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty S, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. Am J Biol Anthropol 2023; 181 Suppl 76:118-144. [PMID: 36794631 PMCID: PMC10329998 DOI: 10.1002/ajpa.24713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which subsequently has played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C. Soto
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - José M. Uribe-Salazar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Aarthi Sekar
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Sean McGinty
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, and Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA
| |
|
18
|
Shen F, Qin Y, Wang R, Huang X, Wang Y, Gao T, He J, Zhou Y, Jiao Y, Wei J, Li L, Yang X. Comparative genomics reveals a unique nitrogen-carbon balance system in Asteraceae. Nat Commun 2023; 14:4334. [PMID: 37474573 PMCID: PMC10359422 DOI: 10.1038/s41467-023-40002-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/12/2022] [Accepted: 07/07/2023] [Indexed: 07/22/2023] [Open Access]
Abstract
The Asteraceae (daisy family) is one of the largest families of plants. The genetic basis for its high biodiversity and excellent adaptability has not been elucidated. Here, we compare the genomes of 29 terrestrial plant species, including two de novo chromosome-scale genome assemblies for stem lettuce, a member of Asteraceae, and Scaevola taccada, a member of Goodeniaceae that is one of the closest outgroups of Asteraceae. We show that Asteraceae originated ~80 million years ago and experienced repeated paleopolyploidization. PII, the universal regulator of nitrogen-carbon (N-C) assimilation present in almost all domains of life, has been conspicuously lost across Asteraceae. Meanwhile, Asteraceae has upgraded the N-C balance system stepwise via paleopolyploidization and tandem duplications of key metabolic genes, resulting in enhanced nitrogen uptake and fatty acid biosynthesis. In addition to suggesting a molecular basis for their ecological success, the unique N-C balance system reported for Asteraceae offers a potential crop improvement strategy.
Affiliation(s)
- Fei Shen
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China
| | - Yajuan Qin
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China
| | - Rui Wang
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, College of Horticulture, China Agricultural University, 100193, Beijing, China
| | - Xin Huang
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China
| | - Ying Wang
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, 100871, Beijing, China
| | - Tiangang Gao
- State Key Laboratory of Evolutionary and Systematic Botany, Institute of Botany, the Chinese Academy of Sciences, 100093, Beijing, China
| | - Junna He
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, College of Horticulture, China Agricultural University, 100193, Beijing, China
| | - Yue Zhou
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, 100871, Beijing, China
| | - Yuannian Jiao
- State Key Laboratory of Evolutionary and Systematic Botany, Institute of Botany, the Chinese Academy of Sciences, 100093, Beijing, China
| | - Jianhua Wei
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China.
| | - Lei Li
- State Key Laboratory of Protein and Plant Gene Research, School of Advanced Agricultural Sciences, Peking University, 100871, Beijing, China.
| | - Xiaozeng Yang
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Institute of Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, 100097, Beijing, China.
| |
|
19
|
Hao K, Yang M, Cui Y, Jiao Z, Gao X, Du Z, Wang Z, An M, Xia Z, Wu Y. Transcriptomic and Functional Analyses Reveal the Different Roles of Vitamins C, E, and K in Regulating Viral Infections in Maize. Int J Mol Sci 2023; 24:ijms24098012. [PMID: 37175719 PMCID: PMC10178231 DOI: 10.3390/ijms24098012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023] [Open Access]
Abstract
Maize lethal necrosis (MLN), one of the most important maize viral diseases, is caused by maize chlorotic mottle virus (MCMV) infection in combination with a potyvirid, such as sugarcane mosaic virus (SCMV). However, the resistance mechanism of maize to MLN remains largely unknown. In this study, we obtained isoform expression profiles of maize after SCMV and MCMV single and synergistic infection (S + M) via comparative analysis of SMRT- and Illumina-based RNA sequencing. A total of 15,508, 7567, and 2378 differentially expressed isoforms (DEIs) were identified in the S + M, MCMV, and SCMV libraries, respectively; these were primarily involved in photosynthesis, reactive oxygen species (ROS) scavenging, and some pathways related to disease resistance. The results of virus-induced gene silencing (VIGS) assays revealed that silencing of a vitamin C biosynthesis-related gene, ZmGalDH or ZmAPX1, promoted viral infections, while silencing ZmTAT or ZmNQO1, genes involved in vitamin E or K biosynthesis, respectively, inhibited MCMV and S + M infections, likely by regulating the expression of pathogenesis-related (PR) genes. Moreover, the relationship between viral infections and expression of the above four genes in ten maize inbred lines was determined. We further demonstrated that the exogenous application of vitamin C could effectively suppress viral infections, while vitamins E and K promoted MCMV infection. These findings provide novel insights into the gene regulatory networks of maize in response to MLN, and the roles of vitamins C, E, and K in conditioning viral infections in maize.
Affiliation(s)
- Kaiqiang Hao
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Miaoren Yang
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Yakun Cui
- Institute of Food Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
| | - Zhiyuan Jiao
- State Key Laboratory of Agrobiotechnology and Key Laboratory of Pest Monitoring and Green Management-MOA, Department of Plant Pathology, China Agricultural University, Beijing 100193, China
| | - Xinran Gao
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Zhichao Du
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Zhiping Wang
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Mengnan An
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Zihao Xia
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| | - Yuanhua Wu
- College of Plant Protection, Shenyang Agricultural University, Shenyang 110866, China
| |
|
20
|
Yu S, Liu Z, Li M, Zhou D, Hua P, Cheng H, Fan W, Xu Y, Liu D, Liang S, Zhang Y, Xie M, Tang J, Jiang Y, Hou S, Zhou Z. Resequencing of a Pekin duck breeding population provides insights into the genomic response to short-term artificial selection. Gigascience 2023; 12:giad016. [PMID: 36971291 PMCID: PMC10041536 DOI: 10.1093/gigascience/giad016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 02/04/2023] [Accepted: 02/27/2023] [Indexed: 03/29/2023] [Open Access]
Abstract
BACKGROUND: Short-term, intense artificial selection drives fast phenotypic changes in domestic animals and leaves imprints on their genomes. However, the genetic basis of this selection response is poorly understood. To better address this, we employed the Pekin duck Z2 pure line, in which the breast muscle weight was increased nearly 3-fold after 10 generations of breeding. We de novo assembled a high-quality reference genome of a female Pekin duck of this line (GCA_003850225.1) and identified 8.60 million genetic variants in 119 individuals among 10 generations of the breeding population. RESULTS: We identified 53 selected regions between the first and tenth generations, and 93.8% of the identified variations were enriched in regulatory and noncoding regions. Integrating the selection signatures and genome-wide association approach, we found that 2 regions covering 0.36 Mb containing UTP25 and FBRSL1 were most likely to contribute to breast muscle weight improvement. The major allele frequencies of these 2 loci increased gradually with each generation following the same trend. Additionally, we found that a copy number variation region containing the entire EXOC4 gene could explain 1.9% of the variance in breast muscle weight, indicating that the nervous system may play a role in economic trait improvement. CONCLUSIONS: Our study not only provides insights into genomic dynamics under intense artificial selection but also provides resources for genomics-enabled improvements in duck breeding.
Affiliation(s)
- Simeng Yu
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Zihua Liu
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Ming Li
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Dongke Zhou
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Ping Hua
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Hong Cheng
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Wenlei Fan
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Yaxi Xu
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Dapeng Liu
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Suyun Liang
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Yunsheng Zhang
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Ming Xie
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Jing Tang
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Shuisheng Hou
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Zhengkui Zhou
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture and Rural Affairs; Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| |
|
21
|
Mao Y, Harvey WT, Porubsky D, Munson KM, Hoekzema K, Lewis AP, Audano PA, Rozanski A, Yang X, Zhang S, Gordon DS, Wei X, Logsdon GA, Haukness M, Dishuck PC, Jeong H, Del Rosario R, Bauer VL, Fattor WT, Wilkerson GK, Lu Q, Paten B, Feng G, Sawyer SL, Warren WC, Carbone L, Eichler EE. Structurally divergent and recurrently mutated regions of primate genomes. bioRxiv 2023:2023.03.07.531415. [PMID: 36945442 PMCID: PMC10028934 DOI: 10.1101/2023.03.07.531415] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/10/2023]
Abstract
To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Xiaoxi Wei
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ricardo Del Rosario
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vanessa L Bauer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Will T Fattor
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Gregory K Wilkerson
- Department of Veterinary Sciences, Michale E. Keeling Center for Comparative Medicine and Research, The University of Texas MD Anderson Cancer Center, Bastrop, TX, USA
- Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Guoping Feng
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara L Sawyer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Department of Surgery, School of Medicine, University of Missouri, Columbia, MO, USA
- Institute of Data Science and Informatics, University of Missouri, Columbia, MO, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA
- Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA
- Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
22
|
Wang Y, Cai X, Hu S, Qin S, Wang Z, Cao Y, Hou C, Yang J, Zhou W. Comparative genomic analysis provides insight into the phylogeny and potential mechanisms of adaptive evolution of Sphingobacterium sp. CZ-2. Gene 2023; 855:147118. [PMID: 36521669 DOI: 10.1016/j.gene.2022.147118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
Sphingobacterium is a genus of Gram-negative, non-fermentative bacilli that have received widespread attention due to their broad ecological distribution and oil degradation ability, but are rarely involved in infections. In this manuscript, a novel Sphingobacterium strain isolated from tobacco leaves infected with wildfire disease was named Sphingobacterium sp. CZ-2. NGS and TGS sequencing results showed a whole genome of 3.92 Mb with 40.68 mol% GC content, containing 3,462 protein-coding genes, 9 rRNA-coding genes and 50 tRNA-coding genes. Phylogenetic analysis, ANI and dDDH calculations all supported that Sphingobacterium sp. CZ-2 represents a novel species of the genus Sphingobacterium. Analysis of the specific genes of Sphingobacterium sp. CZ-2 by comparative genomics revealed that metal transport proteins encoded by the troD and cusA genes could maintain the balance of heavy metal ion concentrations in the internal environment of the bacteria and avoid heavy metal toxicity while meeting the needs of growth and reproduction, and that transport proteins encoded by the malG gene could retain nutrients required for bacterial survival. Synteny and genome evolutionary analyses of Sphingobacterium strains implicated gene family contraction as a major process in genome evolution, with insertion sequences leading to mutations, deletions and reversals of genes that help bacteria to withstand complex environmental changes. Complete genome sequencing and systematic comparative genomic analysis will contribute new insights into the adaptive evolution of this novel species and the genus Sphingobacterium.
Affiliation(s)
- Yongqiang Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Xunhui Cai
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Shengnan Hu
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Sidong Qin
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Ziqi Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Yixiang Cao
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Chaoliang Hou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Jiangshan Yang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Wei Zhou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China.
| |
|
23
|
García-Campa L, Valledor L, Pascual J. The Integration of Data from Different Long-Read Sequencing Platforms Enhances Proteoform Characterization in Arabidopsis. Plants (Basel) 2023; 12:511. [PMID: 36771596 PMCID: PMC9920879 DOI: 10.3390/plants12030511] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/05/2022] [Revised: 01/13/2023] [Accepted: 01/14/2023] [Indexed: 06/18/2023]
Abstract
The increasing availability of massive omics data requires improving the quality of reference databases and their annotations. The combination of full-length isoform sequencing (Iso-Seq) with short-read transcriptomics and proteomics has been successfully used for increasing proteoform characterization, which is a main ongoing goal in biology. However, the potential of including Oxford Nanopore Technologies Direct RNA Sequencing (ONT-DRS) data has not been explored. In this paper, we analyzed the impact of combining Iso-Seq- and ONT-DRS-derived data on the identification of proteoforms in Arabidopsis MS proteomics data. To this end, we selected a proteomics dataset corresponding to senescent leaves and we performed protein searches using three different protein databases: AtRTD2 and AtRTD3, built from the homonymous transcriptomes, regarded as the most complete and up-to-date available for the species; and a custom hybrid database combining AtRTD3 with publicly available ONT-DRS transcriptomics data generated from Arabidopsis leaves. Our results show that the inclusion and combination of long-read sequencing data from Iso-Seq and ONT-DRS into a proteogenomic workflow enhances proteoform characterization and discovery in bottom-up proteomics studies. This represents a great opportunity to further investigate biological systems at an unprecedented scale, although it brings challenges to current protein searching algorithms.
Affiliation(s)
- Lara García-Campa
- Plant Physiology, Department of Organisms and Systems Biology, University of Oviedo, 33003 Oviedo, Spain
- University Institute of Biotechnology of Asturias, University of Oviedo, 33003 Oviedo, Spain
| | - Luis Valledor
- Plant Physiology, Department of Organisms and Systems Biology, University of Oviedo, 33003 Oviedo, Spain
- University Institute of Biotechnology of Asturias, University of Oviedo, 33003 Oviedo, Spain
| | - Jesús Pascual
- Plant Physiology, Department of Organisms and Systems Biology, University of Oviedo, 33003 Oviedo, Spain
- University Institute of Biotechnology of Asturias, University of Oviedo, 33003 Oviedo, Spain
| |
|
24
|
Zhou T, Lu L, Li C. Optimization of the "in-silico" mate-pair method improves contiguity and accuracy of genome assembly. Ecol Evol 2023; 13:e9745. [PMID: 36644701 PMCID: PMC9833964 DOI: 10.1002/ece3.9745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2022] [Revised: 12/30/2022] [Accepted: 12/30/2022] [Indexed: 01/13/2023] Open
Abstract
A combination of short-insert paired-end and mate-pair libraries of large insert sizes is used as a standard method to generate genome assemblies with high contiguity. Third-generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate-pair libraries and third-generation libraries require high-molecular-weight DNA, making these libraries inappropriate for samples with only degraded DNA. An in silico method that generates mate-pair libraries using a reference genome was devised for the task of assembling target genomes. Although the contiguity and completeness of assembled genomes were significantly improved by this method, a high level of errors manifested in the assembly, and the methods for using reference genomes were not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate-pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Conservation of in silico mate-pairs by comparing two references and using those to guide genome assembly reduced the number of misassemblies (18.6%-46.1%) and increased the contiguity of assembled genomes (9.7%-70.7%), while maintaining gene completeness at a level that was either similar to or marginally lower than that obtained via the current method. Finally, we developed a pipeline of the optimized in silico method and compared it with another reference-guided assembler, RagTag. We found that RagTag produced longer scaffolds (17.8 Mbp vs 3.0 Mbp), but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate-pair method. The optimized in silico pipeline developed in this study should facilitate further studies on genomics, population genetics, and conservation of endangered species.
Affiliation(s)
- Tao Zhou
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
| | - Liang Lu
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
| | - Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai Ocean University, Shanghai, China
- Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai Ocean University, Shanghai, China
| |
|
25
|
Sim M, Lee J, Kwon D, Lee D, Park N, Wy S, Ko Y, Kim J. Reference-based read clustering improves the de novo genome assembly of microbial strains. Comput Struct Biotechnol J 2022; 21:444-451. [PMID: 36618978 PMCID: PMC9804104 DOI: 10.1016/j.csbj.2022.12.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 12/17/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022] Open
Abstract
Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, it remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clustering algorithm, called RBRC, for improving de novo microbial genome assembly by accurately estimating read proximity using multiple reference genomes. The performance of RBRC was confirmed by simulation-based evaluation in terms of assembly contiguity and the number of misassemblies, and it was successfully applied to existing fungal and bacterial genomes, improving the quality of the assemblies without using additional sequencing data. RBRC is a useful read-clustering algorithm that can be used (i) for generating high-quality genome assemblies of microbial strains when genome assemblies of related strains are available, and (ii) for upgrading existing microbial genome assemblies when the generation of additional sequencing data, such as long reads, is difficult.
Affiliation(s)
- Mikang Sim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Daehwan Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Nayoung Park
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Suyeon Wy
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
| | - Younhee Ko
- Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do 17035, Republic of Korea
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea (corresponding author)
| |
|
26
|
Vervoort L, Vermeesch JR. The 22q11.2 Low Copy Repeats. Genes (Basel) 2022; 13:2101. [PMID: 36421776 PMCID: PMC9690962 DOI: 10.3390/genes13112101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2022] [Revised: 10/19/2022] [Accepted: 10/25/2022] [Indexed: 07/22/2023] Open
Abstract
LCR22s are among the most complex loci in the human genome and are susceptible to nonallelic homologous recombination. This can lead to a variety of genomic disorders, including deletions, duplications, and translocations, of which the 22q11.2 deletion syndrome is the most common in humans. Interrogating these phenomena is difficult due to the high complexity of the LCR22s and the inaccurate representation of the LCRs across different reference genomes. Optical mapping techniques, which provide long-range chromosomal maps, could be used to unravel the complex duplicon structure. These techniques have already uncovered the hypervariability of the LCR22-A haplotype in the human population. Although optical LCR22 mapping is a major step forward, long-read sequencing approaches will be essential to reach nucleotide resolution of the LCR22s and map the crossover sites. Accurate maps and sequences are needed to pinpoint potential predisposing alleles and, most importantly, allow for genotype-phenotype studies exploring the role of the LCR22s in health and disease. In addition, this research might provide a paradigm for the study of other rare genomic disorders.
|
27
|
Toh H, Yang C, Formenti G, Raja K, Yan L, Tracey A, Chow W, Howe K, Bergeron LA, Zhang G, Haase B, Mountcastle J, Fedrigo O, Fogg J, Kirilenko B, Munegowda C, Hiller M, Jain A, Kihara D, Rhie A, Phillippy AM, Swanson SA, Jiang P, Clegg DO, Jarvis ED, Thomson JA, Stewart R, Chaisson MJP, Bukhman YV. A haplotype-resolved genome assembly of the Nile rat facilitates exploration of the genetic basis of diabetes. BMC Biol 2022; 20:245. [DOI: 10.1186/s12915-022-01427-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 09/29/2022] [Indexed: 11/09/2022] Open
Abstract
Background
The Nile rat (Arvicanthis niloticus) is an important animal model because of its robust diurnal rhythm, cone-rich retina, and propensity to develop diet-induced diabetes without chemical or genetic modifications. A closer similarity to humans in these aspects, compared to the widely used Mus musculus and Rattus norvegicus models, holds the promise of better translation of research findings to the clinic.
Results
We report a 2.5 Gb, chromosome-level reference genome assembly with fully resolved parental haplotypes, generated with the Vertebrate Genomes Project (VGP). The assembly is highly contiguous, with contig N50 of 11.1 Mb, scaffold N50 of 83 Mb, and 95.2% of the sequence assigned to chromosomes. We used a novel workflow to identify 3613 segmental duplications and quantify duplicated genes. Comparative analyses revealed unique genomic features of the Nile rat, including some that affect genes associated with type 2 diabetes and metabolic dysfunctions. We discuss 14 genes that are heterozygous in the Nile rat or highly diverged from the house mouse.
Conclusions
Our findings reflect the exceptional level of genomic resolution present in this assembly, which will greatly expand the potential of the Nile rat as a model organism.
|
28
|
Peel E, Silver L, Brandies P, Zhu Y, Cheng Y, Hogg CJ, Belov K. Best genome sequencing strategies for annotation of complex immune gene families in wildlife. Gigascience 2022; 11:giac100. [PMID: 36310247 PMCID: PMC9618407 DOI: 10.1093/gigascience/giac100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Revised: 08/10/2022] [Accepted: 09/29/2022] [Indexed: 11/04/2022] Open
Abstract
Background
The biodiversity crisis and increasing impact of wildlife disease on animal and human health provide impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations to investigate disease in wildlife. Given this, we have increasingly been asked what minimum level of genome quality is required to effectively annotate immune genes in order to study immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation.
Results
Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCOv5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation.
Conclusions
Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.
Affiliation(s)
- Emma Peel
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
| | - Luke Silver
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
| | - Parice Brandies
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
| | - Ying Zhu
- Sichuan Provincial Academy of Natural Resource Sciences, Chengdu, Sichuan 610000, China
| | - Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
| | - Carolyn J Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
| | - Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
| |
|
29
|
Liu Y, Fu Y, Yang Y, Yi G, Lian J, Xie B, Yao Y, Chen M, Niu Y, Liu L, Wang L, Zhang Y, Fan X, Tang Y, Yuan P, Zhu M, Li Q, Zhang S, Chen Y, Wang B, He J, Lu D, Liachko I, Sullivan ST, Pang B, Chen Y, He X, Li K, Tang Z. Integration of multi-omics data reveals cis-regulatory variants that are associated with phenotypic differentiation of eastern from western pigs. Genet Sel Evol 2022; 54:62. [PMID: 36104777 PMCID: PMC9476355 DOI: 10.1186/s12711-022-00754-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 01/12/2022] [Accepted: 09/02/2022] [Indexed: 11/10/2022]
Abstract
Background
The genetic mechanisms that underlie phenotypic differentiation in breeding animals have important implications in evolutionary biology and agriculture. However, the contribution of cis-regulatory variants to pig phenotypes is poorly understood. Therefore, our aim was to elucidate the molecular mechanisms by which non-coding variants cause phenotypic differences in pigs by combining evolutionary biology analyses and functional genomics.
Results
We obtained a high-resolution phased chromosome-scale reference genome with a contig N50 of 18.03 Mb for the Luchuan pig breed (a representative eastern breed) and profiled potential selective sweeps in eastern and western pigs by resequencing the genomes of 234 pigs. Multi-tissue transcriptome and chromatin accessibility analyses of these regions suggest that tissue-specific selection pressure is mediated by promoters and distal cis-regulatory elements. Promoter variants that are associated with increased expression of the lysozyme (LYZ) gene in the small intestine might enhance the immunity of the gastrointestinal tract and roughage tolerance in pigs. In skeletal muscle, an enhancer-modulating single-nucleotide polymorphism that is associated with up-regulation of the expression of the troponin C1, slow skeletal and cardiac type (TNNC1) gene might increase the proportion of slow muscle fibers and affect meat quality.
Conclusions
Our work sheds light on the molecular mechanisms by which non-coding variants shape phenotypic differences in pigs and provides valuable resources and novel perspectives to dissect the role of gene regulatory evolution in animal domestication and breeding.
|
30
|
Winter S, Coimbra RTF, Helsen P, Janke A. A chromosome-scale genome assembly of the okapi (Okapia johnstoni). J Hered 2022; 113:568-576. [PMID: 35788365 PMCID: PMC9584810 DOI: 10.1093/jhered/esac033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2022] [Accepted: 06/30/2022] [Indexed: 12/05/2022] Open
Abstract
The okapi (Okapia johnstoni), or forest giraffe, is the only species in its genus and the only extant sister group of the giraffe within the family Giraffidae. The species is one of the remaining large vertebrates surrounded by mystery because of its elusive behavior as well as the armed conflicts in the region where it occurs, making it difficult to study. Deforestation puts the okapi under constant anthropogenic pressure, and it is currently listed as “Endangered” on the IUCN Red List. Here, we present the first annotated de novo okapi genome assembly based on PacBio continuous long reads, polished with short reads, and anchored into chromosome-scale scaffolds using Hi-C proximity ligation sequencing. The final assembly (TBG_Okapi_asm_v1) has a length of 2.39 Gbp, of which 98% are represented by 28 scaffolds > 3.9 Mbp. The contig N50 of 61 Mbp and scaffold N50 of 102 Mbp, together with a BUSCO score of 94.7%, and 23 412 annotated genes, underline the high quality of the assembly. This chromosome-scale genome assembly is a valuable resource for future conservation of the species and comparative genomic studies among the giraffids and other ruminants.
Affiliation(s)
- Sven Winter
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage, Frankfurt am Main, Germany; Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstraße, Vienna, Austria
| | - Raphael T F Coimbra
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage, Frankfurt am Main, Germany; Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße, Frankfurt am Main, Germany
| | - Philippe Helsen
- Centre for Research and Conservation, Royal Zoological Society of Antwerp, Koningin Astridplein, Antwerp, Belgium
| | - Axel Janke
- Senckenberg Biodiversity and Climate Research Centre, Senckenberganlage, Frankfurt am Main, Germany; LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage, Frankfurt am Main, Germany; Institute for Ecology, Evolution and Diversity, Goethe University, Max-von-Laue-Straße, Frankfurt am Main, Germany
| |
|
31
|
Bellott DW, Cho TJ, Jackson EK, Skaletsky H, Hughes JF, Page DC. SHIMS 3.0: Highly efficient single-haplotype iterative mapping and sequencing using ultra-long nanopore reads. PLoS One 2022; 17:e0269692. [PMID: 35700171 PMCID: PMC9197060 DOI: 10.1371/journal.pone.0269692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Accepted: 05/25/2022] [Indexed: 11/18/2022] Open
Abstract
The reference sequence of structurally complex regions can only be obtained through a highly accurate clone-based approach that we call Single-Haplotype Iterative Mapping and Sequencing (SHIMS). In recent years, improvements to SHIMS have reduced the cost and time required by two orders of magnitude, but internally repetitive clones still require extensive manual effort to transform draft assemblies into reference-quality finished sequences. Here we describe SHIMS 3.0, using ultra-long nanopore reads to augment the Illumina data from SHIMS 2.0 assemblies and resolve internally repetitive structures. This greatly minimizes the need for manual finishing of Illumina-based draft assemblies, allowing a small team with no prior finishing experience to sequence challenging targets with high accuracy. This protocol proceeds from clone-picking to finished assemblies in 2 weeks for about $80 (USD) per clone. We recently used this protocol to produce reference sequence of structurally complex palindromes on chimpanzee and rhesus macaque X chromosomes. Our protocol provides access to structurally complex regions that would otherwise be inaccessible from whole-genome shotgun data or require an impractical amount of manual effort to generate an accurate assembly.
Affiliation(s)
- Daniel W. Bellott
- Whitehead Institute, Cambridge, Massachusetts, United States of America
| | - Ting-Jan Cho
- Whitehead Institute, Cambridge, Massachusetts, United States of America
| | - Emily K. Jackson
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Helen Skaletsky
- Whitehead Institute, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts, United States of America
| | | | - David C. Page
- Whitehead Institute, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts, United States of America
| |
|
32
|
Hoyt SJ, Storer JM, Hartley GA, Grady PGS, Gershman A, de Lima LG, Limouse C, Halabian R, Wojenski L, Rodriguez M, Altemose N, Rhie A, Core LJ, Gerton JL, Makalowski W, Olson D, Rosen J, Smit AFA, Straight AF, Vollger MR, Wheeler TJ, Schatz MC, Eichler EE, Phillippy AM, Timp W, Miga KH, O’Neill RJ. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science 2022; 376:eabk3112. [PMID: 35357925 PMCID: PMC9301658 DOI: 10.1126/science.abk3112] [Citation(s) in RCA: 194] [Impact Index Per Article: 64.7] [Indexed: 12/12/2022]
Abstract
Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.
Affiliation(s)
- Savannah J. Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | | | - Gabrielle A. Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Patrick G. S. Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Charles Limouse
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Luke Wojenski
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Matias Rodriguez
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Leighton J. Core
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | | | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | | | | | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Michael C. Schatz
- Department of Computer Science and Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Rachel J. O’Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| |
|
33
|
Deng Y, Qian Y, Meng M, Jiang H, Dong Y, Fang C, He S, Yang L. Extensive sequence divergence between the reference genomes of two zebrafish strains Tuebingen and AB. Mol Ecol Resour 2022; 22:2148-2157. [DOI: 10.1111/1755-0998.13602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Revised: 01/14/2022] [Accepted: 02/15/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Yu Deng
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810016, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yuting Qian
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Minghui Meng
- Diggers (Wuhan) Biotechnology Co., Ltd, Wuhan 430070, China
| | - Haifeng Jiang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yang Dong
- State Key Laboratory for Conservation and Utilization of Bio‐Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China
| | - Chengchi Fang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810016, P. R. China
| | - Shunping He
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810016, P. R. China
- Institute of Deep Sea Science and Engineering, Chinese Academy of Sciences, Sanya, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
| | - Liandong Yang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining 810016, P. R. China
| |
|
34
|
Cezard T, Cunningham F, Hunt SE, Koylass B, Kumar N, Saunders G, Shen A, Silva A, Tsukanov K, Venkataraman S, Flicek P, Parkinson H, Keane T. The European Variation Archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res 2022; 50:D1216-D1220. [PMID: 34718739 PMCID: PMC8728205 DOI: 10.1093/nar/gkab960] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 08/20/2021] [Revised: 09/23/2021] [Accepted: 10/14/2021] [Indexed: 12/13/2022]
Abstract
The European Variation Archive (EVA; https://www.ebi.ac.uk/eva/) is a resource for sharing all types of genetic variation data (SNPs, indels, and structural variants) for all species. The EVA was created in 2014 to provide FAIR access to genetic variation data and has since grown to be a primary resource for genomic variants hosting >3 billion records. The EVA and dbSNP have established a compatible global system to assign unique identifiers to all submitted genetic variants. The EVA is active within the Global Alliance of Genomics and Health (GA4GH), maintaining, contributing and implementing standards such as VCF, Refget and Variant Representation Specification (VRS). In this article, we describe the submission and permanent accessioning services along with the different ways the data can be retrieved by the scientific community.
Affiliation(s)
- Timothe Cezard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Baron Koylass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Nitin Kumar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Gary Saunders
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - April Shen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Andres F Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kirill Tsukanov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Sundararaman Venkataraman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Thomas M Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| |
|
35
|
Auch H, Klymiuk N, Runa-Vochozkova P. Modifying Bacterial Artificial Chromosomes for Extended Genome Modification. Methods Mol Biol 2022; 2495:67-90. [PMID: 35696028 DOI: 10.1007/978-1-0716-2301-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Bacterial artificial chromosomes (BACs) have been used extensively for the exploration of mammalian genomes. Although novel approaches have made their initial function expendable, the available BAC libraries remain a precious resource for life science. Because BACs comprise extended genomic regions, they provide an ideal basis for creating a large targeting vector. Here, we describe the identification of suitable BACs from their libraries and their verification prior to manipulation. Further, protocols are given for modifying BACs, confirming the desired modification, and preparing the BACs for transfection into mammalian cells.
Affiliation(s)
- Hannah Auch
- Large Animal Models in Cardiovascular Research, Internal Medical Department I, TU Munich, Munich, Germany
- Center for Innovative Medical Models, LMU Munich, Munich, Germany
| | - Nikolai Klymiuk
- Large Animal Models in Cardiovascular Research, Internal Medical Department I, TU Munich, Munich, Germany
- Center for Innovative Medical Models, LMU Munich, Munich, Germany
| | - Petra Runa-Vochozkova
- Large Animal Models in Cardiovascular Research, Internal Medical Department I, TU Munich, Munich, Germany.
- Center for Innovative Medical Models, LMU Munich, Munich, Germany.
| |
|
36
|
Salama SR. The Complexity of the Mammalian Transcriptome. Advances in Experimental Medicine and Biology 2022; 1363:11-22. [PMID: 35220563 DOI: 10.1007/978-3-030-92034-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Abstract
Draft genome assemblies for multiple mammalian species combined with new technologies to map transcripts from diverse RNA samples to these genomes developed in the early 2000s revealed that the mammalian transcriptome was vastly larger and more complex than previously anticipated. Efforts to comprehensively catalog the identity and features of transcripts present in a variety of species, tissues and cell lines revealed that a large fraction of the mammalian genome is transcribed in at least some settings. A large number of these transcripts encode long non-coding RNAs (lncRNAs). Many lncRNAs overlap or are anti-sense to protein coding genes and others overlap small RNAs. However, a large number are independent of any previously known mRNA or small RNA. While the functions of a majority of these lncRNAs are unknown, many appear to play roles in gene regulation. Many lncRNAs have species-specific and cell type specific expression patterns and their evolutionary origins are varied. While technological challenges have hindered getting a full picture of the diversity and transcript structure of all of the transcripts arising from lncRNA loci, new technologies including single molecule nanopore sequencing and single cell RNA sequencing promise to generate a comprehensive picture of the mammalian transcriptome.
Affiliation(s)
- Sofie R Salama
- UC Santa Cruz Genomics Institute, Department of Biomolecular Engineering and Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
| |
|
37
|
Brown JL, Swift CL, Mondo SJ, Seppala S, Salamov A, Singan V, Henrissat B, Drula E, Henske JK, Lee S, LaButti K, He G, Yan M, Barry K, Grigoriev IV, O'Malley MA. Co‑cultivation of the anaerobic fungus Caecomyces churrovis with Methanobacterium bryantii enhances transcription of carbohydrate binding modules, dockerins, and pyruvate formate lyases on specific substrates. Biotechnology for Biofuels 2021; 14:234. [PMID: 34893091 PMCID: PMC8665504 DOI: 10.1186/s13068-021-02083-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/17/2021] [Accepted: 11/19/2021] [Indexed: 05/12/2023]
Abstract
Anaerobic fungi and methanogenic archaea are two classes of microorganisms found in the rumen microbiome that metabolically interact during lignocellulose breakdown. Here, stable synthetic co-cultures of the anaerobic fungus Caecomyces churrovis and the methanogen Methanobacterium bryantii (not native to the rumen) were formed, demonstrating that microbes from different environments can be paired based on metabolic ties. Transcriptional and metabolic changes induced by methanogen co-culture were evaluated in C. churrovis across a variety of substrates to identify mechanisms that impact biomass breakdown and sugar uptake. A high-quality genome of C. churrovis was obtained and annotated, which is the first sequenced genome of a non-rhizoid-forming anaerobic fungus. C. churrovis possesses an abundance of CAZymes and carbohydrate binding modules and, in agreement with previous studies of early-diverging fungal lineages, N6-methyldeoxyadenine (6mA) was associated with transcriptionally active genes. Co-culture with the methanogen increased overall transcription of CAZymes, carbohydrate binding modules, and dockerin domains in co-cultures grown on both lignocellulose and cellulose and caused upregulation of genes encoding associated enzymatic machinery, including carbohydrate binding modules in family 18 and dockerin domains, across multiple growth substrates relative to C. churrovis monoculture. Two other fungal strains grown on a reed canary grass substrate in co-culture with the same methanogen also exhibited high log2-fold change values for upregulation of genes encoding carbohydrate binding modules in families 1 and 18. Transcriptional upregulation indicated that co-culture of the C. churrovis strain with a methanogen may enhance pyruvate formate lyase (PFL) function for growth on xylan and fructose and production of bottleneck enzymes in sugar utilization pathways, further supporting the hypothesis that co-culture with a methanogen may enhance certain fungal metabolic functions. Upregulation of CBM18 may play a role in fungal-methanogen physical associations and fungal cell wall development and remodeling.
Affiliation(s)
- Jennifer L Brown
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Candice L Swift
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Stephen J Mondo
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Agricultural Biology, Colorado State University, Fort Collins, CO, 80523, USA
| | - Susanna Seppala
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Asaf Salamov
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Vasanth Singan
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Bernard Henrissat
- DTU Bioengineering, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
- Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Elodie Drula
- Architecture Et Fonction Des Macromolécules Biologiques, CNRS/Aix-Marseille University, Marseille, France
- INRAE USC1408, AFMB, 13009, Marseille, France
| | - John K Henske
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Samantha Lee
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
| | - Kurt LaButti
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Guifen He
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Mi Yan
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Kerrie Barry
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Igor V Grigoriev
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Michelle A O'Malley
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, CA, 93106, USA.
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| |
|
38
|
Beck KL, Seabolt E, Agarwal A, Nayar G, Bianco S, Krishnareddy H, Ngo TA, Kunitomi M, Mukherjee V, Kaufman JH. Semi-Supervised Pipeline for Autonomous Annotation of SARS-CoV-2 Genomes. Viruses 2021; 13:2426. [PMID: 34960694 PMCID: PMC8706859 DOI: 10.3390/v13122426] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/17/2021] [Revised: 11/17/2021] [Accepted: 11/20/2021] [Indexed: 12/12/2022]
Abstract
SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on the use of a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. We analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world using our method and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, we yielded a 6.4- and 1.8-fold increase in protein annotations. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy as well as demonstrating high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.
Affiliation(s)
- Kristen L. Beck
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Edward Seabolt
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Akshay Agarwal
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Gowri Nayar
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Simone Bianco
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
- NSF Center for Cellular Construction, San Francisco, CA 94158, USA
| | - Harsha Krishnareddy
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Timothy A. Ngo
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Mark Kunitomi
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - Vandana Mukherjee
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| | - James H. Kaufman
- AI and Cognitive Software, IBM Almaden Research Center, San Jose, CA 95120, USA
| |
|
39
|
Wu Y, Chen Q, Zhang Q, Li M, Li H, Jia L, Huang Y, Zhang J. Analysis of whole-exome data of cfDNA and the tumor tissue of non-small cell lung cancer. Annals of Translational Medicine 2021; 9:1453. [PMID: 34734005 PMCID: PMC8506706 DOI: 10.21037/atm-21-4117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 09/10/2021] [Indexed: 11/13/2022]
Abstract
Background: Non-small cell lung cancer (NSCLC) has the highest cancer mortality rate in the world, but currently there is no effective method of dynamic monitoring. Gene mutation is an important factor in tumorigenesis and can be detected using high-throughput sequencing technology. This study aimed to analyze the driving genes in the tumor of NSCLC patients by whole exon sequencing, and to compare and analyze the subclones of the tumor at different time points. Methods: We collected 87 cases of NSCLC tumor tissues, para-cancer tissues, and peripheral blood samples for detecting cell-free DNAs (cfDNAs) from January 2016 to December 2018, and whole-exome sequencing was performed. The gene mutation map of NSCLC was drawn in detail by second-generation sequencing data analysis and new driver genes were found. In addition, we performed a subclonal analysis of tumors from different stages of the same patient to further describe the tumor heterogeneity. Results: We found that the clonal analysis obtained by cfDNA detection was similar to the clonal analysis of the tissue samples, so real-time monitoring of tumor changes can be carried out through monitoring cfDNA. Conclusions: This study provides evidence for studying the gene mutation information of NSCLC and shows the importance of cfDNA in the analysis of tumor subcloning information.
Affiliation(s)
- Yuanzhou Wu
- Department of Thoracic Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Qunqing Chen
- Department of Thoracic Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | | | - Man Li
- Department of Pathology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Hui Li
- Department of Thoracic Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Longfei Jia
- Department of Thoracic Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Yang Huang
- Department of Thoracic Surgery, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| | - Jian Zhang
- Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| |
|
40
|
Cosentino RO, Brink BG, Siegel TN. Allele-specific assembly of a eukaryotic genome corrects apparent frameshifts and reveals a lack of nonsense-mediated mRNA decay. NAR Genom Bioinform 2021; 3:lqab082. [PMID: 34541528 PMCID: PMC8445201 DOI: 10.1093/nargab/lqab082] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/18/2021] [Revised: 08/25/2021] [Accepted: 09/06/2021] [Indexed: 11/14/2022]
Abstract
To date, most reference genomes represent a mosaic consensus sequence in which the homologous chromosomes are collapsed into one sequence. This approach produces sequence artefacts and impedes analyses of allele-specific mechanisms. Here, we report an allele-specific genome assembly of the diploid parasite Trypanosoma brucei and reveal allelic variants affecting gene expression. Using long-read sequencing and chromosome conformation capture data, we could assign 99.5% of all heterozygote variants to a specific homologous chromosome and build a 66 Mb long allele-specific genome assembly. The phasing of haplotypes allowed us to resolve hundreds of artefacts present in the previous mosaic consensus assembly. In addition, it revealed allelic recombination events, visible as regions of low allelic heterozygosity, enabling the lineage tracing of T. brucei isolates. Interestingly, analyses of transcriptome and translatome data of genes with allele-specific premature termination codons point to the absence of a nonsense-mediated decay mechanism in trypanosomes. Taken together, this study delivers a reference quality allele-specific genome assembly of T. brucei and demonstrates the importance of such assemblies for the study of gene expression control. We expect the new genome assembly will increase the awareness of allele-specific phenomena and provide a platform to investigate them.
Affiliation(s)
- Raúl O Cosentino
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
| | - Benedikt G Brink
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
| | - T Nicolai Siegel
- Division of Experimental Parasitology, Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität in Munich, Lena-Christ-Str. 48, Planegg-Martinsried 82152, Germany
| |
|
41
|
Amorim MJB, Gansemans Y, Gomes SIL, Van Nieuwerburgh F, Scott-Fordsmand JJ. Annelid genomes: Enchytraeus crypticus, a soil model for the innate (and primed) immune system. Lab Anim (NY) 2021; 50:285-294. [PMID: 34489599 PMCID: PMC8460440 DOI: 10.1038/s41684-021-00831-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/20/2021] [Accepted: 07/26/2021] [Indexed: 02/05/2023]
Abstract
Enchytraeids (Annelida) are soil invertebrates with worldwide distribution that have served as ecotoxicology models for over 20 years. We present the first high-quality reference genome of Enchytraeus crypticus, assembled from a combination of Pacific Bioscience single-molecule real-time and Illumina sequencing platforms as a 525.2 Mbp genome (910 gapless scaffolds and 18,452 genes). We highlight isopenicillin, acquired by horizontal gene transfer and conferring antibiotic function. Significant gene family expansions associated with regeneration (long interspersed nuclear elements), the innate immune system (tripartite motif-containing protein) and response to stress (cytochrome P450) were identified. The ACE (Angiotensin-converting enzyme) - a homolog of ACE2, which is involved in the coronavirus SARS-CoV-2 cell entry - is also present in E. crypticus. There is an obvious potential of using E. crypticus as a model to study interactions between regeneration, the innate immune system and aging-dependent decline.
Affiliation(s)
- Mónica J B Amorim
- Department of Biology & CESAM, University of Aveiro, Aveiro, Portugal.
| | - Yannick Gansemans
- Department of Pharmaceutics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium
| | - Susana I L Gomes
- Department of Biology & CESAM, University of Aveiro, Aveiro, Portugal
| | - Filip Van Nieuwerburgh
- Department of Pharmaceutics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium
| | | |
|
42
|
Yousaf A, Liu J, Ye S, Chen H. Current Progress in Evolutionary Comparative Genomics of Great Apes. Front Genet 2021; 12:657468. [PMID: 34456962 PMCID: PMC8385753 DOI: 10.3389/fgene.2021.657468] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/22/2021] [Accepted: 07/15/2021] [Indexed: 12/04/2022]
Abstract
The availability of high-quality genome sequences of great ape species provides unprecedented opportunities for genomic analyses. Herein, we review the recent progress in evolutionary comparative genomic studies of the extant great ape species, including human, chimpanzee, bonobo, gorilla, and orangutan. We elaborate on discoveries concerning the evolutionary history, natural selection, structural variations, and new genes of these species, which are informative for understanding the origin of human-specific phenotypes.
Affiliation(s)
- Aisha Yousaf
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Junfeng Liu
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China
| | - Sicheng Ye
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Hua Chen
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; China National Center for Bioinformation, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| |
|
43
|
Xu S, Ding Y, Sun J, Zhang Z, Wu Z, Yang T, Shen F, Xue G. A high-quality genome assembly of Jasminum sambac provides insight into floral trait formation and Oleaceae genome evolution. Mol Ecol Resour 2021; 22:724-739. [PMID: 34460989 DOI: 10.1111/1755-0998.13497] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/07/2021] [Revised: 08/23/2021] [Accepted: 08/24/2021] [Indexed: 11/29/2022]
Abstract
As one of the most economically significant Oleaceae family members, Jasminum sambac is renowned for its distinct sweet, heady fragrance. Using Illumina reads, Nanopore long reads, and HiC-sequencing, we efficiently assembled and annotated the J. sambac genome. The high-quality genome assembly consisted of a total of 507 Mb sequence (contig N50 = 17.6 Mb) with 13 pseudomolecules. A total of 21,143 protein-coding genes and 303 Mb repeat sequences were predicted. An ancient whole-genome triplication event at the base of Oleaceae (~66 million years ago [Ma], Late Cretaceous) was identified and this may have contributed to the diversification of the Oleaceae ancestor and its divergence from the Lamiales. Stress-related (e.g., WRKY) and flowering-related (e.g., MADS-box) genes were located in the triplicated regions, suggesting that the polyploidy event might have contributed adaptive potential. Genes related to terpenoid biosynthesis, for example, FTA and TPS, were observed to be duplicated to a great extent in the J. sambac genome, perhaps explaining the strong fragrance of the flowers. Copy number changes in distinct phylogenetic clades of the MADS-box family were observed in J. sambac genome, for example, AGL6- and Mα- were lost and SOC- expanded, features that might underlie the long flowering period of J. sambac. The structural genes implicated in anthocyanin biosynthesis were depleted and this may explain the absence of vivid colours in jasmine. Collectively, assembling the J. sambac genome provides new insights into the genome evolution of the Oleaceae family and provides mechanistic insights into floral properties.
Affiliation(s)
- Shixiao Xu
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
- Yongle Ding
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
- Juntao Sun
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
- Zhiqiang Zhang
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
- Zhaoyun Wu
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
- Tiezhao Yang
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Fei Shen
- Beijing Agro-biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Gang Xue
- Tobacco College, Henan Agricultural University, Zhengzhou City, Henan Province, China
- Scientific Observation and Experiment Station of Tobacco Biology & Processing, Ministry of Agriculture, Zhengzhou City, Henan Province, China
- National Tobacco Cultivation & Physiology & Biochemistry Research Centre, Zhengzhou City, Henan Province, China
44
Jeon S, Kim S, Oh MH, Liang P, Tang W, Han K. A comprehensive analysis of gorilla-specific LINE-1 retrotransposons. Genes Genomics 2021; 43:1133-1141. [PMID: 34406591 DOI: 10.1007/s13258-021-01146-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/30/2021] [Accepted: 07/29/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Long interspersed element-1 (LINE-1 or L1) is the most abundant retrotransposon in the primate genome. L1s have approximately 520,000 copies and make up ~17% of the primate genome. Full-length L1s can mobilize to a new genomic location using their enzymatic machinery. The gorilla is the second closest species to humans after the chimpanzee, and the human-gorilla split occurred 7-12 million years ago. The gorilla genome provides an opportunity to explore primate origins and evolution. OBJECTIVE L1s have contributed to genome diversity and variation during primate evolution. This study aimed to identify gorilla-specific L1s using a more recent version of the gorilla reference genome (Mar. 2016 GSMRT3/gorGor5). METHODS We collected gorilla-specific L1 candidates through computational analysis and manual inspection. L1Xplorer was used to identify whether full-length gorilla-specific L1s were intact. In addition, to determine the level of sequence conservation between intact full-length gorilla-specific L1s, the two ORFs of intact L1s were aligned with the L1PA2 consensus sequence. RESULTS A total of 2,002 gorilla-specific L1 candidates were identified through computational analysis. Among them, we manually inspected 1,883 gorilla-specific L1s, most of which belong to the L1PA2 subfamily and 12 of which were intact L1s that could influence genomic variation in the gorilla genome. Interestingly, the 12 intact full-length gorilla-specific L1s have 14 highly conserved nonsynonymous mutations, 6 in ORF1 and 8 in ORF2. In comparison to the intact full-length chimpanzee-specific L1s and human-specific hot-L1s, two of the ORF1 mutations (L256F and E293G) were shown to be gorilla-specific nonsynonymous mutations. CONCLUSION Gorilla-specific L1s may have significantly affected the gorilla genome, making it different from that of other primates during primate evolution.
Affiliation(s)
- Soyeon Jeon
- Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, 31116, Republic of Korea
- Songmi Kim
- Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, 31116, Republic of Korea
- Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan, 31116, Republic of Korea
- Man Hwan Oh
- Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, 31116, Republic of Korea
- Ping Liang
- Department of Biological Sciences, Brock University, St. Catharines, ON, L2S 3A1, Canada
- Centre of Biotechnologies, Brock University, St. Catharines, ON, L2S 3A1, Canada
- Wanxiangfu Tang
- Department of Biological Sciences, Brock University, St. Catharines, ON, L2S 3A1, Canada
- Kyudong Han
- Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, 31116, Republic of Korea
- Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan, 31116, Republic of Korea
45
Lv H, Dao FY, Zhang D, Yang H, Lin H. Advances in mapping the epigenetic modifications of 5-methylcytosine (5mC), N6-methyladenine (6mA), and N4-methylcytosine (4mC). Biotechnol Bioeng 2021; 118:4204-4216. [PMID: 34370308 DOI: 10.1002/bit.27911] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/13/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022]
Abstract
DNA modification plays a pivotal role in regulating gene expression in cell development. As prevalent markers on DNA, 5-methylcytosine (5mC), N6-methyladenine (6mA), and N4-methylcytosine (4mC) can be recognized by specific methyltransferases, facilitating cellular defense and the versatile regulation of gene expression in eukaryotes and prokaryotes. Recent advances in DNA sequencing technology have permitted the positions of different modifications to be resolved at the genome-wide scale, which has led to the discovery of several novel insights into the complexity and functions of multiple methylations. In this review, we summarize differences in the various mapping approaches and discuss their pros and cons with respect to their relative read depths, speeds, and costs. We also discuss the development of future sequencing technologies and strategies for improving the detection resolution of current sequencing technologies. Lastly, we speculate on the potentially instrumental role that these sequencing technologies might play in future research.
Affiliation(s)
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
46
Jackson EK, Bellott DW, Cho TJ, Skaletsky H, Hughes JF, Pyntikova T, Page DC. Large palindromes on the primate X Chromosome are preserved by natural selection. Genome Res 2021; 31:1337-1352. [PMID: 34290043 PMCID: PMC8327919 DOI: 10.1101/gr.275188.120] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/29/2020] [Accepted: 05/17/2021] [Indexed: 12/27/2022]
Abstract
Mammalian sex chromosomes carry large palindromes that harbor protein-coding gene families with testis-biased expression. However, there are few known examples of sex-chromosome palindromes conserved between species. We identified 26 palindromes on the human X Chromosome, constituting more than 2% of its sequence, and characterized orthologous palindromes in the chimpanzee and the rhesus macaque using a clone-based sequencing approach that incorporates full-length nanopore reads. Many of these palindromes are missing or misassembled in the current reference assemblies of these species' genomes. We find that 12 human X palindromes have been conserved for at least 25 million years, with orthologs in both chimpanzee and rhesus macaque. Insertions and deletions between species are significantly depleted within the X palindromes' protein-coding genes compared to their noncoding sequence, demonstrating that natural selection has preserved these gene families. The spacers that separate the left and right arms of palindromes are a site of localized structural instability, with seven of 12 conserved palindromes showing no spacer orthology between human and rhesus macaque. Analysis of the 1000 Genomes Project data set revealed that human X-palindrome spacers are enriched for deletions relative to arms and flanking sequence, including a common spacer deletion that affects 13% of human X Chromosomes. This work reveals an abundance of conserved palindromes on primate X Chromosomes and suggests that protein-coding gene families in palindromes (most of which remain poorly characterized) promote X-palindrome survival in the face of ongoing structural instability.
Affiliation(s)
- Emily K Jackson
- Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ting-Jan Cho
- Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Helen Skaletsky
- Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts 02142, USA
- David C Page
- Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
47
Xu Z, Dixon JR. Genome reconstruction and haplotype phasing using chromosome conformation capture methodologies. Brief Funct Genomics 2021; 19:139-150. [PMID: 31875884 DOI: 10.1093/bfgp/elz026] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 12/22/2022] Open
Abstract
Genomic analysis of individuals or organisms is predicated on the availability of high-quality reference and genotype information. With the rapidly dropping costs of high-throughput DNA sequencing, this is becoming readily available for diverse organisms and for increasingly large populations of individuals. Despite these advances, there are still aspects of genome sequencing that remain challenging for existing sequencing methods. This includes the generation of long-range contiguity during genome assembly, identification of structural variants in both germline and somatic tissues, the phasing of haplotypes in diploid organisms and the resolution of genome sequence for organisms derived from complex samples. These types of information are valuable for understanding the role of genome sequence and genetic variation on genome function, and numerous approaches have been developed to address them. Recently, chromosome conformation capture (3C) experiments, such as the Hi-C assay, have emerged as powerful tools to aid in these challenges for genome reconstruction. We will review the current use of Hi-C as a tool for aiding in genome sequencing, addressing the applications, strengths, limitations and potential future directions for the use of 3C data in genome analysis. We argue that unique features of Hi-C experiments make this data type a powerful tool to address challenges in genome sequencing, and that future integration of Hi-C data with alternative sequencing assays will facilitate the continuing revolution in genomic analysis and genome sequencing.
48
Vervoort L, Dierckxsens N, Pereboom Z, Capozzi O, Rocchi M, Shaikh TH, Vermeesch JR. 22q11.2 Low Copy Repeats Expanded in the Human Lineage. Front Genet 2021; 12:706641. [PMID: 34335701 PMCID: PMC8320366 DOI: 10.3389/fgene.2021.706641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022] Open
Abstract
Segmental duplications or low copy repeats (LCRs) constitute duplicated regions interspersed in the human genome, currently neglected in standard analyses due to their extreme complexity. Recent functional studies have indicated the potential of genes within LCRs in synaptogenesis, neuronal migration, and neocortical expansion in the human lineage. One of the regions with the highest proportion of duplicated sequence is the 22q11.2 locus, carrying eight LCRs (LCR22-A through LCR22-H), and rearrangements between them cause the 22q11.2 deletion syndrome. The LCR22-A block was recently reported to be hypervariable in the human population. It remains unknown whether this variability also exists in non-human primates, since research is strongly hampered by the presence of sequence gaps in the human and non-human primate reference genomes. To chart the LCR22 haplotypes and the associated inter- and intra-species variability, we de novo assembled the region in non-human primates by a combination of optical mapping techniques. A minimal and likely ancient haplotype is present in the chimpanzee, bonobo, and rhesus monkey without intra-species variation. In addition, the optical maps identified assembly errors and closed gaps in the orthologous chromosome 22 reference sequences. These findings indicate the LCR22 expansion to be unique to the human population, which might indicate involvement of the region in human evolution and adaptation. These maps will enable LCR22-specific functional studies and the investigation of potential associations with the phenotypic variability in the 22q11.2 deletion syndrome.
Affiliation(s)
- Zjef Pereboom
- Centre for Research and Conservation, Royal Zoological Society of Antwerp, Antwerp, Belgium
- Evolutionary Ecology Group, Department of Biology, Antwerp University, Antwerp, Belgium
- Tamim H. Shaikh
- Section of Genetics and Metabolism, Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, United States
49
Vegesna R, Tomaszkiewicz M, Ryder OA, Campos-Sánchez R, Medvedev P, DeGiorgio M, Makova KD. Ampliconic Genes on the Great Ape Y Chromosomes: Rapid Evolution of Copy Number but Conservation of Expression Levels. Genome Biol Evol 2021; 12:842-859. [PMID: 32374870 PMCID: PMC7313670 DOI: 10.1093/gbe/evaa088] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Accepted: 04/28/2020] [Indexed: 12/16/2022] Open
Abstract
Multicopy ampliconic gene families on the Y chromosome play an important role in spermatogenesis. Thus, studying their genetic variation in endangered great ape species is critical. We estimated the sizes (copy number) of nine Y ampliconic gene families in population samples of chimpanzee, bonobo, and orangutan with droplet digital polymerase chain reaction, combined these estimates with published data for human and gorilla, and produced genome-wide testis gene expression data for great apes. Analyzing this comprehensive data set within an evolutionary framework, we, first, found high inter- and intraspecific variation in gene family size, with larger families exhibiting higher variation as compared with smaller families, a pattern consistent with random genetic drift. Second, for four gene families, we observed significant interspecific size differences, sometimes even between sister species—chimpanzee and bonobo. Third, despite substantial variation in copy number, Y ampliconic gene families’ expression levels did not differ significantly among species, suggesting dosage regulation. Fourth, for three gene families, size was positively correlated with gene expression levels across species, suggesting that, given sufficient evolutionary time, copy number influences gene expression. Our results indicate high variability in size but conservation in gene expression levels in Y ampliconic gene families, significantly advancing our understanding of Y-chromosome evolution in great apes.
Affiliation(s)
- Rahulsimham Vegesna
- Bioinformatics and Genomics Graduate Program, The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park
- Oliver A Ryder
- Institute for Conservation Research, San Diego Zoo Global, San Diego, California
- Paul Medvedev
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park
- Department of Computer Science and Engineering, Pennsylvania State University, University Park
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park
- Center for Medical Genomics, Pennsylvania State University, University Park
- Michael DeGiorgio
- Department of Biology, Pennsylvania State University, University Park
- Institute for Computational and Data Science, Pennsylvania State University, University Park
- Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park
- Center for Medical Genomics, Pennsylvania State University, University Park
50
Jayakumar V, Nishimura O, Kadota M, Hirose N, Sano H, Murakawa Y, Yamamoto Y, Nakaya M, Tsukiyama T, Seita Y, Nakamura S, Kawai J, Sasaki E, Ema M, Kuraku S, Kawaji H, Sakakibara Y. Chromosomal-scale de novo genome assemblies of Cynomolgus Macaque and Common Marmoset. Sci Data 2021; 8:159. [PMID: 34183680 PMCID: PMC8239027 DOI: 10.1038/s41597-021-00935-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/16/2020] [Accepted: 04/29/2021] [Indexed: 01/18/2023] Open
Abstract
Cynomolgus macaque (Macaca fascicularis) and common marmoset (Callithrix jacchus) have been widely used in human biomedical research. Long-standing primate genome assemblies used the human genome as a reference for ordering and orienting the assembled fragments into chromosomes. Here we performed de novo genome assembly of these two species without any human genome-based bias observed in the genome assemblies released earlier. We assembled PacBio long reads, and the resultant contigs were scaffolded with Hi-C data, which were further refined based on Hi-C contact maps and alternate de novo assemblies. The assemblies achieved scaffold N50 lengths of 149 Mb and 137 Mb for cynomolgus macaque and common marmoset, respectively. The high fidelity of our assembly is also ascertained by BAC-end concordance in common marmoset. Our assembly of cynomolgus macaque outperformed all the available assemblies of this species in terms of contiguity. The chromosome-scale genome assemblies produced in this study are valuable resources for non-human primate models and provide an important baseline in human biomedical research.
Affiliation(s)
- Vasanthan Jayakumar
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan
- Osamu Nishimura
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Minatojimaminami-machi 2-2-3, Kobe, Hyogo, 650-0047, Japan
- Mitsutaka Kadota
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Minatojimaminami-machi 2-2-3, Kobe, Hyogo, 650-0047, Japan
- Naoki Hirose
- RIKEN Center for Integrative Medical Science Preventive Medicine and Applied Genomics Unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, 2-1-6 Kamikitazawa, Setagaya-ku, Tokyo, 156-8506, Japan
- Institute for the Advanced Study of Human Biology, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Hiromi Sano
- RIKEN Center for Integrative Medical Science Preventive Medicine and Applied Genomics Unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- RIKEN Center for Integrative Medical Sciences RIKEN-IFOM Joint Laboratory for Cancer Genomics, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Yasuhiro Murakawa
- RIKEN Center for Integrative Medical Sciences RIKEN-IFOM Joint Laboratory for Cancer Genomics, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Institute for the Advanced Study of Human Biology, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Department of Medical Systems Genomics, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- IFOM-the FIRC Institute of Molecular Oncology, Milan, Italy
- Yumiko Yamamoto
- RIKEN Center for Integrative Medical Sciences Laboratory for Comprehensive Genomic Analysis, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Masataka Nakaya
- Department of Stem Cells and Human Disease Models, Research Center for Animal Life Science, Shiga University of Medical Science, Shiga, 520-2192, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8501, Japan
- Tomoyuki Tsukiyama
- Department of Stem Cells and Human Disease Models, Research Center for Animal Life Science, Shiga University of Medical Science, Shiga, 520-2192, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8501, Japan
- Yasunari Seita
- Department of Stem Cells and Human Disease Models, Research Center for Animal Life Science, Shiga University of Medical Science, Shiga, 520-2192, Japan
- Shinichiro Nakamura
- Department of Stem Cells and Human Disease Models, Research Center for Animal Life Science, Shiga University of Medical Science, Shiga, 520-2192, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8501, Japan
- Jun Kawai
- RIKEN Preventive Medicine and Diagnosis Innovation Program, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Erika Sasaki
- Central Institute for Experimental Animals, Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, 3-25-12, Tonomachi, Kawasaki-ku, Kawasaki, 210-0821, Japan
- Masatsugu Ema
- Department of Stem Cells and Human Disease Models, Research Center for Animal Life Science, Shiga University of Medical Science, Shiga, 520-2192, Japan
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, 606-8501, Japan
- Shigehiro Kuraku
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Minatojimaminami-machi 2-2-3, Kobe, Hyogo, 650-0047, Japan
- Hideya Kawaji
- RIKEN Center for Integrative Medical Science Preventive Medicine and Applied Genomics Unit, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, 2-1-6 Kamikitazawa, Setagaya-ku, Tokyo, 156-8506, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan