1
Sebastianelli M, Lukhele SM, Secomandi S, de Souza SG, Haase B, Moysi M, Nikiforou C, Hutfluss A, Mountcastle J, Balacco J, Pelan S, Chow W, Fedrigo O, Downs CT, Monadjem A, Dingemanse NJ, Jarvis ED, Brelsford A, vonHoldt BM, Kirschel ANG. A genomic basis of vocal rhythm in birds. Nat Commun 2024; 15:3095. [PMID: 38653976 DOI: 10.1038/s41467-024-47305-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 03/22/2024] [Indexed: 04/25/2024] Open
Abstract
Vocal rhythm plays a fundamental role in sexual selection and species recognition in birds, but little is known of its genetic basis due to the confounding effect of vocal learning in model systems. Uncovering its genetic basis could facilitate identifying genes potentially important in speciation. Here we investigate the genomic underpinnings of rhythm in vocal non-learning Pogoniulus tinkerbirds using 135 individual whole genomes distributed across a southern African hybrid zone. We find rhythm speed is associated with two genes that are also known to affect human speech, Neurexin-1 and Coenzyme Q8A. Models leveraging ancestry reveal these candidate loci also impact rhythmic stability, a trait linked with motor performance which is an indicator of quality. Character displacement in rhythmic stability suggests possible reinforcement against hybridization, supported by evidence of asymmetric assortative mating in the species producing faster, more stable rhythms. Because rhythm is omnipresent in animal communication, candidate genes identified here may shape vocal rhythm across birds and other vertebrates.
Affiliation(s)
- Matteo Sebastianelli
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, 751 23, Uppsala, Sweden
- Sifiso M Lukhele
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Simona Secomandi
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Stacey G de Souza
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Bettina Haase
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Michaella Moysi
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Christos Nikiforou
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Alexander Hutfluss
  - Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Jennifer Balacco
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Colleen T Downs
  - Centre for Functional Biodiversity, School of Life Sciences, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
- Ara Monadjem
  - Department of Biological Sciences, University of Eswatini, Kwaluseni, Eswatini
  - Mammal Research Institute, Department of Zoology & Entomology, University of Pretoria, Private Bag 20, Hatfield, 0028, Pretoria, South Africa
- Niels J Dingemanse
  - Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Alan Brelsford
  - Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA, 92521, USA
- Bridgett M vonHoldt
  - Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, 08544, USA
- Alexander N G Kirschel
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
2
Bukhman YV, Meyer S, Chu LF, Abueg L, Antosiewicz-Bourget J, Balacco J, Brecht M, Dinatale E, Fedrigo O, Formenti G, Fungtammasan A, Giri SJ, Hiller M, Howe K, Kihara D, Mamott D, Mountcastle J, Pelan S, Rabbani K, Sims Y, Tracey A, Wood JMD, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. Chromosome level genome assembly of the Etruscan shrew Suncus etruscus. Sci Data 2024; 11:176. [PMID: 38326333 PMCID: PMC10850158 DOI: 10.1038/s41597-024-03011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024] Open
Abstract
Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
Affiliation(s)
- Yury V Bukhman
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
- Susanne Meyer
  - Neuroscience Research Institute, University of California - Santa Barbara, 494 UCEN Rd, Isla Vista, CA, 93117, USA
- Li-Fang Chu
  - Department of Comparative Biology and Experimental Medicine, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4, Canada
- Linelle Abueg
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Jennifer Balacco
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Michael Brecht
  - BCCN/Humboldt University Berlin, Philippstr, 13 House 6, 10115, Berlin, Germany
- Erica Dinatale
  - Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Giulio Formenti
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
- Swagarika Jaharlal Giri
  - Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Daisuke Kihara
  - Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, Purdue University, 249 S. Martin Jischke Dr., West Lafayette, IN, 47907, USA
- Daniel Mamott
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
- Jacquelyn Mountcastle
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Keon Rabbani
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
- Ying Sims
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
- James A Thomson
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
  - Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, 53726, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
- Ron Stewart
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
3
Stevens L, Martínez-Ugalde I, King E, Wagah M, Absolon D, Bancroft R, Gonzalez de la Rosa P, Hall JL, Kieninger M, Kloch A, Pelan S, Robertson E, Pedersen AB, Abreu-Goodger C, Buck AH, Blaxter M. Ancient diversity in host-parasite interaction genes in a model parasitic nematode. Nat Commun 2023; 14:7776. [PMID: 38012132 PMCID: PMC10682056 DOI: 10.1038/s41467-023-43556-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/13/2023] [Indexed: 11/29/2023] Open
Abstract
Host-parasite interactions exert strong selection pressures on the genomes of both host and parasite. These interactions can lead to negative frequency-dependent selection, a form of balancing selection that is hypothesised to explain the high levels of polymorphism seen in many host immune and parasite antigen loci. Here, we sequence the genomes of several individuals of Heligmosomoides bakeri, a model parasite of house mice, and Heligmosomoides polygyrus, a closely related parasite of wood mice. Although H. bakeri is commonly referred to as H. polygyrus in the literature, their genomes show levels of divergence that are consistent with at least a million years of independent evolution. The genomes of both species contain hyper-divergent haplotypes that are enriched for proteins that interact with the host immune response. Many of these haplotypes originated prior to the divergence between H. bakeri and H. polygyrus, suggesting that they have been maintained by long-term balancing selection. Together, our results suggest that the selection pressures exerted by the host immune response have played a key role in shaping patterns of genetic diversity in the genomes of parasitic nematodes.
Affiliation(s)
- Lewis Stevens
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
- Isaac Martínez-Ugalde
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Erna King
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
- Martin Wagah
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
- Rowan Bancroft
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Jessica L Hall
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
- Elaine Robertson
  - Institute of Immunology & Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Amy B Pedersen
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Cei Abreu-Goodger
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Amy H Buck
  - Institute of Immunology & Infection Research, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Mark Blaxter
  - Tree of Life, Wellcome Sanger Institute, Hinxton, UK
4
Timoshevskaya N, Eşkut KI, Timoshevskiy VA, Robb SMC, Holt C, Hess JE, Parker HJ, Baker CF, Miller AK, Saraceno C, Yandell M, Krumlauf R, Narum SR, Lampman RT, Gemmell NJ, Mountcastle J, Haase B, Balacco JR, Formenti G, Pelan S, Sims Y, Howe K, Fedrigo O, Jarvis ED, Smith JJ. An improved germline genome assembly for the sea lamprey Petromyzon marinus illuminates the evolution of germline-specific chromosomes. Cell Rep 2023; 42:112263. [PMID: 36930644 PMCID: PMC10166183 DOI: 10.1016/j.celrep.2023.112263] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/07/2022] [Revised: 10/17/2022] [Accepted: 02/28/2023] [Indexed: 03/17/2023] Open
Abstract
Programmed DNA loss is a gene silencing mechanism that is employed by several vertebrate and nonvertebrate lineages, including all living jawless vertebrates and songbirds. Reconstructing the evolution of somatically eliminated (germline-specific) sequences in these species has proven challenging due to a high content of repeats and gene duplications in eliminated sequences and a corresponding lack of highly accurate and contiguous assemblies for these regions. Here, we present an improved assembly of the sea lamprey (Petromyzon marinus) genome that was generated using recently standardized methods that increase the contiguity and accuracy of vertebrate genome assemblies. This assembly resolves highly contiguous, somatically retained chromosomes and at least one germline-specific chromosome, permitting new analyses that reconstruct the timing, mode, and repercussions of recruitment of genes to the germline-specific fraction. These analyses reveal major roles of interchromosomal segmental duplication, intrachromosomal duplication, and positive selection for germline functions in the long-term evolution of germline-specific chromosomes.
Affiliation(s)
- Kaan I Eşkut
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- Sofia M C Robb
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Carson Holt
  - Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Jon E Hess
  - Columbia River Inter-Tribal Fish Commission, Portland, OR 97232, USA
- Hugo J Parker
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Cindy F Baker
  - National Institute of Water and Atmospheric Research Limited (NIWA), Hamilton, Waikato 3261, New Zealand
- Allison K Miller
  - Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, Otago 9054, New Zealand
- Cody Saraceno
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- Mark Yandell
  - Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
- Robb Krumlauf
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
  - Department of Anatomy & Cell Biology, The University of Kansas School of Medicine, Kansas City, KS 66160, USA
- Shawn R Narum
  - Columbia River Inter-Tribal Fish Commission, Hagerman, ID 83332, USA
- Ralph T Lampman
  - Yakama Nation Fisheries Resource Management Program, Pacific Lamprey Project, Toppenish, WA 98948, USA
- Neil J Gemmell
  - Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, Otago 9054, New Zealand
- Bettina Haase
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- Jennifer R Balacco
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- Giulio Formenti
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY 10065, USA
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Ying Sims
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY 10065, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY 10065, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Jeramiah J Smith
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
5
Ayala D, Akone-Ella O, Kengne P, Johnson H, Heaton H, Collins J, Krasheninnikova K, Pelan S, Pointon DL, Sims Y, Torrance J, Tracey A, Uliano-Silva M, von Wyschetzki K, Wood J, McCarthy S, Neafsey D, Makunin A, Lawniczak M. The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900. Wellcome Open Res 2022; 7:287. [PMID: 36874567 PMCID: PMC9975407.2 DOI: 10.12688/wellcomeopenres.18445.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/15/2023] [Indexed: 03/29/2023] Open
Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
Affiliation(s)
- Diego Ayala
  - MIVEGEC, IRD, Montpellier, 34394, France
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Ousman Akone-Ella
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Pierre Kengne
  - MIVEGEC, IRD, Montpellier, 34394, France
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Harriet Johnson
  - Scientific Operations, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Joanna Collins
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Ying Sims
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- James Torrance
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Jonathan Wood
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Shane McCarthy
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Daniel Neafsey
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, 02142, USA
- Alex Makunin
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Mara Lawniczak
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
6
Ayala D, Akone-Ella O, Kengne P, Johnson H, Heaton H, Collins J, Krasheninnikova K, Pelan S, Pointon DL, Sims Y, Torrance J, Tracey A, Uliano-Silva M, von Wyschetzki K, Wood J, McCarthy S, Neafsey D, Makunin A, Lawniczak M. The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900. Wellcome Open Res 2022; 7:287. [PMID: 36874567 PMCID: PMC9975407 DOI: 10.12688/wellcomeopenres.18445.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/20/2022] [Indexed: 11/27/2022] Open
Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
Affiliation(s)
- Diego Ayala
  - MIVEGEC, IRD, Montpellier, 34394, France
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Ousman Akone-Ella
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Pierre Kengne
  - MIVEGEC, IRD, Montpellier, 34394, France
  - ESV-GAB, Centre Interdisciplinaire de Recherches Médicales de Franceville (CIRMF), Franceville, BP 769, Gabon
- Harriet Johnson
  - Scientific Operations, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Joanna Collins
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Ying Sims
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- James Torrance
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Jonathan Wood
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Shane McCarthy
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Daniel Neafsey
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA, 02142, USA
- Alex Makunin
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Mara Lawniczak
  - Tree of Life, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
7
Yen EC, McCarthy SA, Galarza JA, Generalovic TN, Pelan S, Nguyen P, Meier JI, Warren IA, Mappes J, Durbin R, Jiggins CD. Correction to: A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning. Gigascience 2021; 10:6409162. [PMID: 34687311 PMCID: PMC8538893 DOI: 10.1093/gigascience/giab073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Affiliation(s)
- Eugenie C Yen
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Shane A McCarthy
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Juan A Galarza
  - Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
- Tomas N Generalovic
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Sarah Pelan
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Petr Nguyen
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Branišovská 1160/31, 370 05 České Budějovice, Czech Republic
  - University of South Bohemia, Faculty of Science, Branišovská 1645/31A, 370 05 České Budějovice, Czech Republic
- Joana I Meier
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
  - St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
- Ian A Warren
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Johanna Mappes
  - Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
- Richard Durbin
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
  - Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Chris D Jiggins
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
  - St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
8
Dussex N, van der Valk T, Morales HE, Wheat CW, Díez-del-Molino D, von Seth J, Foster Y, Kutschera VE, Guschanski K, Rhie A, Phillippy AM, Korlach J, Howe K, Chow W, Pelan S, Mendes Damas JD, Lewin HA, Hastie AR, Formenti G, Fedrigo O, Guhlin J, Harrop TW, Le Lec MF, Dearden PK, Haggerty L, Martin FJ, Kodali V, Thibaud-Nissen F, Iorns D, Knapp M, Gemmell NJ, Robertson F, Moorhouse R, Digby A, Eason D, Vercoe D, Howard J, Jarvis ED, Robertson BC, Dalén L. Population genomics of the critically endangered kākāpō. Cell Genom 2021; 1:100002. [PMID: 36777713 PMCID: PMC9903828 DOI: 10.1016/j.xgen.2021.100002] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 10/20/2020] [Revised: 04/23/2021] [Accepted: 06/22/2021] [Indexed: 12/30/2022]
Abstract
The kākāpō is a flightless parrot endemic to New Zealand. Once common in the archipelago, only 201 individuals remain today, most of them descending from an isolated island population. We report the first genome-wide analyses of the species, including a high-quality genome assembly for kākāpō, one of the first chromosome-level reference genomes sequenced by the Vertebrate Genomes Project (VGP). We also sequenced and analyzed 35 modern genomes from the sole surviving island population and 14 genomes from the extinct mainland population. While theory suggests that such a small population is likely to have accumulated deleterious mutations through genetic drift, our analyses on the impact of the long-term small population size in kākāpō indicate that present-day island kākāpō have a reduced number of harmful mutations compared to mainland individuals. We hypothesize that this reduced mutational load is due to the island population having been subjected to a combination of genetic drift and purging of deleterious mutations, through increased inbreeding and purifying selection, since its isolation from the mainland ∼10,000 years ago. Our results provide evidence that small populations can survive even when isolated for hundreds of generations. This work provides key insights into kākāpō breeding and recovery and more generally into the application of genetic tools in conservation efforts for endangered species.
Affiliation(s)
- Nicolas Dussex
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden,Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, 10405 Stockholm, Sweden,Department of Zoology, Stockholm University, 10691 Stockholm, Sweden,Department of Anatomy, University of Otago, PO Box 913, Dunedin 9016, New Zealand,Corresponding author
| | - Tom van der Valk
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden,Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, 10405 Stockholm, Sweden
| | - Hernán E. Morales
- Section for Evolutionary Genomics, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | | | - David Díez-del-Molino
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden,Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, 10405 Stockholm, Sweden
| | - Johanna von Seth
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden,Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, 10405 Stockholm, Sweden,Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
| | - Yasmin Foster
- Department of Zoology, University of Otago, PO Box 56, Dunedin 9054, New Zealand
| | - Verena E. Kutschera
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Katerina Guschanski
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK,Department of Ecology and Genetics, Animal Ecology, Uppsala University, 75236 Uppsala, Sweden
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Jonas Korlach
- Pacific Biosciences, 1305 O’Brien Drive, Menlo Park, CA 94025, USA
| | - Kerstin Howe
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - William Chow
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Sarah Pelan
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
| | - Joanna D. Mendes Damas
- Department of Evolution and Ecology and the UC Davis Genome Center, 4321 Genome and Biomedical Sciences Facility, University of California Davis, Davis, CA 95616, USA
| | - Harris A. Lewin
- Department of Evolution and Ecology and the UC Davis Genome Center, 4321 Genome and Biomedical Sciences Facility, University of California Davis, Davis, CA 95616, USA
| | - Alex R. Hastie
- Bionano Genomics, 9540 Towne Centre Drive, San Diego, CA 92121, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10065, USA,Laboratory of Neurogenetics of Language, Box 54, The Rockefeller University, New York, NY 10065, USA,Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10065, USA
| | - Joseph Guhlin
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9016, New Zealand
| | - Thomas W.R. Harrop
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9016, New Zealand
| | - Marissa F. Le Lec
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9016, New Zealand
| | - Peter K. Dearden
- Genomics Aotearoa and Laboratory for Evolution and Development, Department of Biochemistry, University of Otago, PO Box 56, Dunedin 9016, New Zealand
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fergal J. Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vamsi Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - David Iorns
- The Genetic Rescue Foundation, Wellington, New Zealand
| | - Michael Knapp
- Department of Anatomy, University of Otago, PO Box 913, Dunedin 9016, New Zealand
| | - Neil J. Gemmell
- Department of Anatomy, University of Otago, PO Box 913, Dunedin 9016, New Zealand
| | - Fiona Robertson
- Department of Zoology, University of Otago, PO Box 56, Dunedin 9054, New Zealand
| | - Ron Moorhouse
- Kākāpō Recovery, Department of Conservation, PO Box 743, Invercargill 9840, New Zealand
| | - Andrew Digby
- Kākāpō Recovery, Department of Conservation, PO Box 743, Invercargill 9840, New Zealand
| | - Daryl Eason
- Kākāpō Recovery, Department of Conservation, PO Box 743, Invercargill 9840, New Zealand
| | - Deidre Vercoe
- Kākāpō Recovery, Department of Conservation, PO Box 743, Invercargill 9840, New Zealand
| | - Jason Howard
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10065, USA
- BioSkryb Genomics, 701 W Main Street, Suite 200, Durham, NC 27701, USA
| | - Erich D. Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY 10065, USA
- Laboratory of Neurogenetics of Language, Box 54, The Rockefeller University, New York, NY 10065, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Corresponding author
| | - Bruce C. Robertson
- Department of Zoology, University of Otago, PO Box 56, Dunedin 9054, New Zealand
- Corresponding author
| | - Love Dalén
- Centre for Palaeogenetics, Svante Arrhenius väg 20C, 10691 Stockholm, Sweden
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, 10405 Stockholm, Sweden
- Department of Zoology, Stockholm University, 10691 Stockholm, Sweden
- Corresponding author
|
9
|
Sætre CLC, Eroukhmanoff F, Rönkä K, Kluen E, Thorogood R, Torrance J, Tracey A, Chow W, Pelan S, Howe K, Jakobsen KS, Tørresen OK. A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus). Genome Biol Evol 2021; 13:6367782. [PMID: 34499122 PMCID: PMC8459166 DOI: 10.1093/gbe/evab212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/06/2021] [Indexed: 11/13/2022] Open
Abstract
The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.
Affiliation(s)
| | | | - Katja Rönkä
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland
- Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Edward Kluen
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland
- Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Rose Thorogood
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland
- Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Sarah Pelan
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
| | - Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
|
10
|
Dunn JC, Hamer KC, Morris AJ, Grice PV, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Dolucan J, Quail MA, McCarthy SA, Uliano-Silva M, Howe K, Torrance J, Chow W, Pelan S, Sims Y, Challis R, Threlfall J, Mead D, Blaxter M. The genome sequence of the European turtle dove, Streptopelia turtur Linnaeus 1758. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17060.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023] Open
Abstract
We present a genome assembly from an individual female Streptopelia turtur (the European turtle dove; Chordata; Aves; Columbidae). The genome sequence is 1.18 gigabases in span. The majority of the assembly is scaffolded into 35 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
|
11
|
Dunn JC, Liedvogel M, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Dolucan J, Quail MA, Uliano-Silva M, McCarthy SA, Howe K, Torrance J, Wood J, Pelan S, Sims Y, Challis R, Threlfall J, Mead D, Blaxter M. The genome sequence of the European robin, Erithacus rubecula Linnaeus 1758. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16988.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023] Open
Abstract
We present a genome assembly from an individual female Erithacus rubecula (the European robin; Chordata; Aves; Passeriformes; Turdidae). The genome sequence is 1.09 gigabases in span. The majority of the assembly is scaffolded into 36 chromosomal pseudomolecules, with both W and Z sex chromosomes assembled.
|
12
|
Carpenter AI, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Doulcan J, Quail MA, McCarthy SA, Uliano Da Silva M, Howe K, Torrance J, Wood J, Pelan S, Sims Y, Tricomi FF, Challis R, Threlfall J, Mead D, Blaxter M. The genome sequence of the European water vole, Arvicola amphibius Linnaeus 1758. Wellcome Open Res 2021; 6:162. [PMID: 35600244 PMCID: PMC9114827 DOI: 10.12688/wellcomeopenres.16753.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 06/07/2021] [Indexed: 11/20/2022] Open
Abstract
We present a genome assembly from an individual male Arvicola amphibius (the European water vole; Chordata; Mammalia; Rodentia; Cricetidae). The genome sequence is 2.30 gigabases in span. The majority of the assembly is scaffolded into 18 chromosomal pseudomolecules, including the X sex chromosome. Gene annotation of this assembly on Ensembl has identified 21,394 protein coding genes.
Affiliation(s)
| | - Michelle Smith
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Craig Corton
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Karen Oliver
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Jason Skelton
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Emma Betteridge
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Jale Doulcan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Achilles Therapeutics Plc, London, W6 8PW, UK
| | - Michael A. Quail
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Shane A. McCarthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
| | | | - Kerstin Howe
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - James Torrance
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Jonathan Wood
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Sarah Pelan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Ying Sims
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | | | - Richard Challis
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Jonathan Threlfall
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Daniel Mead
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Owlstone Medical, Cambridge Science Park, Cambridge, CB4 0GJ, UK
| | - Mark Blaxter
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
|
13
|
Vine C, Teeling EC, Smith M, Corton C, Oliver K, Skelton J, Betteridge E, Doulcan J, Quail MA, McCarthy SA, Howe K, Torrance J, Wood J, Pelan S, Sims Y, Challis R, Threlfall J, Mead D, Blaxter M. The genome sequence of the common pipistrelle, Pipistrellus pipistrellus Schreber 1774. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16895.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
Abstract
We present a genome assembly from an individual female Pipistrellus pipistrellus (the common pipistrelle; Chordata; Mammalia; Chiroptera; Vespertilionidae). The genome sequence is 1.76 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal pseudomolecules, with the X sex chromosome assembled.
|
14
|
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, Jarvis ED. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592:737-746. [PMID: 33911273 PMCID: PMC8081667 DOI: 10.1038/s41586-021-03451-0] [Citation(s) in RCA: 617] [Impact Index Per Article: 205.7] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shane A McCarthy
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA, USA
| | - Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Marcela Uliano-Silva
- Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
| | | | | | - Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Mark Chaisson
- University of Southern California, Los Angeles, CA, USA
| | - Gregory L Gedman
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Lindsey J Cantin
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Iliana Bista
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Bettina Haase
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | - Sylke Winkler
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- DRESDEN-concept Genome Center, Dresden, Germany
| | - Sadye Paez
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | | | - Sonja C Vernes
- Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- School of Biology, University of St Andrews, St Andrews, UK
| | - Tanya M Lama
- University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA
| | - Frank Grutzner
- School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Wesley C Warren
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | | | - Dave Burt
- UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
| | - Julia M George
- Department of Biological Sciences, Clemson University, Clemson, SC, USA
| | - Matthew T Biegler
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - David Iorns
- The Genetic Rescue Foundation, Wellington, New Zealand
| | - Andrew Digby
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Daryl Eason
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Bruce Robertson
- Department of Zoology, University of Otago, Dunedin, New Zealand
| | | | - Mark Wilkinson
- Department of Life Sciences, Natural History Museum, London, UK
| | - George Turner
- School of Natural Sciences, Bangor University, Gwynedd, UK
| | - Axel Meyer
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Andreas F Kautt
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Paolo Franchini
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - H William Detrich
- Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA
| | - Hannes Svardal
- Department of Biology, University of Antwerp, Antwerp, Belgium
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | - Maximilian Wagner
- Institute of Biology, Karl-Franzens University of Graz, Graz, Austria
| | - Gavin J P Naylor
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
| | - Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
| | - Milan Malinsky
- Wellcome Sanger Institute, Cambridge, UK
- Zoological Institute, University of Basel, Basel, Switzerland
| | | | | | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | | | | | | | | | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA
- Digital BioLogic, Ivanić-Grad, Croatia
| | | | - Zemin Ning
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Richard E Green
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Santa Cruz, CA, USA
| | | | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Jay Ghurye
- Dovetail Genomics, Santa Cruz, CA, USA
- Department of Computer Science, University of Maryland College Park, College Park, MD, USA
| | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ying Sims
- Wellcome Sanger Institute, Cambridge, UK
| | | | | | | | | | | | | | - Dengfeng Guan
- Department of Genetics, University of Cambridge, Cambridge, UK
- School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China
| | - Sarah E London
- Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA
| | - David F Clayton
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Samantha R Friedrich
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
| | - Farooq O Al-Ajli
- Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia
- Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia
- Qatar Falcon Genome Project, Doha, Qatar
| | | | - Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- eGnome, Inc., Seoul, Republic of Korea
| | | | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute, Frankfurt, Germany
- Goethe-University, Faculty of Biosciences, Frankfurt, Germany
| | | | - Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Karen Clark
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Fergal Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Kevin Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Woori Kwak
- eGnome, Inc., Seoul, Republic of Korea
- Hoonygen, Seoul, Korea
| | - Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Luis Nassar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Robert H S Kraus
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
| | - Andrew J Crawford
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - M Thomas P Gilbert
- Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- University Museum, NTNU, Trondheim, Norway
| | - Guojie Zhang
- China National Genebank, BGI-Shenzhen, Shenzhen, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Byrappa Venkatesh
- Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
| | - Robert W Murphy
- Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Warren E Johnson
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
- The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA
- Walter Reed Army Institute of Research, Silver Spring, MD, USA
| | - Federica Di Palma
- Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
| | - Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | | | - Oliver A Ryder
- San Diego Zoo Global, Escondido, CA, USA
- Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Stephen J O'Brien
- Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation
- Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA
| | | | - Harris A Lewin
- The Genome Center, University of California Davis, Davis, CA, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA, USA
| | | | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
- Faculty of Computer Science, Technical University Dresden, Dresden, Germany
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
|
15
|
Howe K, Chow W, Collins J, Pelan S, Pointon DL, Sims Y, Torrance J, Tracey A, Wood J. Significantly improving the quality of genome assemblies through curation. Gigascience 2021; 10:giaa153. [PMID: 33420778 PMCID: PMC7794651 DOI: 10.1093/gigascience/giaa153] [Citation(s) in RCA: 386] [Impact Index Per Article: 128.7] [Received: 08/12/2020] [Revised: 11/17/2020] [Accepted: 11/30/2020] [Indexed: 11/29/2022] Open
Abstract
Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Affiliation(s)
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Joanna Collins
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Sarah Pelan
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Ying Sims
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Jonathan Wood
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
16
Morin PA, Archer FI, Avila CD, Balacco JR, Bukhman YV, Chow W, Fedrigo O, Formenti G, Fronczek JA, Fungtammasan A, Gulland FMD, Haase B, Peter Heide-Jorgensen M, Houck ML, Howe K, Misuraca AC, Mountcastle J, Musser W, Paez S, Pelan S, Phillippy A, Rhie A, Robinson J, Rojas-Bracho L, Rowles TK, Ryder OA, Smith CR, Stevenson S, Taylor BL, Teilmann J, Torrance J, Wells RS, Westgate AJ, Jarvis ED. Reference genome and demographic history of the most endangered marine mammal, the vaquita. Mol Ecol Resour 2020; 21:1008-1020. [PMID: 33089966 PMCID: PMC8247363 DOI: 10.1111/1755-0998.13284] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 06/01/2020] [Revised: 09/08/2020] [Accepted: 10/08/2020] [Indexed: 12/12/2022]
Abstract
The vaquita is the most critically endangered marine mammal, with fewer than 19 remaining in the wild. First described in 1958, the vaquita has been in rapid decline for more than 20 years resulting from inadvertent deaths due to the increasing use of large-mesh gillnets. To understand the evolutionary and demographic history of the vaquita, we used combined long-read sequencing and long-range scaffolding methods with long- and short-read RNA sequencing to generate a near error-free annotated reference genome assembly from cell lines derived from a female individual. The genome assembly consists of 99.92% of the assembled sequence contained in 21 nearly gapless chromosome-length autosome scaffolds and the X-chromosome scaffold, with a scaffold N50 of 115 Mb. Genome-wide heterozygosity is the lowest (0.01%) of any mammalian species analysed to date, but heterozygosity is evenly distributed across the chromosomes, consistent with long-term small population size at genetic equilibrium, rather than low diversity resulting from a recent population bottleneck or inbreeding. Historical demography of the vaquita indicates long-term population stability at less than 5,000 (Ne) for over 200,000 years. Together, these analyses indicate that the vaquita genome has had ample opportunity to purge highly deleterious alleles and potentially maintain diversity necessary for population health.
Affiliation(s)
- Phillip A Morin
- Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, La Jolla, CA, USA
- Frederick I Archer
- Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, La Jolla, CA, USA
- Catherine D Avila
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Jennifer R Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Yury V Bukhman
- Regenerative Biology, Morgridge Institute for Research, Madison, WI, USA
- Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Julie A Fronczek
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Bettina Haase
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Marlys L Houck
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Ann C Misuraca
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Sadye Paez
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Adam Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Jacqueline Robinson
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Teri K Rowles
- Office of Protected Resources, National Marine Fisheries Service, NOAA, Silver Spring, MD, USA
- Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Barbara L Taylor
- Southwest Fisheries Science Center, National Marine Fisheries Service, NOAA, La Jolla, CA, USA
- Jonas Teilmann
- Marine Mammal Research, Department of Bioscience, Aarhus University, Roskilde, Denmark
- Randall S Wells
- Chicago Zoological Society's Sarasota Dolphin Research Program, c/o Mote Marine Laboratory, Sarasota, FL, USA
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
17
Yen EC, McCarthy SA, Galarza JA, Generalovic TN, Pelan S, Nguyen P, Meier JI, Warren IA, Mappes J, Durbin R, Jiggins CD. A haplotype-resolved, de novo genome assembly for the wood tiger moth (Arctia plantaginis) through trio binning. Gigascience 2020; 9:giaa088. [PMID: 32808665 PMCID: PMC7433188 DOI: 10.1093/gigascience/giaa088] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/28/2020] [Revised: 07/03/2020] [Accepted: 07/27/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND: Diploid genome assembly is typically impeded by heterozygosity because it introduces errors when haplotypes are collapsed into a consensus sequence. Trio binning offers an innovative solution that exploits heterozygosity for assembly. Short, parental reads are used to assign parental origin to long reads from their F1 offspring before assembly, enabling complete haplotype resolution. Trio binning could therefore provide an effective strategy for assembling highly heterozygous genomes, which are traditionally problematic, such as insect genomes. This includes the wood tiger moth (Arctia plantaginis), which is an evolutionary study system for warning colour polymorphism. FINDINGS: We produced a high-quality, haplotype-resolved assembly for Arctia plantaginis through trio binning. We sequenced a same-species family (F1 heterozygosity ∼1.9%) and used parental Illumina reads to bin 99.98% of offspring Pacific Biosciences reads by parental origin, before assembling each haplotype separately and scaffolding with 10X linked reads. Both assemblies are contiguous (mean scaffold N50: 8.2 Mb) and complete (mean BUSCO completeness: 97.3%), with annotations and 31 chromosomes identified through karyotyping. We used the assembly to analyse genome-wide population structure and relationships between 40 wild resequenced individuals from 5 populations across Europe, revealing the Georgian population as the most genetically differentiated with the lowest genetic diversity. CONCLUSIONS: We present the first invertebrate genome to be assembled via trio binning. This assembly is one of the highest quality genomes available for Lepidoptera, supporting trio binning as a potent strategy for assembling heterozygous genomes. Using our assembly, we provide genomic insights into the geographic population structure of A. plantaginis.
Affiliation(s)
- Eugenie C Yen
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Shane A McCarthy
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Juan A Galarza
- Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
- Tomas N Generalovic
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Sarah Pelan
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Petr Nguyen
- Biology Centre of the Czech Academy of Sciences, Institute of Entomology, Branišovská 1160/31, 370 05 České Budějovice, Czech Republic
- University of South Bohemia, Faculty of Science, Branišovská 1645/31A, 370 05 České Budějovice, Czech Republic
- Joana I Meier
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
- Ian A Warren
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Johanna Mappes
- Department of Biological and Environmental Science, University of Jyväskylä, FI-40014 Jyväskylä, Finland
- Richard Durbin
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Saffron Walden CB10 1SA, UK
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- St John's College, University of Cambridge, St John's Street, Cambridge CB2 1TP, UK
18
Kenny NJ, McCarthy SA, Dudchenko O, James K, Betteridge E, Corton C, Dolucan J, Mead D, Oliver K, Omer AD, Pelan S, Ryan Y, Sims Y, Skelton J, Smith M, Torrance J, Weisz D, Wipat A, Aiden EL, Howe K, Williams ST. The gene-rich genome of the scallop Pecten maximus. Gigascience 2020; 9:giaa037. [PMID: 32352532 PMCID: PMC7191990 DOI: 10.1093/gigascience/giaa037] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 01/18/2020] [Revised: 02/26/2020] [Accepted: 03/24/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management. FINDINGS: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species. CONCLUSIONS: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.
Affiliation(s)
- Nathan J Kenny
- Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK
- Shane A McCarthy
- University of Cambridge, Department of Genetics, Cambridge CB2 3EH, UK
- Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- The Center for Theoretical Biological Physics, Rice University, 6100 Main St, Houston, TX 77005-1827, USA
- Katherine James
- Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK
- Craig Corton
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Jale Dolucan
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Dan Mead
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Karen Oliver
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Arina D Omer
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sarah Pelan
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Yan Ryan
- School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Institute of Infection and Global Health, Liverpool University, iC2, 146 Brownlow Hill, Liverpool L3 5RF, UK
- Ying Sims
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anil Wipat
- School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Erez L Aiden
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- The Center for Theoretical Biological Physics, Rice University, 6100 Main St, Houston, TX 77005-1827, USA
- Shanghai Institute for Advanced Immunochemical Studies, Shanghai Tech University, Shanghai, China
- School of Agriculture and Environment, University of Western Australia, Perth, Australia
- Kerstin Howe
- Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Suzanne T Williams
- Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, UK
19
Lilue J, Doran AG, Fiddes IT, Abrudan M, Armstrong J, Bennett R, Chow W, Collins J, Collins S, Czechanski A, Danecek P, Diekhans M, Dolle DD, Dunn M, Durbin R, Earl D, Ferguson-Smith A, Flicek P, Flint J, Frankish A, Fu B, Gerstein M, Gilbert J, Goodstadt L, Harrow J, Howe K, Ibarra-Soria X, Kolmogorov M, Lelliott C, Logan DW, Loveland J, Mathews CE, Mott R, Muir P, Nachtweide S, Navarro FC, Odom DT, Park N, Pelan S, Pham SK, Quail M, Reinholdt L, Romoth L, Shirley L, Sisu C, Sjoberg-Herrera M, Stanke M, Steward C, Thomas M, Threadgold G, Thybert D, Torrance J, Wong K, Wood J, Yalcin B, Yang F, Adams DJ, Paten B, Keane TM. Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci. Nat Genet 2018; 50:1574-1583. [PMID: 30275530 PMCID: PMC6205630 DOI: 10.1038/s41588-018-0223-8] [Citation(s) in RCA: 119] [Impact Index Per Article: 19.8] [Received: 02/19/2018] [Accepted: 08/02/2018] [Indexed: 12/11/2022]
Abstract
We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.
MESH Headings
- Animals
- Animals, Laboratory
- Chromosome Mapping/veterinary
- Genetic Loci
- Genome
- Haplotypes/genetics
- Mice
- Mice, Inbred BALB C/genetics
- Mice, Inbred C3H/genetics
- Mice, Inbred C57BL/genetics
- Mice, Inbred CBA/genetics
- Mice, Inbred DBA/genetics
- Mice, Inbred NOD/genetics
- Mice, Inbred Strains/classification
- Mice, Inbred Strains/genetics
- Molecular Sequence Annotation
- Phylogeny
- Polymorphism, Single Nucleotide
- Species Specificity
Affiliation(s)
- Jingtao Lilue
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Anthony G. Doran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Ian T. Fiddes
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Monica Abrudan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Joel Armstrong
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- William Chow
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Joanna Collins
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Stephan Collins
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Centre National de la Recherche Scientifique UMR7104, Institut National de la Santé et de la Recherche Médicale U964, Université de Strasbourg, 67404 Illkirch, France
- Centre des Sciences du Goût et de l’Alimentation, University of Bourgogne Franche-Comté, 21000 Dijon, France
- Anne Czechanski
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Petr Danecek
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mark Diekhans
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Dirk-Dominik Dolle
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Matt Dunn
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Richard Durbin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Department of Genetics, University of Cambridge, Downing Site, Cambridge CB2 3EH, UK
- Dent Earl
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anne Ferguson-Smith
- Department of Genetics, University of Cambridge, Downing Site, Cambridge CB2 3EH, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Jonathan Flint
- Brain Research Institute, University of California, 695 Charles E Young Dr S, Los Angeles, CA 90095, USA
- Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Beiyuan Fu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mark Gerstein
- Yale Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- James Gilbert
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Leo Goodstadt
- OxFORD Asset Management, OxAM House, 6 George Street, Oxford OX1 2BW
- Jennifer Harrow
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Kerstin Howe
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Chris Lelliott
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Darren W. Logan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Jane Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Clayton E. Mathews
- Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, FL, USA
- Richard Mott
- Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
- Paul Muir
- Yale Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Stefanie Nachtweide
- Institute of Mathematics and Computer Science, University of Greifswald, Domstraße 11, 17489 Greifswald, Germany
- Fabio C.P. Navarro
- Yale Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Duncan T. Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics, 69120 Heidelberg, Germany
- Naomi Park
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Sarah Pelan
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Son K Pham
- BioTuring Inc., San Diego, California, CA92121
- Mike Quail
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Laura Reinholdt
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Lars Romoth
- Institute of Mathematics and Computer Science, University of Greifswald, Domstraße 11, 17489 Greifswald, Germany
- Lesley Shirley
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Cristina Sisu
- Yale Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
- Marcela Sjoberg-Herrera
- Departamento de Biología Celular y Molecular, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile
- Mario Stanke
- Institute of Mathematics and Computer Science, University of Greifswald, Domstraße 11, 17489 Greifswald, Germany
- Charles Steward
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Mark Thomas
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Glen Threadgold
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- David Thybert
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- James Torrance
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Kim Wong
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Jonathan Wood
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Binnaz Yalcin
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Centre National de la Recherche Scientifique UMR7104, Institut National de la Santé et de la Recherche Médicale U964, Université de Strasbourg, 67404 Illkirch, France
- Fengtang Yang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- David J. Adams
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Thomas M. Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- School of Life Sciences, University of Nottingham, Nottingham, UK
20
Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 2017; 27:849-864. [PMID: 28396521 PMCID: PMC5411779 DOI: 10.1101/gr.213611.116] [Citation(s) in RCA: 509] [Impact Index Per Article: 72.7] [Received: 07/29/2016] [Accepted: 03/14/2017] [Indexed: 11/24/2022]
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Affiliation(s)
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Tina Graves-Lindsay
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Kerstin Howe
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Nathan Bouk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Hsiu-Chuan Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Paul A Kitts
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Derek Albracht
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Robert S Fulton
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Vincent Magrini
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Chris Markovic
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Sean McGrath
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Kate Auger
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- William Chow
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Joanna Collins
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Glenn Harden
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Timothy Hubbard
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Sarah Pelan
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Jared T Simpson
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Glen Threadgold
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- James Torrance
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Jonathan M Wood
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Paul Peluso
- Pacific Biosciences, Menlo Park, California 94025, USA
- Heng Li
- Broad Institute, Cambridge, Massachusetts 02142, USA
- Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Richard Durbin
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Richard K Wilson
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Deanna M Church
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
21
|
Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, McLaren S, Sealy I, Caccamo M, Churcher C, Scott C, Barrett JC, Koch R, Rauch GJ, White S, Chow W, Kilian B, Quintais LT, Guerra-Assunção JA, Zhou Y, Gu Y, Yen J, Vogel JH, Eyre T, Banerjee R, Chi J, Fu B, Langley E, Maguire SF, Laird G, Lloyd D, Kenyon E, Donaldson S, Sehra H, Almeida-King J, Loveland J, Trevanion S, Jones M, Quail M, Willey D, Hunt A, Burton J, Sims S, McLay K, Plumb B, Davis J, Clee C, Oliver K, Clark R, Riddle C, Elliott D, Threadgold G, Harden G, Ware D, Begum S, Mortimore B, Kerry G, Heath P, Phillimore B, Tracey A, Corby N, Dunn M, Johnson C, Wood J, Clark S, Pelan S, Griffiths G, Smith M, Glithero R, Howden P, Barker N, Lloyd C, Stevens C, Harley J, Holt K, Panagiotidis G, Lovell J, Beasley H, Henderson C, Gordon D, Auger K, Wright D, Collins J, Raisen C, Dyer L, Leung K, Robertson L, Ambridge K, Leongamornlert D, McGuire S, Gilderthorp R, Griffiths C, Manthravadi D, Nichol S, Barker G, Whitehead S, Kay M, Brown J, Murnane C, Gray E, Humphries M, Sycamore N, Barker D, Saunders D, Wallis J, Babbage A, Hammond S, Mashreghi-Mohammadi M, Barr L, Martin S, Wray P, Ellington A, Matthews N, Ellwood M, Woodmansey R, Clark G, Cooper JD, Tromans A, Grafham D, Skuce C, Pandian R, Andrews R, Harrison E, Kimberley A, Garnett J, Fosker N, Hall R, Garner P, Kelly D, Bird C, Palmer S, Gehring I, Berger A, Dooley C, Ersan-Ürün Z, Eser C, Geiger H, Geisler M, Karotki L, Kirn A, Konantz J, Konantz M, Oberländer M, Rudolph-Geiger S, Teucke M, Lanz C, Raddatz G, Osoegawa K, Zhu B, Rapp A, Widaa S, Langford C, Yang F, Schuster SC, Carter NP, Harrow J, Ning Z, Herrero J, Searle SMJ, Enright A, Geisler R, Plasterk RHA, Lee C, Westerfield M, de Jong PJ, Zon LI, Postlethwait JH, Volhard CN, Hubbard TJP, Crollius HR, Rogers J, Stemple DL. Erratum: Corrigendum: The zebrafish reference genome sequence and its relationship to the human genome. Nature 2013. 
[DOI: 10.1038/nature12813] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
|
22
|
Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen HC, Agarwala R, McLaren WM, Ritchie GRS, Albracht D, Kremitzki M, Rock S, Kotkiewicz H, Kremitzki C, Wollam A, Trani L, Fulton L, Fulton R, Matthews L, Whitehead S, Chow W, Torrance J, Dunn M, Harden G, Threadgold G, Wood J, Collins J, Heath P, Griffiths G, Pelan S, Grafham D, Eichler EE, Weinstock G, Mardis ER, Wilson RK, Howe K, Flicek P, Hubbard T. Modernizing reference genome assemblies. PLoS Biol 2011; 9:e1001091. [PMID: 21750661 PMCID: PMC3130012 DOI: 10.1371/journal.pbio.1001091] [Citation(s) in RCA: 321] [Impact Index Per Article: 24.7] [Indexed: 12/20/2022] Open
Affiliation(s)
- Deanna M Church
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.
|
23
|
Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CCA, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RIS, Aubin K, Babbage AK, Bagguley CL, Bailey J, Banerjee R, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Bryant SP, Buckley D, Burford DC, Burrill WDH, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Clegg SM, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dhami P, Dovey O, Dunn M, Earthrowl M, Ellington AG, Errington H, Faulkner LM, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MRJ, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Gribble SM, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ESI, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Langford CF, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NSW, McLaren S, Milne S, Mistry S, Moore MJF, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Pandian RD, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Porter KM, Prigmore E, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR. 
Erratum: The DNA sequence and biological annotation of human chromosome 1. Nature 2006. [DOI: 10.1038/nature05152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
24
|
Gregory SG, Barlow KF, McLay KE, Kaul R, Swarbreck D, Dunham A, Scott CE, Howe KL, Woodfine K, Spencer CCA, Jones MC, Gillson C, Searle S, Zhou Y, Kokocinski F, McDonald L, Evans R, Phillips K, Atkinson A, Cooper R, Jones C, Hall RE, Andrews TD, Lloyd C, Ainscough R, Almeida JP, Ambrose KD, Anderson F, Andrew RW, Ashwell RIS, Aubin K, Babbage AK, Bagguley CL, Bailey J, Beasley H, Bethel G, Bird CP, Bray-Allen S, Brown JY, Brown AJ, Buckley D, Burton J, Bye J, Carder C, Chapman JC, Clark SY, Clarke G, Clee C, Cobley V, Collier RE, Corby N, Coville GJ, Davies J, Deadman R, Dunn M, Earthrowl M, Ellington AG, Errington H, Frankish A, Frankland J, French L, Garner P, Garnett J, Gay L, Ghori MRJ, Gibson R, Gilby LM, Gillett W, Glithero RJ, Grafham DV, Griffiths C, Griffiths-Jones S, Grocock R, Hammond S, Harrison ESI, Hart E, Haugen E, Heath PD, Holmes S, Holt K, Howden PJ, Hunt AR, Hunt SE, Hunter G, Isherwood J, James R, Johnson C, Johnson D, Joy A, Kay M, Kershaw JK, Kibukawa M, Kimberley AM, King A, Knights AJ, Lad H, Laird G, Lawlor S, Leongamornlert DA, Lloyd DM, Loveland J, Lovell J, Lush MJ, Lyne R, Martin S, Mashreghi-Mohammadi M, Matthews L, Matthews NSW, McLaren S, Milne S, Mistry S, Moore MJF, Nickerson T, O'Dell CN, Oliver K, Palmeiri A, Palmer SA, Parker A, Patel D, Pearce AV, Peck AI, Pelan S, Phelps K, Phillimore BJ, Plumb R, Rajan J, Raymond C, Rouse G, Saenphimmachak C, Sehra HK, Sheridan E, Shownkeen R, Sims S, Skuce CD, Smith M, Steward C, Subramanian S, Sycamore N, Tracey A, Tromans A, Van Helmond Z, Wall M, Wallis JM, White S, Whitehead SL, Wilkinson JE, Willey DL, Williams H, Wilming L, Wray PW, Wu Z, Coulson A, Vaudin M, Sulston JE, Durbin R, Hubbard T, Wooster R, Dunham I, Carter NP, McVean G, Ross MT, Harrow J, Olson MV, Beck S, Rogers J, Bentley DR, Banerjee R, Bryant SP, Burford DC, Burrill WDH, Clegg SM, Dhami P, Dovey O, Faulkner LM, Gribble SM, Langford CF, Pandian RD, Porter KM, Prigmore E. 
The DNA sequence and biological annotation of human chromosome 1. Nature 2006; 441:315-21. [PMID: 16710414 DOI: 10.1038/nature04727] [Citation(s) in RCA: 170] [Impact Index Per Article: 9.4] [Received: 12/24/2005] [Accepted: 03/13/2006] [Indexed: 11/08/2022]
Abstract
The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.
Affiliation(s)
- S G Gregory
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
|
25
|
Smyth IM, Wilming L, Lee AW, Taylor MS, Gautier P, Barlow K, Wallis J, Martin S, Glithero R, Phillimore B, Pelan S, Andrew R, Holt K, Taylor R, McLaren S, Burton J, Bailey J, Sims S, Squares J, Plumb B, Joy A, Gibson R, Gilbert J, Hart E, Laird G, Loveland J, Mudge J, Steward C, Swarbreck D, Harrow J, North P, Leaves N, Greystrong J, Coppola M, Manjunath S, Campbell M, Smith M, Strachan G, Tofts C, Boal E, Cobley V, Hunter G, Kimberley C, Thomas D, Cave-Berry L, Weston P, Botcherby MRM, White S, Edgar R, Cross SH, Irvani M, Hummerich H, Simpson EH, Johnson D, Hunsicker PR, Little PFR, Hubbard T, Campbell RD, Rogers J, Jackson IJ. Genomic anatomy of the Tyrp1 (brown) deletion complex. Proc Natl Acad Sci U S A 2006; 103:3704-9. [PMID: 16505357 PMCID: PMC1450144 DOI: 10.1073/pnas.0600199103] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
Abstract
Chromosome deletions in the mouse have proven invaluable in the dissection of gene function. The brown deletion complex comprises >28 independent genome rearrangements, which have been used to identify several functional loci on chromosome 4 required for normal embryonic and postnatal development. We have constructed a 172-bacterial artificial chromosome contig that spans this 22-megabase (Mb) interval and have produced a contiguous, finished, and manually annotated sequence from these clones. The deletion complex is strikingly gene-poor, containing only 52 protein-coding genes (of which only 39 are supported by human homologues) and has several further notable genomic features, including several segments of >1 Mb, apparently devoid of a coding sequence. We have used sequence polymorphisms to finely map the deletion breakpoints and identify strong candidate genes for the known phenotypes that map to this region, including three lethal loci (l4Rn1, l4Rn2, and l4Rn3) and the fitness mutant brown-associated fitness (baf). We have also characterized misexpression of the basonuclin homologue, Bnc2, associated with the inversion-mediated coat color mutant white-based brown (B(w)). This study provides a molecular insight into the basis of several characterized mouse mutants, which will allow further dissection of this region by targeted or chemical mutagenesis.
Affiliation(s)
- Ian M. Smyth
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Angela W. Lee
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Martin S. Taylor
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Phillipe Gautier
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Bob Plumb
- Wellcome Trust Sanger Institute
| | - Ann Joy
- Wellcome Trust Sanger Institute
| | - Philip North
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Nicholas Leaves
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - John Greystrong
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Maria Coppola
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Shilpa Manjunath
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Mark Campbell
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Mark Smith
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Gregory Strachan
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Calli Tofts
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Esther Boal
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Victoria Cobley
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Giselle Hunter
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Christopher Kimberley
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Daniel Thomas
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Lee Cave-Berry
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Paul Weston
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Marc R. M. Botcherby
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Sharon White
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Ruth Edgar
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Sally H. Cross
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Marjan Irvani
- Department of Biochemistry, Imperial College, London SW7 2AZ, United Kingdom
| | - Holger Hummerich
- Department of Biochemistry, Imperial College, London SW7 2AZ, United Kingdom
| | - Eleanor H. Simpson
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
| | - Dabney Johnson
- Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
| | - Peter F. R. Little
- Department of Biochemistry, Imperial College, London SW7 2AZ, United Kingdom
| | - R. Duncan Campbell
- Medical Research Council Rosalind Franklin Centre for Genome Research, Hinxton CB10 1SA, United Kingdom
| | - Ian J. Jackson
- *Medical Research Council Human Genetics Unit, Edinburgh EH4 2XU, United Kingdom
- To whom correspondence should be addressed. E-mail:
|
26
|
Humphray SJ, Oliver K, Hunt AR, Plumb RW, Loveland JE, Howe KL, Andrews TD, Searle S, Hunt SE, Scott CE, Jones MC, Ainscough R, Almeida JP, Ambrose KD, Ashwell RIS, Babbage AK, Babbage S, Bagguley CL, Bailey J, Banerjee R, Barker DJ, Barlow KF, Bates K, Beasley H, Beasley O, Bird CP, Bray-Allen S, Brown AJ, Brown JY, Burford D, Burrill W, Burton J, Carder C, Carter NP, Chapman JC, Chen Y, Clarke G, Clark SY, Clee CM, Clegg S, Collier RE, Corby N, Crosier M, Cummings AT, Davies J, Dhami P, Dunn M, Dutta I, Dyer LW, Earthrowl ME, Faulkner L, Fleming CJ, Frankish A, Frankland JA, French L, Fricker DG, Garner P, Garnett J, Ghori J, Gilbert JGR, Glison C, Grafham DV, Gribble S, Griffiths C, Griffiths-Jones S, Grocock R, Guy J, Hall RE, Hammond S, Harley JL, Harrison ESI, Hart EA, Heath PD, Henderson CD, Hopkins BL, Howard PJ, Howden PJ, Huckle E, Johnson C, Johnson D, Joy AA, Kay M, Keenan S, Kershaw JK, Kimberley AM, King A, Knights A, Laird GK, Langford C, Lawlor S, Leongamornlert DA, Leversha M, Lloyd C, Lloyd DM, Lovell J, Martin S, Mashreghi-Mohammadi M, Matthews L, McLaren S, McLay KE, McMurray A, Milne S, Nickerson T, Nisbett J, Nordsiek G, Pearce AV, Peck AI, Porter KM, Pandian R, Pelan S, Phillimore B, Povey S, Ramsey Y, Rand V, Scharfe M, Sehra HK, Shownkeen R, Sims SK, Skuce CD, Smith M, Steward CA, Swarbreck D, Sycamore N, Tester J, Thorpe A, Tracey A, Tromans A, Thomas DW, Wall M, Wallis JM, West AP, Whitehead SL, Willey DL, Williams SA, Wilming L, Wray PW, Young L, Ashurst JL, Coulson A, Blöcker H, Durbin R, Sulston JE, Hubbard T, Jackson MJ, Bentley DR, Beck S, Rogers J, Dunham I. DNA sequence and analysis of human chromosome 9. Nature 2004; 429:369-74. [PMID: 15164053 PMCID: PMC2734081 DOI: 10.1038/nature02465] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.5] [Received: 12/22/2003] [Accepted: 03/08/2004] [Indexed: 11/09/2022]
Abstract
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.
Affiliation(s)
- S J Humphray
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
27
|
Deloukas P, Earthrowl ME, Grafham DV, Rubenfield M, French L, Steward CA, Sims SK, Jones MC, Searle S, Scott C, Howe K, Hunt SE, Andrews TD, Gilbert JGR, Swarbreck D, Ashurst JL, Taylor A, Battles J, Bird CP, Ainscough R, Almeida JP, Ashwell RIS, Ambrose KD, Babbage AK, Bagguley CL, Bailey J, Banerjee R, Bates K, Beasley H, Bray-Allen S, Brown AJ, Brown JY, Burford DC, Burrill W, Burton J, Cahill P, Camire D, Carter NP, Chapman JC, Clark SY, Clarke G, Clee CM, Clegg S, Corby N, Coulson A, Dhami P, Dutta I, Dunn M, Faulkner L, Frankish A, Frankland JA, Garner P, Garnett J, Gribble S, Griffiths C, Grocock R, Gustafson E, Hammond S, Harley JL, Hart E, Heath PD, Ho TP, Hopkins B, Horne J, Howden PJ, Huckle E, Hynds C, Johnson C, Johnson D, Kana A, Kay M, Kimberley AM, Kershaw JK, Kokkinaki M, Laird GK, Lawlor S, Lee HM, Leongamornlert DA, Laird G, Lloyd C, Lloyd DM, Loveland J, Lovell J, McLaren S, McLay KE, McMurray A, Mashreghi-Mohammadi M, Matthews L, Milne S, Nickerson T, Nguyen M, Overton-Larty E, Palmer SA, Pearce AV, Peck AI, Pelan S, Phillimore B, Porter K, Rice CM, Rogosin A, Ross MT, Sarafidou T, Sehra HK, Shownkeen R, Skuce CD, Smith M, Standring L, Sycamore N, Tester J, Thorpe A, Torcasso W, Tracey A, Tromans A, Tsolas J, Wall M, Walsh J, Wang H, Weinstock K, West AP, Willey DL, Whitehead SL, Wilming L, Wray PW, Young L, Chen Y, Lovering RC, Moschonas NK, Siebert R, Fechtel K, Bentley D, Durbin R, Hubbard T, Doucette-Stamm L, Beck S, Smith DR, Rogers J. The DNA sequence and comparative analysis of human chromosome 10. Nature 2004; 429:375-81. [PMID: 15164054 DOI: 10.1038/nature02462] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Received: 12/17/2003] [Accepted: 03/09/2004] [Indexed: 11/08/2022]
Abstract
The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
Affiliation(s)
- P Deloukas
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK.
|
28
|
Dunham A, Matthews LH, Burton J, Ashurst JL, Howe KL, Ashcroft KJ, Beare DM, Burford DC, Hunt SE, Griffiths-Jones S, Jones MC, Keenan SJ, Oliver K, Scott CE, Ainscough R, Almeida JP, Ambrose KD, Andrews DT, Ashwell RIS, Babbage AK, Bagguley CL, Bailey J, Bannerjee R, Barlow KF, Bates K, Beasley H, Bird CP, Bray-Allen S, Brown AJ, Brown JY, Burrill W, Carder C, Carter NP, Chapman JC, Clamp ME, Clark SY, Clarke G, Clee CM, Clegg SCM, Cobley V, Collins JE, Corby N, Coville GJ, Deloukas P, Dhami P, Dunham I, Dunn M, Earthrowl ME, Ellington AG, Faulkner L, Frankish AG, Frankland J, French L, Garner P, Garnett J, Gilbert JGR, Gilson CJ, Ghori J, Grafham DV, Gribble SM, Griffiths C, Hall RE, Hammond S, Harley JL, Hart EA, Heath PD, Howden PJ, Huckle EJ, Hunt PJ, Hunt AR, Johnson C, Johnson D, Kay M, Kimberley AM, King A, Laird GK, Langford CJ, Lawlor S, Leongamornlert DA, Lloyd DM, Lloyd C, Loveland JE, Lovell J, Martin S, Mashreghi-Mohammadi M, McLaren SJ, McMurray A, Milne S, Moore MJF, Nickerson T, Palmer SA, Pearce AV, Peck AI, Pelan S, Phillimore B, Porter KM, Rice CM, Searle S, Sehra HK, Shownkeen R, Skuce CD, Smith M, Steward CA, Sycamore N, Tester J, Thomas DW, Tracey A, Tromans A, Tubby B, Wall M, Wallis JM, West AP, Whitehead SL, Willey DL, Wilming L, Wray PW, Wright MW, Young L, Coulson A, Durbin R, Hubbard T, Sulston JE, Beck S, Bentley DR, Rogers J, Ross MT. The DNA sequence and analysis of human chromosome 13. Nature 2004; 428:522-8. [PMID: 15057823 PMCID: PMC2665288 DOI: 10.1038/nature02379] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Received: 12/05/2003] [Accepted: 01/27/2004] [Indexed: 12/14/2022]
Abstract
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.
Affiliation(s)
- A Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|