1
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024:10.1038/s41586-024-07473-2. [PMID: 38811727 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
- Brandon D Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bornberg-Bauer
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health and Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yong-Hwee E Loh
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kelly M McGarvey
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joana L Rocha
  - University of California Berkeley, Berkeley, CA, USA
- Fedor Ryabov
  - Masters Program in National Research, University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alice C Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
2
Rick JA, Brock CD, Lewanski AL, Golcher-Benavides J, Wagner CE. Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol 2024; 73:76-101. [PMID: 37881861 DOI: 10.1093/sysbio/syad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open
Abstract
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. 
These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
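The minor allele count (MAC) filter discussed in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotype encoding (alternate-allele counts 0, 1, 2 per diploid sample, with -1 for missing), the function names, and the choice of an inclusive threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def minor_allele_count(genotypes: Sequence[int]) -> int:
    """Return the minor allele count at one biallelic site.

    Assumed encoding (hypothetical): each entry is the number of
    alternate alleles in a diploid sample (0, 1 or 2); -1 marks
    a missing genotype and is ignored.
    """
    called = [g for g in genotypes if g >= 0]
    alt = sum(called)            # copies of the alternate allele
    total = 2 * len(called)      # allele copies among called samples
    return min(alt, total - alt)


def filter_sites_by_mac(sites: List[Sequence[int]], min_mac: int) -> List[Sequence[int]]:
    """Keep sites whose minor allele count is at least min_mac (inclusive, by assumption)."""
    return [site for site in sites if minor_allele_count(site) >= min_mac]


if __name__ == "__main__":
    sites = [
        [0, 0, 0, 1],    # singleton
        [1, 1, 2, 0],    # common variant
        [2, 2, 2, 2],    # fixed for the alternate allele
        [0, 1, -1, 2],   # one missing genotype
    ]
    print(filter_sites_by_mac(sites, min_mac=3))
```

Raising `min_mac` removes rare variants first, which is the kind of filter the abstract reports as biasing inferred tree topologies when applied stringently.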
Affiliation(s)
- Jessica A Rick
  - School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA
- Chad D Brock
  - Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA
- Alexander L Lewanski
  - Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA
- Jimena Golcher-Benavides
  - Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
- Catherine E Wagner
  - Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA
  - Department of Botany, University of Wyoming, Laramie, WY 82071, USA
3
Hartley GA, Frankenberg SR, Robinson NM, MacDonald AJ, Hamede RK, Burridge CP, Jones ME, Faulkner T, Shute H, Rose K, Brewster R, O'Neill RJ, Renfree MB, Pask AJ, Feigin CY. Genome of the endangered eastern quoll (Dasyurus viverrinus) reveals signatures of historical decline and pelage color evolution. Commun Biol 2024; 7:636. [PMID: 38796620 PMCID: PMC11128018 DOI: 10.1038/s42003-024-06251-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 04/26/2024] [Indexed: 05/28/2024] Open
Abstract
The eastern quoll (Dasyurus viverrinus) is an endangered marsupial native to Australia. Since the extirpation of its mainland populations in the 20th century, wild eastern quolls have been restricted to two islands at the southern end of their historical range. Eastern quolls are the subject of captive breeding programs and attempts have been made to re-establish a population in mainland Australia. However, few resources currently exist to guide the genetic management of this species. Here, we generated a reference genome for the eastern quoll with gene annotations supported by multi-tissue transcriptomes. Our assembly is among the most complete marsupial genomes currently available. Using this assembly, we infer the species' demographic history, identifying potential evidence of a long-term decline beginning in the late Pleistocene. Finally, we identify a deletion at the ASIP locus that likely underpins pelage color differences between the eastern quoll and the closely related Tasmanian devil (Sarcophilus harrisii).
Affiliation(s)
- Gabrielle A Hartley
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
- Natasha M Robinson
  - Fenner School of Environment & Society, Australian National University, Canberra, ACT, 2601, Australia
- Anna J MacDonald
  - Research School of Biology, Australian National University, Canberra, ACT, 2601, Australia
  - Australian Antarctic Division, Department of Climate Change, Energy, the Environment and Water, Kingston, TAS, 7050, Australia
- Rodrigo K Hamede
  - School of Natural Sciences, University of Tasmania, Hobart, TAS, 7005, Australia
- Menna E Jones
  - School of Natural Sciences, University of Tasmania, Hobart, TAS, 7005, Australia
- Tim Faulkner
  - Australian Reptile Park & Aussie Ark, Somersby, NSW, 2250, Australia
- Hayley Shute
  - Australian Reptile Park & Aussie Ark, Somersby, NSW, 2250, Australia
- Karrie Rose
  - Australian Registry of Wildlife Health, Taronga Conservation Society Australia, Mosman, NSW, 2088, Australia
- Rob Brewster
  - WWF-Australia, PO Box 528, Sydney, NSW, 2001, Australia
- Rachel J O'Neill
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269, USA
- Marilyn B Renfree
  - School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Andrew J Pask
  - School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia
  - Department of Sciences, Museums Victoria, Carlton, VIC, 3053, Australia
- Charles Y Feigin
  - School of BioSciences, The University of Melbourne, Melbourne, VIC, 3010, Australia
  - Department of Environment and Genetics, La Trobe University, Bundoora, VIC, 3086, Australia
4
Fodor E, Okendo J, Szabó N, Szabó K, Czimer D, Tarján-Rácz A, Szeverényi I, Low BW, Liew JH, Koren S, Rhie A, Orbán L, Miklósi Á, Varga M, Burgess SM. The reference genome of Macropodus opercularis (the paradise fish). Sci Data 2024; 11:540. [PMID: 38796485 PMCID: PMC11127978 DOI: 10.1038/s41597-024-03277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 04/18/2024] [Indexed: 05/28/2024] Open
Abstract
Amongst fishes, zebrafish (Danio rerio) has gained popularity as a model system over most other species, and while its value as a model is well documented, its usefulness is limited in certain fields of research such as behavior. By embracing other, less conventional experimental organisms, opportunities arise to gain broader insights into evolution and development, as well as to study behavioral aspects not available in current popular model systems. The anabantoid paradise fish (Macropodus opercularis), an "air-breather" species, has a highly complex behavioral repertoire and has been the subject of many ethological investigations but lacks genomic resources. Here we report the reference genome assembly of M. opercularis using long-read sequences at 150-fold coverage. The final assembly consisted of 483,077,705 base pairs (~483 Mb) on 152 contigs. Within the assembled genome we identified and annotated 20,157 protein coding genes and assigned ~90% of them to orthogroups.
Affiliation(s)
- Erika Fodor
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Javan Okendo
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Nóra Szabó
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Kata Szabó
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Dávid Czimer
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Anita Tarján-Rácz
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Ildikó Szeverényi
  - Frontline Fish Genomics Research Group, Department of Applied Fish Biology, Institute of Aquaculture and Environmental Safety, Hungarian University of Agriculture and Life Sciences, Georgikon Campus, Keszthely, Hungary
- Bi Wei Low
  - Science Unit, Lingnan University, Hong Kong, China
- Sergey Koren
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Arang Rhie
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- László Orbán
  - Frontline Fish Genomics Research Group, Department of Applied Fish Biology, Institute of Aquaculture and Environmental Safety, Hungarian University of Agriculture and Life Sciences, Georgikon Campus, Keszthely, Hungary
- Ádám Miklósi
  - Department of Ethology, ELTE Eötvös Loránd University, Budapest, Hungary
- Máté Varga
  - Department of Genetics, ELTE Eötvös Loránd University, Budapest, Hungary
- Shawn M Burgess
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
5
Peel E, Hogg C, Belov K. Characterisation of defensins across the marsupial family tree. Developmental and Comparative Immunology 2024:105207. [PMID: 38797458 DOI: 10.1016/j.dci.2024.105207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/23/2024] [Accepted: 05/23/2024] [Indexed: 05/29/2024]
Abstract
Defensins are antimicrobial peptides involved in innate immunity, and gene number differs amongst eutherian mammals. Few studies have investigated defensins in marsupials, despite their potential involvement in immunological protection of altricial young. Here we use recently sequenced marsupial genomes and transcriptomes to annotate defensins in nine species across the marsupial family tree. We characterised 35 alpha and 286 beta defensins; gene number differed between species, although Dasyuromorphs had the largest repertoire. Defensins were encoded in three gene clusters within the genome, syntenic to eutherians, and were expressed in the pouch and mammary gland. Marsupial beta defensins were closely related to eutherians, however marsupial alpha defensins were more divergent. We identified marsupial orthologs of human DEFB3 and 6, and several marsupial-specific beta defensin lineages which may have novel functions. Marsupial predicted mature peptides were highly variable in length and sequence composition. We propose candidate peptides for future testing to elucidate the function of marsupial defensins.
Affiliation(s)
- Emma Peel
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia 2006; Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science
- Carolyn Hogg
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia 2006; Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science
- Katherine Belov
  - School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia 2006; Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science
6
Sivell O, Raper C, Mitchell R, Sivell D. The genome sequence of a hoverfly Eristalinus aeneus (Scopoli, 1763). Wellcome Open Res 2024; 9:69. [PMID: 38813464 PMCID: PMC11134147 DOI: 10.12688/wellcomeopenres.20636.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/20/2024] [Indexed: 05/31/2024] Open
Abstract
We present a genome assembly from an individual female Eristalinus aeneus (a hoverfly; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 495.4 megabases in span. Most of the assembly is scaffolded into 6 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 15.97 kilobases in length.
Affiliation(s)
- Ryan Mitchell
  - Independent researcher, Sligo Town, County Sligo, Ireland
7
Dong Z, Wang C, Qu Q. WGCCRR: a web-based tool for genome-wide screening of convergent indels and substitutions of amino acids. Bioinformatics Advances 2024; 4:vbae070. [PMID: 38808070 PMCID: PMC11132816 DOI: 10.1093/bioadv/vbae070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 04/05/2024] [Accepted: 05/23/2024] [Indexed: 05/30/2024]
Abstract
Summary: Genome-wide analyses of protein-coding gene sequences are being employed to examine the genetic basis of adaptive evolution in many organismal groups. Previous studies have revealed that convergent/parallel adaptive evolution may be caused by convergent/parallel amino acid changes. Similarly, detailed analysis of lineage-specific amino acid changes has shown correlations with certain lineage-specific traits. However, experimental validation remains the ultimate measure of causality. With the increasing availability of genomic data, a streamlined tool for such analyses would facilitate and expedite the screening of genetic loci that hold potential for adaptive evolution, while alleviating the bioinformatic burden for experimental biologists. In this study, we present a user-friendly web-based tool called WGCCRR (Whole Genome Comparative Coding Region Read) designed to screen both convergent/parallel and lineage-specific amino acid changes on a genome-wide scale. Our tool allows users to replicate previous analyses with just a few clicks, and the exported results are straightforward to interpret. In addition, we have included amino acid indels, which were usually neglected in previous work. Our website provides an efficient platform for screening candidate loci for downstream experimental tests. Availability and Implementation: The tool is available at: https://fishevo.xmu.edu.cn/.
Affiliation(s)
- Zheng Dong
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xià-Mén, Fú-Jiàn 361102, China
- Chen Wang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xià-Mén, Fú-Jiàn 361102, China
- Qingming Qu
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xià-Mén, Fú-Jiàn 361102, China
8
Skipp S, Wallace I. The genome sequence of the Grey Sedge caddis fly, Odontocerum albicorne (Scopoli, 1769). Wellcome Open Res 2024; 8:445. [PMID: 38784714 PMCID: PMC11112305 DOI: 10.12688/wellcomeopenres.20124.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/20/2024] [Indexed: 05/25/2024] Open
Abstract
We present a genome assembly from an individual male Odontocerum albicorne (the Grey Sedge caddis fly; Arthropoda; Insecta; Trichoptera; Odontoceridae). The genome sequence is 1,287.3 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 16.57 kilobases in length.
Affiliation(s)
- Sue Skipp
  - Environment Agency, Rochester, England, UK
- Ian Wallace
  - British Caddis Recording Scheme, Wirral, England, UK
9
Abueg LAL, Afgan E, Allart O, Awan AH, Bacon WA, Baker D, Bassetti M, Batut B, Bernt M, Blankenberg D, Bombarely A, Bretaudeau A, Bromhead CJ, Burke ML, Capon PK, Čech M, Chavero-Díez M, Chilton JM, Collins TJ, Coppens F, Coraor N, Cuccuru G, Cumbo F, Davis J, De Geest PF, de Koning W, Demko M, DeSanto A, Begines JMD, Doyle MA, Droesbeke B, Erxleben-Eggenhofer A, Föll MC, Formenti G, Fouilloux A, Gangazhe R, Genthon T, Goecks J, Beltran ANG, Goonasekera NA, Goué N, Griffin TJ, Grüning BA, Guerler A, Gundersen S, Gustafsson OJR, Hall C, Harrop TW, Hecht H, Heidari A, Heisner T, Heyl F, Hiltemann S, Hotz HR, Hyde CJ, Jagtap PD, Jakiela J, Johnson JE, Joshi J, Jossé M, Jum’ah K, Kalaš M, Kamieniecka K, Kayikcioglu T, Konkol M, Kostrykin L, Kucher N, Kumar A, Kuntz M, Lariviere D, Lazarus R, Bras YL, Corguillé GL, Lee J, Leo S, Liborio L, Libouban R, Tabernero DL, Lopez-Delisle L, Los LS, Mahmoud A, Makunin I, Marin P, Mehta S, Mok W, Moreno PA, Morier-Genoud F, Mosher S, Müller T, Nasr E, Nekrutenko A, Nelson TM, Oba AJ, Ostrovsky A, Polunina PV, Poterlowicz K, Price EJ, Price GR, Rasche H, Raubenolt B, Royaux C, Sargent L, Savage MT, Savchenko V, Savchenko D, Schatz MC, Seguineau P, Serrano-Solano B, Soranzo N, Srikakulam SK, Suderman K, Syme AE, Tangaro MA, Tedds JA, Tekman M, Cheng (Mike) Thang W, Thanki AS, Uhl M, van den Beek M, Varshney D, Vessio J, Videm P, Von Kuster G, Watson GR, Whitaker-Allen N, Winter U, Wolstencroft M, Zambelli F, Zierep P, Zoabi R. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024:gkae410. [PMID: 38769056 DOI: 10.1093/nar/gkae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/18/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024] Open
Abstract
Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enable complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.
10
Nevue AA, Sairavi A, Huang SJ, Nakai H, Mello CV. Genomic loss of GPR108 disrupts AAV transduction in birds. bioRxiv 2024:2024.05.16.589954. [PMID: 38798475 PMCID: PMC11118497 DOI: 10.1101/2024.05.16.589954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The G protein-coupled receptor 108 (GPR108) gene encodes a protein factor identified as critical for adeno-associated virus (AAV) entry into mammalian cells, but whether it is universally involved in AAV transduction is unknown. Remarkably, we have discovered that GPR108 is absent in the genomes of birds and in most other sauropsids, providing a likely explanation for the overall lower AAV transduction efficacy of common AAV serotypes in birds compared to mammals. Importantly, transgenic expression of human GPR108 and manipulation of related glycan binding sites in the viral capsid significantly boost AAV transduction in zebra finch cells. These findings contribute to a more in-depth understanding of the mechanisms and evolution of AAV transduction, with potential implications for the design of efficient tools for gene manipulation in experimental animal models, and a range of gene therapy applications in humans.
11
Isdaner AJ, Levis NA, Ehrenreich IM, Pfennig DW. Genetic Variants Underlying Plasticity in Natural Populations of Spadefoot Toads: Environmental Assessment versus Phenotypic Response. Genes (Basel) 2024; 15:611. [PMID: 38790242 PMCID: PMC11121243 DOI: 10.3390/genes15050611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 05/02/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024] Open
Abstract
Many organisms facultatively produce different phenotypes depending on their environment, yet relatively little is known about the genetic bases of such plasticity in natural populations. In this study, we describe the genetic variation underlying an extreme form of plasticity (resource polyphenism) in Mexican spadefoot toad tadpoles, Spea multiplicata. Depending on their environment, these tadpoles develop into one of two drastically different forms: a carnivore morph or an omnivore morph. We collected both morphs from two ponds that differed in which morph had an adaptive advantage and performed genome-wide association studies of phenotype (carnivore vs. omnivore) and adaptive plasticity (adaptive vs. maladaptive environmental assessment). We identified four quantitative trait loci associated with phenotype and nine with adaptive plasticity, two of which exhibited signatures of minor allele dominance and two of which (one phenotype locus and one adaptive plasticity locus) did not occur as minor allele homozygotes. Investigations into the genetics of plastic traits in natural populations promise to provide novel insights into how such complex, adaptive traits arise and evolve.
Affiliation(s)
- Andrew J. Isdaner
  - Department of Biology, CB#3280, University of North Carolina, Chapel Hill, NC 27599, USA
- Nicholas A. Levis
  - Department of Biology, CB#3280, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Ian M. Ehrenreich
  - Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- David W. Pfennig
  - Department of Biology, CB#3280, University of North Carolina, Chapel Hill, NC 27599, USA
12
Tsai WLE, Escalona M, Garrett KL, Terrill RS, Sahasrabudhe R, Nguyen O, Beraut E, Seligmann W, Fairbairn CW, Harrigan RJ, McCormack JE, Alfaro ME, Smith TB, Bay RA. A highly contiguous genome assembly for the Yellow Warbler (Setophaga petechia). J Hered 2024; 115:317-325. [PMID: 38401156 PMCID: PMC11081134 DOI: 10.1093/jhered/esae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 02/16/2024] [Indexed: 02/26/2024] Open
Abstract
The Yellow Warbler (Setophaga petechia) is a small songbird in the wood-warbler family (Parulidae) that exhibits phenotypic and ecological differences across a widespread distribution and is important to California's riparian habitat conservation. Here, we present a high-quality de novo genome assembly of a vouchered female Yellow Warbler from southern California. Using HiFi long-read and Omni-C proximity sequencing technologies, we generated a 1.22 Gb assembly including 687 scaffolds with a contig N50 of 6.80 Mb, scaffold N50 of 21.18 Mb, and a BUSCO completeness score of 96.0%. This highly contiguous genome assembly provides an essential resource for understanding the history of gene flow, divergence, and local adaptation in Yellow Warblers and can inform conservation management of this charismatic bird species.
Affiliation(s)
- Whitney L E Tsai
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, United States
- Moore Laboratory of Zoology, Biology Department, Occidental College, Los Angeles, CA 90041, United States
- Merly Escalona
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, United States
- Kimball L Garrett
- Ornithology Department, Natural History Museum of Los Angeles County, Los Angeles, CA 90007, United States
- Ryan S Terrill
- Moore Laboratory of Zoology, Biology Department, Occidental College, Los Angeles, CA 90041, United States
- Ruta Sahasrabudhe
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA 95616, United States
- Oanh Nguyen
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA 95616, United States
- Eric Beraut
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, United States
- William Seligmann
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, United States
- Colin W Fairbairn
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, United States
- Ryan J Harrigan
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, United States
- John E McCormack
- Moore Laboratory of Zoology, Biology Department, Occidental College, Los Angeles, CA 90041, United States
- Michael E Alfaro
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, United States
- Thomas B Smith
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, United States
- Rachael A Bay
- Department of Evolution and Ecology, University of California, Davis, CA 95616, United States
13
Leonard A, Alberdi A. A global initiative for ecological and evolutionary hologenomics. Trends Ecol Evol 2024:S0169-5347(24)00074-0. [PMID: 38777633 DOI: 10.1016/j.tree.2024.03.005] [Received: 12/22/2023] [Revised: 03/11/2024] [Accepted: 03/20/2024] [Indexed: 05/25/2024]
Abstract
The Earth Hologenome Initiative (EHI) is a global collaboration to generate and analyse hologenomic data from wild animals and associated microorganisms using standardised methodologies underpinned by open and inclusive research principles. Initially focused on vertebrates, it aims to re-examine ecological and evolutionary questions by studying host-microbiota interactions from a systemic perspective.
Affiliation(s)
- Aoife Leonard
- Center for Evolutionary Hologenomics (CEH), Globe Institute, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark
- Antton Alberdi
- Center for Evolutionary Hologenomics (CEH), Globe Institute, University of Copenhagen, Øster Farimagsgade 5, 1353 Copenhagen, Denmark
14
Hogg CJ. Translating genomic advances into biodiversity conservation. Nat Rev Genet 2024; 25:362-373. [PMID: 38012268 DOI: 10.1038/s41576-023-00671-0] [Accepted: 10/12/2023] [Indexed: 11/29/2023]
Abstract
A key action of the new Global Biodiversity Framework is the maintenance of genetic diversity in all species to safeguard their adaptive potential. To achieve this goal, a translational mindset, which aims to convert results of basic research into direct practical benefits, needs to be applied to biodiversity conservation. Despite much discussion on the value of genomics to conservation, a disconnect between those generating genomic resources and those applying them to biodiversity management remains. As global efforts to generate reference genomes for non-model species increase, investment into practical biodiversity applications is critically important. Applications such as understanding population and multispecies diversity and longitudinal monitoring need support alongside education for policymakers on integrating the data into evidence-based decisions. Without such investment, the opportunity to revolutionize global biodiversity conservation using genomics will not be fully realized.
Affiliation(s)
- Carolyn J Hogg
- School of Life & Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
15
Steenwyk JL, King N. The promise and pitfalls of synteny in phylogenomics. PLoS Biol 2024; 22:e3002632. [PMID: 38768403 PMCID: PMC11105162 DOI: 10.1371/journal.pbio.3002632] [Indexed: 05/22/2024] Open
Abstract
Reconstructing the tree of life remains a central goal in biology. Early methods, which relied on small numbers of morphological or genetic characters, often yielded conflicting evolutionary histories, undermining confidence in the results. Investigations based on phylogenomics, which use hundreds to thousands of loci for phylogenetic inquiry, have provided a clearer picture of life's history, but certain branches remain problematic. To resolve difficult nodes on the tree of life, 2 recent studies tested the utility of synteny, the conserved collinearity of orthologous genetic loci in 2 or more organisms, for phylogenetics. Synteny exhibits compelling phylogenomic potential while also raising new challenges. This Essay identifies and discusses specific opportunities and challenges that bear on the value of synteny data and other rare genomic changes for phylogenomic studies. Synteny-based analyses of highly contiguous genome assemblies mark a new chapter in the phylogenomic era and the quest to reconstruct the tree of life.
Affiliation(s)
- Jacob L. Steenwyk
- Howard Hughes Medical Institute, University of California, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
- Nicole King
- Howard Hughes Medical Institute, University of California, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
16
Wu S, Dou T, Yuan S, Yan S, Xu Z, Liu Y, Jian Z, Zhao J, Zhao R, Zi X, Gu D, Liu L, Li Q, Wu DD, Jia J, Ge C, Su Z, Wang K. Annotations of four high-quality indigenous chicken genomes identify more than one thousand missing genes in subtelomeric regions and micro-chromosomes with high G/C contents. BMC Genomics 2024; 25:430. [PMID: 38693501 PMCID: PMC11061957 DOI: 10.1186/s12864-024-10316-z] [Received: 12/04/2023] [Accepted: 04/16/2024] [Indexed: 05/03/2024] Open
Abstract
BACKGROUND: Although multiple chicken genomes have been assembled and annotated, the numbers of protein-coding genes in chicken genomes and their variation among breeds are still uncertain due to the low quality of these genome assemblies and limited resources used in their gene annotations. To fill these gaps, we recently assembled genomes of four indigenous chicken breeds with distinct traits at the chromosome level. In this study, we annotated genes in each of these assembled genomes using a combination of RNA-seq- and homology-based approaches.
RESULTS: We identified varying numbers (17,497-17,718) of protein-coding genes in the four indigenous chicken genomes, while recovering 51 of the 274 "missing" genes in birds in general, and 36 of the 174 "missing" genes in chickens in particular. Intriguingly, based on deeply sequenced RNA-seq data collected in multiple tissues in the four breeds, we found 571 to 627 protein-coding genes in each genome, which were missing in the annotations of the reference chicken genomes (GRCg6a and GRCg7b/w). After removing redundancy, we ended up with a total of 1,420 newly annotated genes (NAGs). The NAGs tend to be found in subtelomeric regions of macro-chromosomes (chr1 to chr5, plus chrZ) and middle chromosomes (chr6 to chr13, plus chrW), as well as in micro-chromosomes (chr14 to chr39) and unplaced contigs, where G/C contents are high. Moreover, the NAGs have elevated G-quadruplex frequencies, while both G/C contents and G-quadruplex frequencies in their surrounding regions are also high. The NAGs showed tissue-specific expression, and we were able to verify 39 (92.9%) of 42 randomly selected ones in various tissues of the four chicken breeds using RT-qPCR experiments. Most of the NAGs were also encoded in the reference chicken genomes; thus, these genomes might harbor more genes than previously thought.
CONCLUSION: The NAGs are widely distributed in wild, indigenous and commercial chickens, and they might play critical roles in chicken physiology. Counting these new genes, chicken genomes harbor more genes than originally thought.
Affiliation(s)
- Siwen Wu
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Tengfei Dou
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Sisi Yuan
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Shixiong Yan
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Zhiqiang Xu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Yong Liu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Zonghui Jian
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Jingying Zhao
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Rouhan Zhao
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Xiannian Zi
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Dahai Gu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Lixian Liu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Qihua Li
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Junjing Jia
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Changrong Ge
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
- Zhengchang Su
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Kun Wang
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, China
17
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 2024; 116:110842. [PMID: 38608738 DOI: 10.1016/j.ygeno.2024.110842] [Received: 10/31/2023] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/14/2024]
Abstract
The recent advent of long read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly still presents significant challenges related to the quality of the results. Pursuing de novo whole-genome assembly remains a formidable challenge, underscored by intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
- Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain
- Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain
- Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain
18
Merondun J, Marques CI, Andrade P, Meshcheryagina S, Galván I, Afonso S, Alves JM, Araújo PM, Bachurin G, Balacco J, Bán M, Fedrigo O, Formenti G, Fossøy F, Fülöp A, Golovatin M, Granja S, Hewson C, Honza M, Howe K, Larson G, Marton A, Moskát C, Mountcastle J, Procházka P, Red’kin Y, Sims Y, Šulc M, Tracey A, Wood JMD, Jarvis ED, Hauber ME, Carneiro M, Wolf JBW. Evolution and genetic architecture of sex-limited polymorphism in cuckoos. Sci Adv 2024; 10:eadl5255. [PMID: 38657058 PMCID: PMC11042743 DOI: 10.1126/sciadv.adl5255] [Received: 10/23/2023] [Accepted: 03/20/2024] [Indexed: 04/26/2024]
Abstract
Sex-limited polymorphism has evolved in many species including our own. Yet, we lack a detailed understanding of the underlying genetic variation and evolutionary processes at work. The brood parasitic common cuckoo (Cuculus canorus) is a prime example of female-limited color polymorphism, where adult males are monochromatic gray and females exhibit either gray or rufous plumage. This polymorphism has been hypothesized to be governed by negative frequency-dependent selection whereby the rarer female morph is protected against harassment by males or from mobbing by parasitized host species. Here, we show that female plumage dichromatism maps to the female-restricted genome. We further demonstrate that, consistent with balancing selection, ancestry of the rufous phenotype is shared with the likewise female dichromatic sister species, the oriental cuckoo (Cuculus optatus). This study shows that sex-specific polymorphism in trait variation can be resolved by genetic variation residing on a sex-limited chromosome and be maintained across species boundaries.
Affiliation(s)
- Justin Merondun
- Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
- Department of Ornithology, Max Planck Institute for Biological Intelligence, Seewiesen, Germany
- Cristiana I. Marques
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Pedro Andrade
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Swetlana Meshcheryagina
- Institute of Plant and Animal Ecology, Ural Branch, Russian Academy of Sciences, Yekaterinburg, Russia
- Ismael Galván
- Departamento de Ecología Evolutiva, Museo Nacional de Ciencias Naturales, CSIC, Madrid, Spain
- Sandra Afonso
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Joel M. Alves
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, OX1 3QY, UK
- Pedro M. Araújo
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Department of Life Sciences, MARE–Marine and Environmental Sciences Centre/ARNET–Aquatic Research Network, University of Coimbra, Coimbra, Portugal
- Jennifer Balacco
- The Vertebrate Genome Lab, Rockefeller University, New York, NY 10065, USA
- Miklós Bán
- HUN-REN-UD Behavioral Ecology Research Group, Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
- Olivier Fedrigo
- The Vertebrate Genome Lab, Rockefeller University, New York, NY 10065, USA
- Giulio Formenti
- The Vertebrate Genome Lab, Rockefeller University, New York, NY 10065, USA
- Frode Fossøy
- Centre for Biodiversity Genetics, Norwegian Institute for Nature Research, Trondheim, Norway
- Attila Fülöp
- HUN-REN-UD Behavioral Ecology Research Group, Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
- Evolutionary Ecology Group, Hungarian Department of Biology and Ecology, Babeş-Bolyai University, Cluj-Napoca, Romania
- STAR-UBB Institute of Advanced Studies in Science and Technology, Babeş-Bolyai University, Cluj-Napoca, Romania
- Mikhail Golovatin
- Institute of Plant and Animal Ecology, Ural Branch, Russian Academy of Sciences, Yekaterinburg, Russia
- Sofia Granja
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, OX1 3QY, UK
- Marcel Honza
- Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic
- Kerstin Howe
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Greger Larson
- Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, OX1 3QY, UK
- Attila Marton
- Evolutionary Ecology Group, Faculty of Biology and Geology, Babeș-Bolyai University, Cluj-Napoca, Romania
- Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
- Csaba Moskát
- Hungarian Natural History Museum, Budapest, Hungary
- Petr Procházka
- Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic
- Ying Sims
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Michal Šulc
- Institute of Vertebrate Biology, Czech Academy of Sciences, Brno, Czech Republic
- Alan Tracey
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Erich D. Jarvis
- The Vertebrate Genome Lab, Rockefeller University, New York, NY 10065, USA
- Mark E. Hauber
- Advanced Science Research Center and Program in Psychology, Graduate Center of the City University of New York, New York, NY 10031, USA
- Miguel Carneiro
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Jochen B. W. Wolf
- Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
19
Galià-Camps C, Pegueroles C, Turon X, Carreras C, Pascual M. Genome composition and GC content influence loci distribution in reduced representation genomic studies. BMC Genomics 2024; 25:410. [PMID: 38664648 PMCID: PMC11046876 DOI: 10.1186/s12864-024-10312-3] [Received: 01/08/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
BACKGROUND: Genomic architecture is a key evolutionary trait for living organisms. Due to multiple complex adaptive and neutral forces which impose evolutionary pressures on genomes, there is a huge variability of genomic features. However, their variability and the extent to which genomic content determines the distribution of recovered loci in reduced representation sequencing studies are largely unexplored.
RESULTS: Here, by using 80 genome assemblies, we observed that whereas plants primarily increase their genome size by expanding their intergenic regions, animals expand both intergenic and intronic regions, although the expansion patterns differ between deuterostomes and protostomes. Loci mapping in introns, exons, and intergenic categories obtained by in silico digestion using 2b-enzymes are positively correlated with the percentage of these regions in the corresponding genomes, suggesting that loci distribution mostly mirrors genomic architecture of the selected taxon. However, exonic regions showed a significant enrichment of loci in all groups regardless of the enzyme used. Moreover, when using selective adaptors to obtain a secondarily reduced loci dataset, the percentage and distribution of retained loci also varied. Adaptors with G/C terminals recovered a lower percentage of selected loci, with a further enrichment of exonic regions, while adaptors with A/T terminals retained a higher percentage of loci and slightly selected more intronic regions than expected.
CONCLUSIONS: Our results highlight how genome composition, genome GC content, RAD enzyme choice and use of base-selective adaptors influence reduced genome representation techniques. This is important to acknowledge in population and conservation genomic studies, as it determines the abundance and distribution of loci.
Affiliation(s)
- Carles Galià-Camps
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Avinguda Diagonal 643, Barcelona, 08028, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
- Department of Marine Ecology, Centre d'Estudis Avançats de Blanes (CEAB-CSIC), Accés Cala Sant Francesc 14, Blanes, 17300, Spain
- Cinta Pegueroles
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Avinguda Diagonal 643, Barcelona, 08028, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
- Xavier Turon
- Department of Marine Ecology, Centre d'Estudis Avançats de Blanes (CEAB-CSIC), Accés Cala Sant Francesc 14, Blanes, 17300, Spain
- Carlos Carreras
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Avinguda Diagonal 643, Barcelona, 08028, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
- Marta Pascual
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Avinguda Diagonal 643, Barcelona, 08028, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
20
Sebastianelli M, Lukhele SM, Secomandi S, de Souza SG, Haase B, Moysi M, Nikiforou C, Hutfluss A, Mountcastle J, Balacco J, Pelan S, Chow W, Fedrigo O, Downs CT, Monadjem A, Dingemanse NJ, Jarvis ED, Brelsford A, vonHoldt BM, Kirschel ANG. A genomic basis of vocal rhythm in birds. Nat Commun 2024; 15:3095. [PMID: 38653976 DOI: 10.1038/s41467-024-47305-5] [Received: 11/22/2023] [Accepted: 03/22/2024] [Indexed: 04/25/2024] Open
Abstract
Vocal rhythm plays a fundamental role in sexual selection and species recognition in birds, but little is known of its genetic basis due to the confounding effect of vocal learning in model systems. Uncovering its genetic basis could facilitate identifying genes potentially important in speciation. Here we investigate the genomic underpinnings of rhythm in vocal non-learning Pogoniulus tinkerbirds using 135 individual whole genomes distributed across a southern African hybrid zone. We find rhythm speed is associated with two genes that are also known to affect human speech, Neurexin-1 and Coenzyme Q8A. Models leveraging ancestry reveal these candidate loci also impact rhythmic stability, a trait linked with motor performance which is an indicator of quality. Character displacement in rhythmic stability suggests possible reinforcement against hybridization, supported by evidence of asymmetric assortative mating in the species producing faster, more stable rhythms. Because rhythm is omnipresent in animal communication, candidate genes identified here may shape vocal rhythm across birds and other vertebrates.
Affiliation(s)
- Matteo Sebastianelli
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, 751 23, Uppsala, Sweden
- Sifiso M Lukhele
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Simona Secomandi
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Stacey G de Souza
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Bettina Haase
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Michaella Moysi
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Christos Nikiforou
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Alexander Hutfluss
- Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Jennifer Balacco
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Colleen T Downs
- Centre for Functional Biodiversity, School of Life Sciences, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
- Ara Monadjem
- Department of Biological Sciences, University of Eswatini, Kwaluseni, Eswatini
- Mammal Research Institute, Department of Zoology & Entomology, University of Pretoria, Private Bag 20, Hatfield, 0028, Pretoria, South Africa
- Niels J Dingemanse
- Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Alan Brelsford
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA, 92521, USA
- Bridgett M vonHoldt
- Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, 08544, USA
- Alexander N G Kirschel
- Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
21
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024:10.1038/s41576-024-00718-w. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Affiliation(s)
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Durbin
- Department of Genetics, Cambridge University, Cambridge, UK
22
Hogan MP, Holding ML, Nystrom GS, Colston TJ, Bartlett DA, Mason AJ, Ellsworth SA, Rautsaw RM, Lawrence KC, Strickland JL, He B, Fraser P, Margres MJ, Gilbert DM, Gibbs HL, Parkinson CL, Rokyta DR. The genetic regulatory architecture and epigenomic basis for age-related changes in rattlesnake venom. Proc Natl Acad Sci U S A 2024; 121:e2313440121. [PMID: 38578985 PMCID: PMC11032440 DOI: 10.1073/pnas.2313440121] [Received: 08/08/2023] [Accepted: 03/13/2024] [Indexed: 04/07/2024] Open
Abstract
Developmental phenotypic changes can evolve under selection imposed by age- and size-related ecological differences. Many of these changes occur through programmed alterations to gene expression patterns, but the molecular mechanisms and gene-regulatory networks underlying these adaptive changes remain poorly understood. Many venomous snakes, including the eastern diamondback rattlesnake (Crotalus adamanteus), undergo correlated changes in diet and venom expression as snakes grow larger with age, providing models for identifying mechanisms of timed expression changes that underlie adaptive life history traits. By combining a highly contiguous, chromosome-level genome assembly with measures of expression, chromatin accessibility, and histone modifications, we identified cis-regulatory elements and trans-regulatory factors controlling venom ontogeny in the venom glands of C. adamanteus. Ontogenetic expression changes were significantly correlated with epigenomic changes within genes, immediately adjacent to genes (e.g., promoters), and more distant from genes (e.g., enhancers). We identified 37 candidate transcription factors (TFs), with the vast majority being up-regulated in adults. The ontogenetic change is largely driven by an increase in the expression of TFs associated with growth signaling, transcriptional activation, and circadian rhythm/biological timing systems in adults with corresponding epigenomic changes near the differentially expressed venom genes. However, both expression activation and repression contributed to the composition of both adult and juvenile venoms, demonstrating the complexity and potential evolvability of gene regulation for this trait. Overall, given that age-based trait variation is common across the tree of life, we provide a framework for understanding gene-regulatory-network-driven life-history evolution more broadly.
Affiliation(s)
- Michael P. Hogan
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Matthew L. Holding
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Life Sciences Institute, University of Michigan, Ann Arbor, MI 48109
| | - Gunnar S. Nystrom
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Timothy J. Colston
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
- Department of Biology, University of Puerto Rico at Mayagüez, Mayagüez, PR 00681
| | - Daniel A. Bartlett
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Andrew J. Mason
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
| | - Schyler A. Ellsworth
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Rhett M. Rautsaw
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
- School of Biological Sciences, Washington State University, Pullman, WA 99164
| | - Kylie C. Lawrence
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Jason L. Strickland
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Biology, University of South Alabama, Mobile, AL 36688
| | - Bing He
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Peter Fraser
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
| | - Mark J. Margres
- Department of Integrative Biology, University of South Florida, Tampa, FL 33620
| | - David M. Gilbert
- Laboratory of Chromosome Replication and Epigenome Regulation, San Diego Biomedical Research Institute, San Diego, CA 92121
| | - H. Lisle Gibbs
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, Columbus, OH 43210
| | - Christopher L. Parkinson
- Department of Biological Sciences, Clemson University, Clemson, SC 29634
- Department of Forestry and Environmental Conservation, Clemson University, Clemson, SC 29634
| | - Darin R. Rokyta
- Department of Biological Science, Florida State University, Tallahassee, FL 32306
23
|
Ball A, Robertson C, Doubleday M. The genome sequence of the Western Capercaillie Tetrao urogallus Linnaeus, 1758. Wellcome Open Res 2024; 9:198. [PMID: 38706509 PMCID: PMC11066522 DOI: 10.12688/wellcomeopenres.21261.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/21/2024] [Indexed: 05/07/2024] Open
Abstract
We present a genome assembly from an individual male Tetrao urogallus (the Western Capercaillie; Chordata; Aves; Galliformes; Phasianidae). The genome sequence is 1,013.2 megabases in length. Most of the assembly is scaffolded into 39 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 16.68 kilobases in length.
Affiliation(s)
- Alex Ball
- RZSS WildGenes, Royal Zoological Society of Scotland, Edinburgh, Scotland, UK
| | | | - Molly Doubleday
- Royal Society for the Protection of Birds (RSPB) Scotland, Edinburgh, Scotland, UK
| | - Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
- RZSS WildGenes, Royal Zoological Society of Scotland, Edinburgh, Scotland, UK
- Cairngorms National Park Authority, Grantown on Spey, Scotland, UK
- Royal Society for the Protection of Birds (RSPB) Scotland, Edinburgh, Scotland, UK
| | - Wellcome Sanger Institute Scientific Operations: Sequencing Operations
- RZSS WildGenes, Royal Zoological Society of Scotland, Edinburgh, Scotland, UK
- Cairngorms National Park Authority, Grantown on Spey, Scotland, UK
- Royal Society for the Protection of Birds (RSPB) Scotland, Edinburgh, Scotland, UK
| | - Wellcome Sanger Institute Tree of Life Core Informatics team
- RZSS WildGenes, Royal Zoological Society of Scotland, Edinburgh, Scotland, UK
- Cairngorms National Park Authority, Grantown on Spey, Scotland, UK
- Royal Society for the Protection of Birds (RSPB) Scotland, Edinburgh, Scotland, UK
| | - Tree of Life Core Informatics collective
- RZSS WildGenes, Royal Zoological Society of Scotland, Edinburgh, Scotland, UK
- Cairngorms National Park Authority, Grantown on Spey, Scotland, UK
- Royal Society for the Protection of Birds (RSPB) Scotland, Edinburgh, Scotland, UK
24
|
de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genomics 2024; 4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors approximately 9-fold, and increases contiguity 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomic datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ∼20.0 million sequence variations, of which 18,700 are predicted to affect the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variation among rat strains, enhances our understanding of the rat genome and provides researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Yanchao Pan
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Daniel Munro
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
| | - Monika Tutaj
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Huda Akil
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
| | - Chris Benner
- Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Denghui Chen
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Apurva S Chitre
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Clifton L Dalgard
- Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
| | - Wendy M Demos
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Peter A Doris
- The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Aron M Geurts
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Hakan M Gunturkun
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Victor Guryev
- Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | - Jun Huang
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ted Kalbfleisch
- Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
| | - Panjun Kim
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ling Li
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Spencer Mahaffey
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
| | - Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
| | - Ayse Bilge Ozel
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Oksana Polesskaya
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Michal Pravenec
- Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonathan Sebat
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Jennifer R Smith
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Leah C Solberg Woods
- Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Boris Tabakoff
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK
| | | | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Hongyang Wang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Burt M Sharp
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Francesca Telese
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA
| | - Zhihua Jiang
- Department of Animal Sciences, Washington State University, Pullman, WA, USA
| | - Laura Saba
- Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Xusheng Wang
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Anne E Kwitek
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Melinda R Dwinell
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jun Z Li
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
| | - Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.
25
|
Mirarab S, Rivas-González I, Feng S, Stiller J, Fang Q, Mai U, Hickey G, Chen G, Brajuka N, Fedrigo O, Formenti G, Wolf JBW, Howe K, Antunes A, Schierup MH, Paten B, Jarvis ED, Zhang G, Braun EL. A region of suppressed recombination misleads neoavian phylogenomics. Proc Natl Acad Sci U S A 2024; 121:e2319506121. [PMID: 38557186 PMCID: PMC11009670 DOI: 10.1073/pnas.2319506121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 02/07/2024] [Indexed: 04/04/2024] Open
Abstract
Genomes are typically mosaics of regions with different evolutionary histories. When speciation events are closely spaced in time, recombination makes the regions sharing the same history small, and the evolutionary history changes rapidly as we move along the genome. When examining rapid radiations such as the early diversification of Neoaves 66 Mya, typically no consistent history is observed across segments exceeding kilobases of the genome. Here, we report an exception. We found that a 21-Mb region in avian genomes, mapped to chicken chromosome 4, shows an extremely strong and discordance-free signal for a history different from that of the inferred species tree. Such a strong discordance-free signal, indicative of suppressed recombination across many millions of base pairs, is not observed elsewhere in the genome for any deep avian relationships. Although long regions with suppressed recombination have been documented in recently diverged species, our results pertain to relationships dating circa 65 Mya. We provide evidence that this strong signal may be due to an ancient rearrangement that blocked recombination and remained polymorphic for several million years prior to fixation. We show that the presence of this region has misled previous phylogenomic efforts with lower taxon sampling, showing the interplay between taxon and locus sampling. We predict that similar ancient rearrangements may confound phylogenetic analyses in other clades, pointing to a need for new analytical models that incorporate the possibility of such events.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, CA 95032
| | | | - Shaohong Feng
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
| | - Josefin Stiller
- Section for Ecology & Evolution, Department of Biology, University of Copenhagen, København 2100, Denmark
| | - Qi Fang
- BGI-Research, Shenzhen 518083, China
| | - Uyen Mai
- Electrical and Computer Engineering Department, University of California, San Diego, CA 95032
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA 96064
| | - Guangji Chen
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
| | - Nadolina Brajuka
- Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
| | - Olivier Fedrigo
- Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
| | - Giulio Formenti
- Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
| | - Jochen B. W. Wolf
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität, Munich 82152, Germany
| | - Kerstin Howe
- Tree of Life Division, Wellcome Sanger Institute, Cambridge CB10 1RQ, United Kingdom
| | - Agostinho Antunes
- Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto 4099-002, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto 4099-002, Portugal
| | | | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA 96064
| | - Erich D. Jarvis
- Vertebrate Genome Lab, Rockefeller University, New York, NY 10065
| | - Guojie Zhang
- Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
| | - Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL 32611
26
|
Peralta DM, Túnez JI, Rodríguez Cruz UE, Ceballos SG. A rapid approach for sex assignment by RAD-seq using a reference genome. PLoS One 2024; 19:e0297987. [PMID: 38578816 PMCID: PMC10997085 DOI: 10.1371/journal.pone.0297987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 01/14/2024] [Indexed: 04/07/2024] Open
Abstract
Sex identification is a common objective in molecular ecology. While many vertebrates display sexual dimorphism, determining the sex can be challenging in certain situations, such as species lacking clear sex-related phenotypic characteristics or in studies using non-invasive methods. In these cases, DNA analyses serve as valuable tools not only for sex determination but also for validating sex assignment based on phenotypic traits. In this study, we developed a bioinformatic framework for sex assignment using genomic data obtained through genotyping-by-sequencing (GBS), provided that a chromosome-level genome assembly of a closely related species is available. Our method consists of two ad hoc indexes that rely on the different properties of the mammalian heteromorphic sex chromosomes. To compute them, we mapped RAD-seq loci to a reference genome and then obtained missingness and coverage depth values for the autosomes and the X and Y chromosomes of each individual. Our methodology successfully determined the sex of 165 fur seals that had been phenotypically sexed in a previous study and 40 sea lions sampled in a non-invasive way. Additionally, we evaluated the accuracy of each index in sequences with varying average coverage depths; Index Y proved more reliable and robust in assigning sex to individuals with low-depth coverage. We believe that the approach presented here can be extended to any animal taxon with a known heteromorphic XY/ZW sex chromosome system and that it can tolerate GBS sequencing data of varying quality.
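As a rough illustration of the coverage-depth idea described in this abstract, the Python sketch below compares mean depth at X and Y loci with mean autosomal depth. It is not the authors' method: the abstract does not define the two published indexes, so the function names, the per-locus depth lists used as input, and the 0.2 threshold are all assumptions made for illustration.

```python
from statistics import mean


def depth_ratio(target_depths, autosome_depths):
    """Mean depth over target loci divided by mean depth over autosomal loci."""
    autosome_mean = mean(autosome_depths)
    if autosome_mean == 0:
        raise ValueError("mean autosomal depth is zero; cannot normalise")
    return mean(target_depths) / autosome_mean


def assign_sex(x_depths, y_depths, autosome_depths, y_threshold=0.2):
    """Call sex from normalised Y depth (hypothetical rule, XY system assumed).

    In an XY male, X and Y loci are each expected near half the autosomal
    depth; in an XX female, X loci are near autosomal depth and Y loci near 0.
    The threshold value is an arbitrary assumption, not a published cut-off.
    """
    x_ratio = depth_ratio(x_depths, autosome_depths)
    y_ratio = depth_ratio(y_depths, autosome_depths)
    sex = "male" if y_ratio >= y_threshold else "female"
    return sex, x_ratio, y_ratio
```

In practice the per-locus depths would come from reads mapped to the reference assembly, and any threshold would need to be calibrated on individuals of known sex.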
Affiliation(s)
- Diego M. Peralta
- Grupo de Investigación en Ecología Molecular, Instituto de Ecología y Desarrollo Sustentable (INEDES-CONICET-UNLu-CIC), Luján, Argentina
- Departamento de Ecología de la Diversidad, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
| | - Juan I. Túnez
- Grupo de Investigación en Ecología Molecular, Instituto de Ecología y Desarrollo Sustentable (INEDES-CONICET-UNLu-CIC), Luján, Argentina
- Departamento de Ciencias Básicas, Universidad Nacional de Luján, Luján, Argentina
| | - Ulises E. Rodríguez Cruz
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
| | - Santiago G. Ceballos
- Instituto de Ciencias Polares, Ambiente y Recursos Naturales, Universidad Nacional de Tierra del Fuego, Ushuaia, Argentina
- Centro Austral de Investigaciones Científicas (CADIC-CONICET), Ushuaia, Argentina
27
|
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Paasch A, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted phasing of 2-200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method. Sci Rep 2024; 14:7988. [PMID: 38580715 PMCID: PMC10997766 DOI: 10.1038/s41598-024-58733-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 04/02/2024] [Indexed: 04/07/2024] Open
Abstract
In the human genome, heterozygous sites are genomic positions that carry different alleles or nucleotide variants on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, also known as phasing, is achievable on a short-read sequencer when using a library preparation method that captures long-range genomic information. TELL-Seq is a library preparation method that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode is used to tag the reads derived from the same long DNA fragment within a range of up to 200 kilobases (kb), generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.
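The barcode-based linking described in this abstract can be illustrated with a toy Python sketch. It assumes reads have already been grouped by barcode and reduced to the alleles (coded 0 or 1) they support at heterozygous sites; the function name, the input layout, and the greedy majority rule are hypothetical simplifications, not the TELL-Seq analysis pipeline.

```python
def phase_linked_reads(barcode_alleles):
    """Greedy toy phasing from barcode-linked allele observations.

    barcode_alleles maps barcode -> {site: allele}, with alleles coded 0/1.
    Returns {site: allele} for haplotype A; haplotype B is the complement.
    Barcodes that never overlap an already phased site are left unused.
    """
    hap_a = {}
    pending = list(barcode_alleles.values())
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for obs in pending:
            if not hap_a:
                # The first barcode seeds haplotype A.
                hap_a.update(obs)
                progress = True
                continue
            shared = [site for site in obs if site in hap_a]
            if not shared:
                # No overlap yet; retry after other barcodes extend the block.
                remaining.append(obs)
                continue
            agree = sum(obs[site] == hap_a[site] for site in shared)
            same_haplotype = agree * 2 >= len(shared)
            for site, allele in obs.items():
                # Extend the block; flip alleles for fragments from haplotype B.
                hap_a.setdefault(site, allele if same_haplotype else 1 - allele)
            progress = True
        pending = remaining
    return hap_a
```

For example, three barcodes covering sites (1, 2), (2, 3) and (3, 4) are chained through their shared sites into a single phased block, which is the sense in which reads sharing a barcode extend phasing beyond the length of a single short read.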
Affiliation(s)
| | - Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | | | - Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Peter L Chang
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | | | - Amber Paasch
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
| | - Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | - Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
| | | | - Ming Lei
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
| | - Yong Wang
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
| | | | - Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA.
28
|
Karollus A, Hingerl J, Gankin D, Grosshauser M, Klemon K, Gagneur J. Species-aware DNA language models capture regulatory elements and their evolution. Genome Biol 2024; 25:83. [PMID: 38566111 PMCID: PMC10985990 DOI: 10.1186/s13059-024-03221-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 03/20/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND: The rise of large-scale multi-species genome sequencing projects promises to shed new light on how genomes encode gene regulatory instructions. To this end, new algorithms are needed that can leverage conservation to capture regulatory elements while accounting for their evolution.
RESULTS: Here, we introduce species-aware DNA language models, which we trained on more than 800 species spanning over 500 million years of evolution. Investigating their ability to predict masked nucleotides from context, we show that DNA language models distinguish transcription factor and RNA-binding protein motifs from background non-coding sequence. Owing to their flexibility, DNA language models capture conserved regulatory elements over much further evolutionary distances than sequence alignment would allow. Remarkably, DNA language models reconstruct motif instances bound in vivo better than unbound ones and account for the evolution of motif sequences and their positional constraints, showing that these models capture functional high-order sequence and evolutionary context. We further show that species-aware training yields improved sequence representations for endogenous and MPRA-based gene expression prediction, as well as motif discovery.
CONCLUSIONS: Collectively, these results demonstrate that species-aware DNA language models are a powerful, flexible, and scalable tool to integrate information from large compendia of highly diverged genomes.
Affiliation(s)
- Alexander Karollus
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Center for Machine Learning, Munich, Germany
| | - Johannes Hingerl
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Dennis Gankin
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Martin Grosshauser
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Kristian Klemon
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Munich Center for Machine Learning, Munich, Germany.
- Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
- Munich Data Science Institute, Technical University of Munich, Garching, Germany.
29
|
Upadhyay M, Pogorevc N, Medugorac I. scalepopgen: Bioinformatic Workflow Resources Implemented in Nextflow for Comprehensive Population Genomic Analyses. Mol Biol Evol 2024; 41:msae057. [PMID: 38507648 PMCID: PMC10994858 DOI: 10.1093/molbev/msae057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 03/22/2024] Open
Abstract
Population genomic analyses, such as inferring population structure and identifying signatures of selection, usually involve many tools. Installing the tools and their dependencies, transforming data, and running preprocessing steps in a particular order can make these analyses challenging. While container-based technologies have largely resolved the problems associated with installing tools and their dependencies, population genomic analyses requiring multistep pipelines or complex data transformation can be greatly facilitated by workflow management systems such as Nextflow and Snakemake. Here, we present scalepopgen, a collection of fully automated workflows that can carry out widely used population genomic analyses on biallelic single nucleotide polymorphism data stored in either variant call format files or plink-generated binary files. scalepopgen is developed in Nextflow and can be run locally or on high-performance computing systems using either Conda, Singularity, or Docker. The automated workflow includes procedures such as (i) filtering of individuals and genotypes; (ii) principal component analysis and admixture analysis, including identification of optimal K values; (iii) TreeMix analysis with or without bootstrapping and migration edges, followed by identification of an optimal number of migration edges; and (iv) single-population and pairwise population comparison-based procedures to identify genomic signatures of selection. The pipeline uses various open-source tools; several Python and R scripts are also provided to collect and visualize the results. The tool is freely available at https://github.com/Popgen48/scalepopgen.
Affiliation(s)
- Maulik Upadhyay
- Population Genomics Group, Department of Veterinary Sciences, LMU Munich, Martinsried 82152, Germany
| | - Neža Pogorevc
- Population Genomics Group, Department of Veterinary Sciences, LMU Munich, Martinsried 82152, Germany
| | - Ivica Medugorac
- Population Genomics Group, Department of Veterinary Sciences, LMU Munich, Martinsried 82152, Germany
|
30
|
Benham PM, Cicero C, Escalona M, Beraut E, Fairbairn C, Marimuthu MPA, Nguyen O, Sahasrabudhe R, King BL, Thomas WK, Kovach AI, Nachman MW, Bowie RCK. Remarkably High Repeat Content in the Genomes of Sparrows: The Importance of Genome Assembly Completeness for Transposable Element Discovery. Genome Biol Evol 2024; 16:evae067. [PMID: 38566597 PMCID: PMC11088854 DOI: 10.1093/gbe/evae067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/01/2024] [Accepted: 03/23/2024] [Indexed: 04/04/2024] Open
Abstract
Transposable elements (TE) play critical roles in shaping genome evolution. Highly repetitive TE sequences are also a major source of assembly gaps making it difficult to fully understand the impact of these elements on host genomes. The increased capacity of long-read sequencing technologies to span highly repetitive regions promises to provide new insights into patterns of TE activity across diverse taxa. Here we report the generation of highly contiguous reference genomes using PacBio long-read and Omni-C technologies for three species of Passerellidae sparrow. We compared these assemblies to three chromosome-level sparrow assemblies and nine other sparrow assemblies generated using a variety of short- and long-read technologies. All long-read based assemblies were longer (range: 1.12 to 1.41 Gb) than short-read assemblies (0.91 to 1.08 Gb) and assembly length was strongly correlated with the amount of repeat content. Repeat content for Bell's sparrow (31.2% of genome) was the highest level ever reported within the order Passeriformes, which comprises over half of avian diversity. The highest levels of repeat content (79.2% to 93.7%) were found on the W chromosome relative to other regions of the genome. Finally, we show that proliferation of different TE classes varied even among species with similar levels of repeat content. These patterns support a dynamic model of TE expansion and contraction even in a clade where TEs were once thought to be fairly depauperate and static. Our work highlights how the resolution of difficult-to-assemble regions of the genome with new sequencing technologies promises to transform our understanding of avian genome evolution.
Affiliation(s)
- Phred M Benham
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Carla Cicero
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Merly Escalona
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Eric Beraut
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Colin Fairbairn
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mohan P A Marimuthu
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
| | - Oanh Nguyen
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
| | - Ruta Sahasrabudhe
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California-Davis, Davis, CA 95616, USA
| | - Benjamin L King
- Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA
| | - W Kelley Thomas
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
| | - Adrienne I Kovach
- Department of Natural Resources and the Environment, University of New Hampshire, Durham, NH 03824, USA
| | - Michael W Nachman
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Rauri C K Bowie
- Museum of Vertebrate Zoology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
|
31
|
Halstead-Nussloch G, Signorini SG, Giulio M, Crocetta F, Munari M, Della Torre C, Weber AAT. The genome of the rayed Mediterranean limpet Patella caerulea (Linnaeus, 1758). Genome Biol Evol 2024; 16:evae070. [PMID: 38546725 PMCID: PMC11003540 DOI: 10.1093/gbe/evae070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2024] [Indexed: 04/11/2024] Open
Abstract
Patella caerulea (Linnaeus, 1758) is a mollusc limpet species of the class Gastropoda. Endemic to the Mediterranean Sea, it is considered a keystone species due to its primary role in structuring and regulating the ecological balance of tidal and subtidal habitats. It is currently being used as a bioindicator to assess the environmental quality of coastal marine waters and as a model species to understand adaptation to ocean acidification. Here, we provide a high-quality reference genome assembly and annotation for P. caerulea. We generated ∼30 Gb of Pacific Biosciences high-fidelity data from a single individual and provide a final 749.8 Mb assembly containing 62 contigs, including the mitochondrial genome (14,938 bp). With an N50 of 48.8 Mb and 98% of the assembly contained in the 18 largest contigs, this assembly is near chromosome-scale. Benchmarking Universal Single-Copy Orthologs scores were high (Mollusca, 87.8% complete; Metazoa, 97.2% complete) and similar to metrics observed for other chromosome-level Patella genomes, highlighting a possible bias in the Mollusca database for Patellids. We generated transcriptomic Illumina data from a second individual collected at the same locality and used it together with protein evidence to annotate the genome. A total of 23,938 protein-coding gene models were found. By comparing this annotation with other published Patella annotations, we found that the distributions and median values of exon and gene lengths were comparable with those of other Patella species despite different annotation approaches. The present high-quality P. caerulea reference genome, available on GenBank (BioProject: PRJNA1045377; assembly: GCA_036850965.1), is an important resource for future ecological and evolutionary studies.
Affiliation(s)
| | - Silvia Giorgia Signorini
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
- Department of Biosciences, University of Milan, Milan, Italy
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
| | - Marco Giulio
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
| | - Fabio Crocetta
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- National Biodiversity Future Center (NBFC), Palermo, Italy
| | - Marco Munari
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- Department of Biology, Stazione Idrobiologica ‘Umberto d’Ancona’, University of Padova, Chioggia, Italy
| | - Camilla Della Torre
- Department of Biosciences, University of Milan, Milan, Italy
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
| | - Alexandra Anh-Thu Weber
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
|
32
|
da Roza PA, Muller H, Sullivan GJ, Walker RSK, Goold HD, Willows RD, Palenik B, Paulsen IT. Chromosome-scale assembly of the streamlined picoeukaryote Picochlorum sp. SENEW3 genome reveals Rabl-like chromatin structure and potential for C4 photosynthesis. Microb Genom 2024; 10. [PMID: 38625719 DOI: 10.1099/mgen.0.001223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024] Open
Abstract
Genome sequencing and assembly of the photosynthetic picoeukaryote Picochlorum sp. SENEW3 revealed a compact genome with a reduced gene set, few repetitive sequences, and an organized Rabl-like chromatin structure. Hi-C chromosome conformation capture revealed evidence of possible chromosomal translocations, as well as putative centromere locations. Maintenance of relatively few selenoproteins, as compared to similarly sized marine picoprasinophytes (Mamiellales), and broad halotolerance compared to others in Trebouxiophyceae suggest evolutionary adaptation to variable salinity environments. Such adaptation may have driven size and genome minimization and may have been enabled by the retention of a high number of membrane transporters. Identification of required pathway genes for both CAM and C4 photosynthetic carbon fixation, known to exist in the marine mamiellale pico-prasinophytes and the seaweed Ulva but in few other chlorophyte species, further highlights the unique adaptations of this robust alga. This high-quality assembly provides a significant advance in the resources available for genomic investigations of this and other photosynthetic picoeukaryotes.
Affiliation(s)
- Patrick A da Roza
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- School of Natural Sciences, Macquarie University, Sydney, Australia
| | - Héloïse Muller
- Institut Curie, PSL University, Sorbonne Université, CNRS, Nuclear Dynamics, 75005 Paris, France
| | - Geraldine J Sullivan
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- School of Natural Sciences, Macquarie University, Sydney, Australia
| | - Roy S K Walker
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- School of Natural Sciences, Macquarie University, Sydney, Australia
| | - Hugh D Goold
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- New South Wales Department of Primary Industries, Orange, NSW 2800, Australia
| | - Robert D Willows
- School of Natural Sciences, Macquarie University, Sydney, Australia
| | - Brian Palenik
- Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California 92093-0202, USA
| | - Ian T Paulsen
- ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW 2109, Australia
- School of Natural Sciences, Macquarie University, Sydney, Australia
|
33
|
Uno Y, Matsubara K. Unleashing diversity through flexibility: The evolutionary journey of sex chromosomes in amphibians and reptiles. Journal of Experimental Zoology. Part A, Ecological and Integrative Physiology 2024; 341:230-241. [PMID: 38155517 DOI: 10.1002/jez.2776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 12/30/2023]
Abstract
Sex determination systems have diversified greatly in amphibians and reptiles, including different sex chromosome compositions within a single species and transitions between temperature-dependent sex determination (TSD) and genetic sex determination (GSD). In most sex chromosome studies on amphibians and reptiles, the whole-genome sequences of Xenopus tropicalis and chicken have been used as references to compare the chromosome homology of sex chromosomes within each of these taxonomic groups, respectively. In the present study, we reviewed existing reports on sex chromosomes, including karyotypes, in amphibians and reptiles. Furthermore, we compared the identified genetic linkages of sex chromosomes in amphibians and reptiles with the chicken genome as a reference, which is believed to resemble the ancestral tetrapod karyotype. Our findings revealed that sex chromosomes in amphibians are derived from genetic linkages homologous to various chicken chromosomes, even among several frogs within single families, such as Ranidae and Pipidae. In contrast, sex chromosomes in reptiles exhibit conserved genetic linkages with chicken chromosomes, not only across most species within a single family, but also within closely related families. The diversity of sex chromosomes in amphibians and reptiles may be attributed to the flexibility of their sex determination systems, including the ease of sex reversal in these animals.
Affiliation(s)
- Yoshinobu Uno
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
| | - Kazumi Matsubara
- Department of Bioscience and Biotechnology, Graduate School of Bioscience and Biotechnology, Chubu University, Kasugai, Aichi, Japan
|
34
|
Falk S, Monks J. The genome sequence of the common green furrow bee, Lasioglossum morio (Fabricius, 1793). Wellcome Open Res 2024; 8:28. [PMID: 38699201 PMCID: PMC11063680 DOI: 10.12688/wellcomeopenres.18715.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/21/2024] [Indexed: 05/05/2024] Open
Abstract
We present a genome assembly from an individual male Lasioglossum morio (the common green furrow bee; Arthropoda; Insecta; Hymenoptera; Halictidae). The genome sequence is 547 megabases in span. Over half of the assembly (55.79%) is scaffolded into 12 chromosomal pseudomolecules. The mitochondrial genome was also assembled, and is 16.8 kilobases in length. Gene annotation of this assembly on Ensembl identified 11,460 protein coding genes.
Affiliation(s)
- Steven Falk
- Independent Researcher, Kenilworth, Warwickshire, UK
| | - Joseph Monks
- Department of Life Sciences - Hymenoptera section, Natural History Museum, London, UK
|
35
|
Yu H, Li Y, Han W, Bao L, Liu F, Ma Y, Pu Z, Zeng Q, Zhang L, Bao Z, Wang S. Pan-evolutionary and regulatory genome architecture delineated by an integrated macro- and microsynteny approach. Nat Protoc 2024:10.1038/s41596-024-00966-4. [PMID: 38514839 DOI: 10.1038/s41596-024-00966-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 12/20/2023] [Indexed: 03/23/2024]
Abstract
The forthcoming massive genome data generated by the Earth BioGenome Project will open up a new era of comparative genomics, for which genome synteny analysis provides an important framework. Profiling genome synteny represents an essential step in elucidating genome architecture, regulatory blocks/elements and their evolutionary history. Here we describe PanSyn (https://github.com/yhw320/PanSyn), the most comprehensive and up-to-date genome synteny pipeline, providing step-by-step instructions and application examples to demonstrate its usage. PanSyn inherits both basic and advanced functions from existing popular tools, offering a user-friendly, highly customized approach for genome macrosynteny analysis and integrated pan-evolutionary and regulatory analysis of genome architecture, which are not yet available in public synteny software or tools. The advantages of PanSyn include: (i) advanced microsynteny analysis by functional profiling of microsynteny genes and associated regulatory elements; (ii) comprehensive macrosynteny analysis, including the inference of karyotype evolution from ancestors to extant species; and (iii) functional integration of microsynteny and macrosynteny for pan-evolutionary profiling of genome architecture and regulatory blocks, as well as integration with external functional genomics datasets from three- or four-dimensional genome and ENCODE projects. PanSyn requires basic knowledge of the Linux environment and Perl programming language and the ability to access a computer cluster, especially for large-scale genomic comparisons. Our protocol can be easily implemented by a competent graduate student or postdoc and takes several days to weeks to execute for dozens to hundreds of genomes. PanSyn provides the most comprehensive and powerful tool to date for integrated evolutionary and functional genomics.
Affiliation(s)
- Hongwei Yu
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Yuli Li
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao, China
| | - Wentao Han
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Lisui Bao
- Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao, China
| | - Fuyun Liu
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Yuanting Ma
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Zhongqi Pu
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Qifan Zeng
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Lingling Zhang
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao, China
| | - Zhenmin Bao
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
- Key Laboratory of Tropical Aquatic Germplasm of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya, China
- Laboratory for Marine Fisheries and Aquaculture, Laoshan Laboratory, Qingdao, China
| | - Shi Wang
- Fang Zongxi Center for Marine Evo-Devo & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Laoshan Laboratory, Qingdao, China
- Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China
- Key Laboratory of Tropical Aquatic Germplasm of Hainan Province, Sanya Oceanographic Institution, Ocean University of China, Sanya, China
|
36
|
Nanni A, Titus-McQuillan J, Bankole KS, Pardo-Palacios F, Signor S, Vlaho S, Moskalenko O, Morse A, Rogers RL, Conesa A, McIntyre LM. Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD. Nucleic Acids Res 2024; 52:e28. [PMID: 38340337 PMCID: PMC10954468 DOI: 10.1093/nar/gkae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/29/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024] Open
Abstract
Advances in affordable transcriptome sequencing combined with better exon and gene prediction have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes, and alternative 5'/3' UTRs, are compared and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and the output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq vs Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
Affiliation(s)
- Adalena Nanni
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| | - James Titus-McQuillan
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Kinfeosioluwa S Bankole
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| | - Sarah Signor
- Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
| | - Srna Vlaho
- Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
| | - Oleksandr Moskalenko
- University of Florida Research Computing, University of Florida, Gainesville, FL 32611, USA
| | - Alison M Morse
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| | - Rebekah L Rogers
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Spain
| | - Lauren M McIntyre
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
|
37
|
Olbrich M, Bartels L, Wohlers I. Sequencing technologies and hardware-accelerated parallel computing transform computational genomics research. Frontiers in Bioinformatics 2024; 4:1384497. [PMID: 38567256 PMCID: PMC10985184 DOI: 10.3389/fbinf.2024.1384497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024] Open
Affiliation(s)
- Michael Olbrich
- Center for Biotechnology, Khalifa University for Science and Technology, Abu Dhabi, United Arab Emirates
| | - Lennart Bartels
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
| | - Inken Wohlers
- Biomolecular Data Science in Pneumology, Research Center Borstel, Borstel, Germany
- University of Lübeck, Lübeck, Germany
|
38
|
Li R, Li J, Lopez JV, Oatley G, Clayton-Lucey IA, Sinclair E, Aunin E, Gettle N, Santos C, Paulini M, Niu H, McKenna V, O’Brien R. The genome sequence of the giant clam, Tridacna gigas (Linnaeus, 1758). Wellcome Open Res 2024; 9:145. [PMID: 38800516 PMCID: PMC11116938 DOI: 10.12688/wellcomeopenres.21136.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/28/2024] [Indexed: 05/29/2024] Open
Abstract
We present a chromosomal-level genome assembly from an individual Tridacna gigas (the giant clam; Mollusca; Bivalvia; Veneroida; Cardiidae). The genome sequence is 1,175.9 megabases in span. Most of the assembly is scaffolded into 17 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 25.34 kilobases in length. Gene annotation of this assembly on Ensembl identified 18,177 protein coding genes.
Affiliation(s)
- Ruiqi Li
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
| | - Jingchun Li
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
- Museum of Natural History, University of Colorado Boulder, Boulder, Colorado, USA
| | - Jose Victor Lopez
- Department of Biological Sciences, Nova Southeastern University, Dania Beach, Florida, USA
| | - Graeme Oatley
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Eerik Aunin
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Noah Gettle
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Camilla Santos
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Michael Paulini
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Haoyu Niu
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Rebecca O’Brien
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory Team
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
- Museum of Natural History, University of Colorado Boulder, Boulder, Colorado, USA
- Department of Biological Sciences, Nova Southeastern University, Dania Beach, Florida, USA
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Wellcome Sanger Institute Scientific Operations: Sequencing Operations
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
- Museum of Natural History, University of Colorado Boulder, Boulder, Colorado, USA
- Department of Biological Sciences, Nova Southeastern University, Dania Beach, Florida, USA
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - Wellcome Sanger Institute Tree of Life Core Informatics Team
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
- Museum of Natural History, University of Colorado Boulder, Boulder, Colorado, USA
- Department of Biological Sciences, Nova Southeastern University, Dania Beach, Florida, USA
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
| | - EBI Aquatic Symbiosis Genomics Data Portal Team
- Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, USA
- Museum of Natural History, University of Colorado Boulder, Boulder, Colorado, USA
- Department of Biological Sciences, Nova Southeastern University, Dania Beach, Florida, USA
- Tree of Life, Wellcome Sanger Institute, Hinxton, England, UK
|
39
|
Koren S, Bao Z, Guarracino A, Ou S, Goodwin S, Jenike KM, Lucas J, McNulty B, Park J, Rautiainen M, Rhie A, Roelofs D, Schneiders H, Vrijenhoek I, Nijbroek K, Ware D, Schatz MC, Garrison E, Huang S, McCombie WR, Miga KH, Wittenberg AH, Phillippy AM. Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. bioRxiv: The Preprint Server for Biology 2024:2024.03.15.585294. [PMID: 38529488 PMCID: PMC10962732 DOI: 10.1101/2024.03.15.585294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely-studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the ultra-long reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and has the potential to provide a single-instrument solution for the reconstruction of complete genomes.
Affiliation(s)
- Sergey Koren
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Zhigui Bao
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Baden-Württemberg, Germany
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Human Technopole, Milan, Italy
| | - Shujun Ou
- Ohio State University, Columbus, OH, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Katharine M. Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Julian Lucas
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Brandy McNulty
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jimin Park
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Dick Roelofs
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
| | - Ilse Vrijenhoek
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
| | - Koen Nijbroek
- KeyGene, Agro Business Park 90, 6708 PW Wageningen, Netherlands
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
- Sanwen Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- State Key Laboratory of Tropical Crop Breeding, Chinese Academy of Tropical Agricultural Sciences, Haikou, Hainan, China
- Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
- Adam M. Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
40
Wu S, Wang K, Dou T, Yuan S, Yan S, Xu Z, Liu Y, Jian Z, Zhao J, Zhao R, Zi X, Gu D, Liu L, Li Q, Wu DD, Jia J, Su Z, Ge C. High quality assemblies of four indigenous chicken genomes and related functional data resources. Sci Data 2024; 11:300. [PMID: 38490983 PMCID: PMC10942973 DOI: 10.1038/s41597-024-03126-1] [Received: 11/24/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024]
Abstract
Many lines of evidence indicate that red jungle fowl (RJF) is the primary ancestor of domestic chickens. Although multiple versions of RJF (galgal2-galgal5 and GRCg6a) and commercial chickens (GRCg7b/w and Huxu) genomes have been assembled since 2004, no high-quality indigenous chicken genomes have been assembled, hampering the understanding of chicken domestication and evolution. To fill the gap, we sequenced the genomes of four indigenous chickens with distinct morphological traits in southwest China, using a combination of short, long and Hi-C reads. We assembled each genome (~1.0 Gb) into 42 chromosomes with chromosome N50 90.5-90.9 Mb, amongst the highest quality of chicken genome assemblies. To provide resources for gene annotation and functional analysis, we also sequenced transcriptomes of 10 tissues for each of the four chickens. Moreover, we corrected many mis-assemblies and assembled missing micro-chromosomes 29 and 34-39 for GRCg6a. Our assemblies, sequencing data and the correction of GRCg6a can be valuable resources for studying chicken domestication and evolution.
Affiliation(s)
- Siwen Wu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Kun Wang
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Tengfei Dou
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Sisi Yuan
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Shixiong Yan
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Zhiqiang Xu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Yong Liu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Zonghui Jian
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Jingying Zhao
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Rouhan Zhao
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Xiannian Zi
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Dahai Gu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Lixian Liu
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Qihua Li
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Junjing Jia
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China.
- Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
- Changrong Ge
- Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, 650201, China.
41
Capel SLR, Hamilton NM, Fraser D, Escalona M, Nguyen O, Sacco S, Sahasrabudhe R, Seligmann W, Vazquez JM, Sudmant PH, Morrison ML, Wayne RK, Buchalski MR. Reference genome of Townsend's big-eared bat, Corynorhinus townsendii. J Hered 2024; 115:203-211. [PMID: 38092381 PMCID: PMC10936552 DOI: 10.1093/jhered/esad078] [Received: 09/23/2023] [Accepted: 12/11/2023] [Indexed: 03/14/2024]
Abstract
Townsend's big-eared bat, Corynorhinus townsendii, is a cave- and mine-roosting species found largely in western North America. Considered a species of conservation concern throughout much of its range, protection efforts would greatly benefit from understanding patterns of population structure, genetic diversity, and local adaptation. To facilitate such research, we present the first de novo genome assembly of C. townsendii as part of the California Conservation Genomics Project (CCGP). Pacific Biosciences HiFi long reads and Omni-C chromatin-proximity sequencing technologies were used to produce a de novo genome assembly, consistent with the standard CCGP reference genome protocol. This assembly comprises 391 scaffolds spanning 2.1 Gb, represented by a scaffold N50 of 174.6 Mb, a contig N50 of 23.4 Mb, and a benchmarking universal single-copy ortholog (BUSCO) completeness score of 96.6%. This high-quality genome will be a key tool for informed conservation and management of this vulnerable species in California and across its range.
Affiliation(s)
- Samantha L R Capel
- Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States
- Natalie M Hamilton
- Department of Rangeland Wildlife and Fisheries Management, Texas A&M University, College Station, TX, United States
- Devaughn Fraser
- Connecticut Department of Energy and Environmental Protection, Hartford, CT, United States
- Merly Escalona
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States
- Oanh Nguyen
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, United States
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Ruta Sahasrabudhe
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California Davis, Davis, CA, United States
- William Seligmann
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Juan M Vazquez
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, United States
- Peter H Sudmant
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA, United States
- Michael L Morrison
- Department of Rangeland Wildlife and Fisheries Management, Texas A&M University, College Station, TX, United States
- Robert K Wayne
- Department of Ecology and Evolution, University of California Los Angeles, Los Angeles, CA, United States
- Michael R Buchalski
- Wildlife Genetics Research Unit, Wildlife Health Laboratory, California Department of Fish and Wildlife, Sacramento, CA, United States
42
Baker DN, Abueg L, Escalona M, Farquharson KA, Lanyon JM, Le Duc D, Schöneberg T, Absolon D, Sims Y, Fedrigo O, Jarvis ED, Belov K, Hogg CJ, Shapiro B. A chromosome-level genome assembly for the dugong (Dugong dugon). J Hered 2024; 115:212-220. [PMID: 38245832 PMCID: PMC10936554 DOI: 10.1093/jhered/esae003] [Received: 11/22/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024]
Abstract
The dugong (Dugong dugon) is a marine mammal widely distributed throughout the Indo-Pacific and the Red Sea, with a Vulnerable conservation status, and little is known about many of the more peripheral populations, some of which are thought to be close to extinction. We present a de novo high-quality genome assembly for the dugong from an individual belonging to the well-monitored Moreton Bay population in Queensland, Australia. Our assembly uses long-read PacBio HiFi sequencing and Omni-C data following the Vertebrate Genome Project pipeline to reach chromosome-level contiguity (24 chromosome-level scaffolds; 3.16 Gbp) and high completeness (97.9% complete BUSCOs). We observed relatively high genome-wide heterozygosity, which likely reflects historical population abundance before the last interglacial period, approximately 125,000 yr ago. Demographic inference suggests that dugong populations began declining as sea levels fell after the last interglacial period, likely a result of population fragmentation and habitat loss due to the exposure of seagrass meadows. We find no evidence for ongoing recent inbreeding in this individual. However, runs of homozygosity indicate some past inbreeding. Our draft genome assembly will enable range-wide assessments of genetic diversity and adaptation, facilitate effective management of dugong populations, and allow comparative genomics analyses including with other sirenians, the oldest marine mammal lineage.
Affiliation(s)
- Dorothy Nevé Baker
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Linelle Abueg
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, United States
- Merly Escalona
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States
- Katherine A Farquharson
- Faculty of Science, School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, NSW, Australia
- Janet M Lanyon
- School of Biological Sciences, The University of Queensland, St Lucia, QLD, Australia
- Diana Le Duc
- Institute of Human Genetics, University Medical Center Leipzig, Leipzig, Germany
- Torsten Schöneberg
- Medical Faculty, Rudolf Schönheimer Institute of Biochemistry, University of Leipzig, Leipzig, Germany
- School of Medicine, University of Global Health Equity, Kigali, Rwanda
- Dominic Absolon
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Ying Sims
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, United States
- Howard Hughes Medical Institute, Chevy Chase, MD, United States
- Katherine Belov
- Faculty of Science, School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, NSW, Australia
- Carolyn J Hogg
- Faculty of Science, School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia
- Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, NSW, Australia
- Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Howard Hughes Medical Institute, Chevy Chase, MD, United States
43
Mead A, Fitz-Gibbon ST, Escalona M, Beraut E, Sacco S, Marimuthu MPA, Nguyen O, Sork VL. The genome assembly of Island Oak (Quercus tomentella), a relictual island tree species. J Hered 2024; 115:221-229. [PMID: 38305464 PMCID: PMC10936553 DOI: 10.1093/jhered/esae002] [Received: 11/20/2023] [Accepted: 02/01/2024] [Indexed: 02/03/2024]
Abstract
Island oak (Quercus tomentella) is a rare relictual island tree species that exists only on six islands off the coast of California and Mexico, but was once widespread throughout mainland California. Currently, this species is endangered by threats such as non-native plants, grazing animals, and human removal. Efforts for conservation and restoration of island oak currently underway could benefit from information about its range-wide genetic structure and evolutionary history. Here we present a high-quality genome assembly for Q. tomentella, assembled using PacBio HiFi and Omni-C sequencing, developed as part of the California Conservation Genomics Project (CCGP). The resulting assembly has a length of 781 Mb, with a contig N50 of 22.0 Mb and a scaffold N50 of 63.4 Mb. This genome assembly will provide a resource for genomics-informed conservation of this rare oak species. Additionally, this reference genome will be the first one available for a species in Quercus section Protobalanus, a unique oak clade present only in western North America.
Affiliation(s)
- Alayna Mead
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles CA 90095-7239, United States
- Sorel T Fitz-Gibbon
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles CA 90095-7239, United States
- Merly Escalona
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, United States
- Eric Beraut
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, United States
- Mohan P A Marimuthu
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA 95616, United States
- Oanh Nguyen
- DNA Technologies and Expression Analysis Core Laboratory, Genome Center, University of California, Davis, CA 95616, United States
- Victoria L Sork
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles CA 90095-7239, United States
- Institute of the Environment and Sustainability, University of California Los Angeles, Los Angeles CA 90095, United States
44
Harshan P, Sukumaran S, Gopalakrishnan A. De novo transcriptome for Chiloscyllium griseum, a long-tail carpet shark of the Indian waters. Sci Data 2024; 11:285. [PMID: 38461175 PMCID: PMC10924892 DOI: 10.1038/s41597-024-03093-7] [Received: 09/25/2023] [Accepted: 02/27/2024] [Indexed: 03/11/2024]
Abstract
Sharks have thrived in the oceans for 400 million years, experienced five mass extinctions and evolved into today's apex predators. However, enormous genome size, poor karyotyping and limited tissue sampling options are bottlenecks in shark research. Sharks of the order Orectolobiformes serve as model species in transcriptome research owing to their exceptionally high reproductive fecundity, catch prominence and oviparity. The present study describes a de novo transcriptome for an adult grey bamboo shark, Chiloscyllium griseum (Chondrichthyes; Hemiscylliidae), generated using paired-end RNA sequencing. Around 150 million short Illumina reads were obtained from five different tissues and assembled using the Trinity assembler. Transcriptome annotation yielded 70,647 BLASTX hits against UniProt. The data generated serve as a basis for transcriptome-based population genetic studies and open up new avenues in comparative transcriptomics and conservation biology.
Affiliation(s)
- Pooja Harshan
- Marine Biotechnology, Fish Nutrition and Health Division, ICAR-Central Marine Fisheries Research Institute, Ernakulam North P.O., Kochi, Kerala, 682018, India.
- Cochin University of Science and Technology, South Kalamassery, Ernakulam, Kerala, 682022, India.
- Sandhya Sukumaran
- Marine Biotechnology, Fish Nutrition and Health Division, ICAR-Central Marine Fisheries Research Institute, Ernakulam North P.O., Kochi, Kerala, 682018, India
- A Gopalakrishnan
- Marine Biotechnology, Fish Nutrition and Health Division, ICAR-Central Marine Fisheries Research Institute, Ernakulam North P.O., Kochi, Kerala, 682018, India
45
Boyes D, Mulhair PO. The genome sequence of the Water Veneer, Acentria ephemerella (Denis & Schiffermüller, 1775). Wellcome Open Res 2024; 9:134. [PMID: 38779149 PMCID: PMC11109561 DOI: 10.12688/wellcomeopenres.21099.1] [Accepted: 02/19/2024] [Indexed: 05/25/2024]
Abstract
We present a genome assembly from an individual male Acentria ephemerella (the Water Veneer; Arthropoda; Insecta; Lepidoptera; Crambidae). The genome sequence is 340.8 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.35 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,748 protein coding genes.
Affiliation(s)
- Douglas Boyes
- UK Centre for Ecology & Hydrology, Wallingford, England, UK
46
Boyes D, Crowley LM, Holland PW. The genome sequence of the Summer Chafer, Amphimallon solstitiale (Linnaeus, 1758). Wellcome Open Res 2024; 9:138. [PMID: 38784435 PMCID: PMC11112308 DOI: 10.12688/wellcomeopenres.21100.1] [Accepted: 02/19/2024] [Indexed: 05/25/2024]
Abstract
We present a genome assembly from an individual male Amphimallon solstitiale (the Summer Chafer; Arthropoda; Insecta; Coleoptera; Scarabaeidae). The genome sequence is 1,584.1 megabases in span. Most of the assembly is scaffolded into 11 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 19.29 kilobases in length.
Affiliation(s)
- Douglas Boyes
- UK Centre for Ecology & Hydrology, Wallingford, England, UK
47
Falk S, Crowley LM, Clements DK. The genome sequence of the Four-banded Bee-grabber, Conops quadrifasciatus De Geer, 1776. Wellcome Open Res 2024; 9:136. [PMID: 38784436 PMCID: PMC11112309 DOI: 10.12688/wellcomeopenres.21106.1] [Accepted: 02/23/2024] [Indexed: 05/25/2024]
Abstract
We present a genome assembly from an individual male Conops quadrifasciatus (the Four-banded Bee-grabber; Arthropoda; Insecta; Diptera; Conopidae). The genome sequence is 210.4 megabases in span. Most of the assembly is scaffolded into 7 chromosomal pseudomolecules, including the X and Y sex chromosomes. The mitochondrial genome has also been assembled and is 18.07 kilobases in length. Gene annotation of this assembly on Ensembl identified 23,090 protein coding genes.
Affiliation(s)
- Steven Falk
- Independent researcher, Kenilworth, England, UK
- Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team
- Independent researcher, Kenilworth, England, UK
- University of Oxford, Oxford, England, UK
- Independent researcher, Cardiff, Wales, UK
48
Robson ES, Ioannidis NM. GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models. bioRxiv 2024:2023.10.12.562113. [PMID: 37904945 PMCID: PMC10614795 DOI: 10.1101/2023.10.12.562113] [Indexed: 11/01/2023]
Abstract
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.
Affiliation(s)
- Eyes S Robson
- Center for Computational Biology, UC Berkeley, Berkeley, CA 94720
- Nilah M Ioannidis
- Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720
49
Hirabayashi K, Debnath SC, Owens GL. Unveiling the evolutionary history of lingonberry (Vaccinium vitis-idaea L.) through genome sequencing and assembly of European and North American subspecies. G3 (Bethesda) 2024; 14:jkad294. [PMID: 38142435 PMCID: PMC10917501 DOI: 10.1093/g3journal/jkad294] [Received: 10/23/2023] [Revised: 10/23/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
Lingonberry (Vaccinium vitis-idaea L.) produces tiny red berries that are tart and nutty in flavor. It grows widely in the circumpolar region, including Scandinavia, northern parts of Eurasia, Alaska, and Canada. Although cultivation is currently limited, the plant has a long history of cultural use among indigenous communities. Despite its potential as a food source, genomic resources for lingonberry are significantly lacking. To advance genomic knowledge, the genomes for 2 subspecies of lingonberry (V. vitis-idaea ssp. minus and ssp. vitis-idaea var. 'Red Candy') were sequenced and de novo assembled into contig-level assemblies. The assemblies were scaffolded using the bilberry genome (Vaccinium myrtillus) to generate chromosome-anchored reference genomes consisting of 12 chromosomes each, with total lengths of 548.07 Mb [contig N50 = 1.17 Mb, BUSCO (C%) = 96.5%] for ssp. vitis-idaea and 518.70 Mb [contig N50 = 1.40 Mb, BUSCO (C%) = 96.9%] for ssp. minus. RNA-seq-based gene annotation identified 27,243 and 25,718 genes in the respective assemblies, and transposable element detection methods found that 45.82% and 44.58% of the respective genomes were repeats. Phylogenetic analysis confirmed that lingonberry was most closely related to bilberry and was more closely related to blueberries than cranberries. Estimates of past effective population size suggested a continuous decline over the past 1-3 million years, possibly due to the impacts of repeated glacial cycles during the Pleistocene leading to frequent population fragmentation. The genomic resource created in this study can be used to identify industry-relevant genes (e.g. anthocyanin production), infer phylogeny, and call sequence-level variants (e.g. SNPs) in future research.
Affiliation(s)
- Kaede Hirabayashi
- Department of Biology, University of Victoria, 3800 Finnerty Road, Victoria, BC V8W 2Y2, Canada
- Samir C Debnath
- Agriculture and Agri-Food Canada, St. John's Research and Development Centre, 204 Brookfield Road, St. John's, Newfoundland and Labrador A1E 0B2, Canada
- Gregory L Owens
- Department of Biology, University of Victoria, 3800 Finnerty Road, Victoria, BC V8W 2Y2, Canada
50
Franciskovic E, Thörnqvist L, Greiff L, Gasset M, Ohlin M. Linear epitopes of bony fish β-parvalbumins. Front Immunol 2024; 15:1293793. [PMID: 38504976 PMCID: PMC10948427 DOI: 10.3389/fimmu.2024.1293793] [Received: 09/13/2023] [Accepted: 02/13/2024] [Indexed: 03/21/2024]
Abstract
Introduction: Fish β-parvalbumins are common targets of allergy-causing immunity. The nature of antibody responses to such allergens determines the biological outcome following exposure to fish. Specific epitopes on these allergens recognised by antibodies are incompletely characterised. Methods: High-content peptide microarrays offer a solution to the identification of linear epitopes recognised by antibodies. We characterised IgG and IgG4 recognition of linear epitopes of fish β-parvalbumins defined in the WHO/IUIS allergen database, as such responses hold the potential to counter an allergic reaction to these allergens. Peripheral blood samples, collected over three years, of 15 atopic but not fish-allergic subjects were investigated using a microarray platform that carried every possible 16-mer peptide of known isoforms and isoallergens of these and other allergens. Results: Interindividual differences in epitope recognition patterns were observed. In contrast, reactivity patterns within a given individual were more stable over the 3-year course of the study. Nevertheless, evidence of the induction of novel specificities over time was identified across multiple regions of the allergens. Particularly reactive epitopes were identified in the D helix of Cyp c 1 and in the C-terminus of Gad c 1 and Gad m 1.02. Residues important for the recognition of certain linear epitopes were identified. Patterns of differential recognition of isoallergens were observed in some subjects. Conclusions: Altogether, comprehensive analysis of antibody recognition of linear epitopes of multiple allergens enables characterisation of the nature of the antibody responses targeting this important set of food allergens.
Affiliation(s)
- Lennart Greiff
- Department of Otorhinolaryngology, Head & Neck Surgery, Skåne University Hospital, Lund, Sweden
- Department of Clinical Sciences, Lund University, Lund, Sweden
- Maria Gasset
- Institute of Physical-Chemistry Blas Cabrera, Spanish National Research Council, Madrid, Spain
- Mats Ohlin
- Department of Immunotechnology, Lund University, Lund, Sweden
- SciLifeLab, Lund University, Lund, Sweden