1
|
A reference genome for the Andean cavefish Trichomycterus rosablanca (Siluriformes, Trichomycteridae): Building genomic resources to study evolution in cave environments. J Hered 2024; 115:311-316. [PMID: 38513109 DOI: 10.1093/jhered/esae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
Animals living in caves are of broad relevance to evolutionary biologists interested in understanding the mechanisms underpinning convergent evolution. In the Eastern Andes of Colombia, populations from at least two distinct clades of Trichomycterus catfishes (Siluriformes) independently colonized cave environments and converged in phenotype by losing their eyes and pigmentation. We are pursuing several research questions using genomics to understand the evolutionary forces and molecular mechanisms responsible for repeated morphological changes in this system. As a foundation for such studies, here we describe a diploid, chromosome-scale, long-read reference genome for Trichomycterus rosablanca, a blind, depigmented species endemic to the karstic system of the department of Santander. The nuclear genome comprises 1 Gb in 27 chromosomes, with a 40.0× HiFi long-read genome coverage having an N50 scaffold of 40.4 Mb and N50 contig of 13.1 Mb, with 96.9% (Eukaryota) and 95.4% (Actinopterygii) universal single-copy orthologs (BUSCO). This assembly provides the first reference genome for the speciose genus Trichomycterus, serving as a key resource for research on the genomics of phenotypic evolution.
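The scaffold and contig N50 values quoted in this and later abstracts follow the usual definition: the length L such that sequences of length L or longer cover at least half of the total assembly. A minimal sketch of that calculation, assuming only a plain list of sequence lengths (the function name `n50` and the toy lengths are illustrative and are not taken from any of the assemblies listed here):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together span at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (in Mb), not real assembly data.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 50 + 40 = 90 >= 75, so this prints 40
```

The same function applies to contigs or scaffolds; only the input list differs, which is why the two N50 figures for one assembly can be far apart.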
|
2
|
Evolution and genetic architecture of sex-limited polymorphism in cuckoos. SCIENCE ADVANCES 2024; 10:eadl5255. [PMID: 38657058 PMCID: PMC11042743 DOI: 10.1126/sciadv.adl5255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 03/20/2024] [Indexed: 04/26/2024]
Abstract
Sex-limited polymorphism has evolved in many species including our own. Yet, we lack a detailed understanding of the underlying genetic variation and evolutionary processes at work. The brood parasitic common cuckoo (Cuculus canorus) is a prime example of female-limited color polymorphism, where adult males are monochromatic gray and females exhibit either gray or rufous plumage. This polymorphism has been hypothesized to be governed by negative frequency-dependent selection whereby the rarer female morph is protected against harassment by males or from mobbing by parasitized host species. Here, we show that female plumage dichromatism maps to the female-restricted genome. We further demonstrate that, consistent with balancing selection, ancestry of the rufous phenotype is shared with the likewise female dichromatic sister species, the oriental cuckoo (Cuculus optatus). This study shows that sex-specific polymorphism in trait variation can be resolved by genetic variation residing on a sex-limited chromosome and be maintained across species boundaries.
|
3
|
Chromosome-scale Genome Assembly of the Rough Periwinkle Littorina saxatilis. Genome Biol Evol 2024; 16:evae076. [PMID: 38584387 PMCID: PMC11050657 DOI: 10.1093/gbe/evae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 04/09/2024]
Abstract
The intertidal gastropod Littorina saxatilis is a model system to study speciation and local adaptation. The repeated occurrence of distinct ecotypes showing different levels of genetic divergence makes L. saxatilis particularly suited to study different stages of the speciation continuum in the same lineage. A major finding is the presence of several large chromosomal inversions associated with the divergence of ecotypes and, specifically, the species offers a system to study the role of inversions in this divergence. The genome of L. saxatilis is 1.35 Gb and composed of 17 chromosomes. The first reference genome of the species was assembled using Illumina data, was highly fragmented (N50 of 44 kb), and was quite incomplete, with a BUSCO completeness of 80.1% on the Metazoan dataset. A linkage map of one full-sibling family enabled the placement of 587 Mbp of the genome into 17 linkage groups corresponding to the haploid number of chromosomes, but the fragmented nature of this reference genome limited the understanding of the interplay between divergent selection and gene flow during ecotype formation. Here, we present a newly generated reference genome that is highly contiguous, with a N50 of 67 Mb and 90.4% of the total assembly length placed in 17 super-scaffolds. It is also highly complete with a BUSCO completeness of 94.1% of the Metazoa dataset. This new reference will allow for investigations into the genomic regions implicated in ecotype formation as well as better characterization of the inversions and their role in speciation.
|
4
|
A chromosomal reference genome sequence for the malaria mosquito, Anopheles gambiae, Giles, 1902, Ifakara strain. Wellcome Open Res 2024; 8:74. [PMID: 37424773 PMCID: PMC10326452 DOI: 10.12688/wellcomeopenres.18854.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/15/2024] [Indexed: 04/01/2024]
Abstract
We present a genome assembly from an individual female Anopheles gambiae (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), Ifakara strain. The genome sequence is 264 megabases in span. Most of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
|
5
|
A chromosome-level genome assembly for the dugong (Dugong dugon). J Hered 2024; 115:212-220. [PMID: 38245832 PMCID: PMC10936554 DOI: 10.1093/jhered/esae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024]
Abstract
The dugong (Dugong dugon) is a marine mammal widely distributed throughout the Indo-Pacific and the Red Sea, with a Vulnerable conservation status, and little is known about many of the more peripheral populations, some of which are thought to be close to extinction. We present a de novo high-quality genome assembly for the dugong from an individual belonging to the well-monitored Moreton Bay population in Queensland, Australia. Our assembly uses long-read PacBio HiFi sequencing and Omni-C data following the Vertebrate Genome Project pipeline to reach chromosome-level contiguity (24 chromosome-level scaffolds; 3.16 Gbp) and high completeness (97.9% complete BUSCOs). We observed relatively high genome-wide heterozygosity, which likely reflects historical population abundance before the last interglacial period, approximately 125,000 yr ago. Demographic inference suggests that dugong populations began declining as sea levels fell after the last interglacial period, likely a result of population fragmentation and habitat loss due to the exposure of seagrass meadows. We find no evidence for ongoing recent inbreeding in this individual. However, runs of homozygosity indicate some past inbreeding. Our draft genome assembly will enable range-wide assessments of genetic diversity and adaptation, facilitate effective management of dugong populations, and allow comparative genomics analyses including with other sirenians, the oldest marine mammal lineage.
|
6
|
Chromosome level genome assembly of the Etruscan shrew Suncus etruscus. Sci Data 2024; 11:176. [PMID: 38326333 PMCID: PMC10850158 DOI: 10.1038/s41597-024-03011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024]
Abstract
Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
|
7
|
A chromosomal reference genome sequence for the malaria mosquito, Anopheles moucheti, Evans, 1925. Wellcome Open Res 2023; 8:507. [PMID: 38046191 PMCID: PMC10690039 DOI: 10.12688/wellcomeopenres.20259.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 10/18/2023] [Indexed: 12/05/2023]
Abstract
We present a genome assembly from an individual male Anopheles moucheti (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae), from a wild population in Cameroon. The genome sequence is 271 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.5 kilobases in length.
|
8
|
Prioritizing Endangered Species in Genome Sequencing: Conservation Genomics in Action with the First Platinum-Standard Reference-Quality Genome of the Critically Endangered European Mink Mustela lutreola L., 1761. Int J Mol Sci 2023; 24:14816. [PMID: 37834264 PMCID: PMC10573602 DOI: 10.3390/ijms241914816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 09/23/2023] [Accepted: 09/26/2023] [Indexed: 10/15/2023]
Abstract
The European mink Mustela lutreola (Mustelidae) ranks among the most endangered mammalian species globally, experiencing a rapid and severe decline in population size, density, and distribution. Given the critical need for effective conservation strategies, understanding its genomic characteristics becomes paramount. To address this challenge, the platinum-quality, chromosome-level reference genome assembly for the European mink was successfully generated under the project of the European Mink Centre consortium. Leveraging PacBio HiFi long reads, we obtained a 2586.3 Mbp genome comprising 25 scaffolds, with an N50 length of 154.1 Mbp. Through Hi-C data, we clustered and ordered the majority of the assembly (>99.9%) into 20 chromosomal pseudomolecules, including heterosomes, ranging from 6.8 to 290.1 Mbp. The newly sequenced genome displays a GC base content of 41.9%. Additionally, we successfully assembled the complete mitochondrial genome, spanning 16.6 kbp in length. The assembly achieved a BUSCO (Benchmarking Universal Single-Copy Orthologs) completeness score of 98.2%. This high-quality reference genome serves as a valuable genomic resource for future population genomics studies concerning the European mink and related taxa. Furthermore, the newly assembled genome holds significant potential in addressing key conservation challenges faced by M. lutreola. Its applications encompass potential revision of management units, assessment of captive breeding impacts, resolution of phylogeographic questions, and facilitation of monitoring and evaluating the efficiency and effectiveness of dedicated conservation strategies for the European mink. This species serves as an example that highlights the paramount importance of prioritizing endangered species in genome sequencing projects due to the race against time, which necessitates the comprehensive exploration and characterization of their genomic resources before their populations face extinction.
|
9
|
A Chromosome-Level Reference Genome for the Black-Legged Kittiwake (Rissa tridactyla), a Declining Circumpolar Seabird. Genome Biol Evol 2023; 15:evad153. [PMID: 37590950 PMCID: PMC10457150 DOI: 10.1093/gbe/evad153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 08/02/2023] [Accepted: 08/09/2023] [Indexed: 08/19/2023]
Abstract
Amidst the current biodiversity crisis, the availability of genomic resources for declining species can provide important insights into the factors driving population decline. In the early 1990s, the black-legged kittiwake (Rissa tridactyla), a pelagic gull widely distributed across the arctic, subarctic, and temperate zones, suffered a steep population decline following an abrupt warming of sea surface temperature across its distribution range and is currently listed as Vulnerable by the International Union for the Conservation of Nature. Kittiwakes have long been the focus for field studies of physiology, ecology, and ecotoxicology and are primary indicators of fluctuating ecological conditions in arctic and subarctic marine ecosystems. We present a high-quality chromosome-level reference genome and annotation for the black-legged kittiwake using a combination of Pacific Biosciences HiFi sequencing, Bionano optical maps, Hi-C reads, and RNA-Seq data. The final assembly spans 1.35 Gb across 32 chromosomes, with a scaffold N50 of 88.21 Mb and a BUSCO completeness of 97.4%. This genome assembly substantially improves the quality of a previous draft genome, showing an approximately 5× increase in contiguity and a more complete annotation. Using this new chromosome-level reference genome and three more chromosome-level assemblies of Charadriiformes, we uncover several lineage-specific chromosome fusions and fissions, but find no shared rearrangements, suggesting that interchromosomal rearrangements have been commonplace throughout the diversification of Charadriiformes. This new high-quality genome assembly will enable population genomic, transcriptomic, and phenotype-genotype association studies in a widely studied sentinel species, which may provide important insights into the impacts of global change on marine systems.
|
10
|
Genomics of cold adaptations in the Antarctic notothenioid fish radiation. Nat Commun 2023; 14:3412. [PMID: 37296119 PMCID: PMC10256766 DOI: 10.1038/s41467-023-38567-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/27/2023] [Accepted: 05/05/2023] [Indexed: 06/12/2023]
Abstract
Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.
|
11
|
Caecilian genomes reveal the molecular basis of adaptation, and convergent evolution of limblessness in snakes and caecilians. Mol Biol Evol 2023:7166927. [PMID: 37194566 PMCID: PMC10195157 DOI: 10.1093/molbev/msad102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 04/18/2023] [Accepted: 05/03/2023] [Indexed: 05/18/2023]
Abstract
We present genome sequences for the caecilians Geotrypetes seraphini (3.8Gb) and Microcaecilia unicolor (4.7Gb), representatives of a limbless, mostly soil-dwelling amphibian clade with reduced eyes, and unique putatively chemosensory tentacles. More than 69% of both genomes are composed of repeats, with retrotransposons the most abundant. We identify 1,150 orthogroups which are unique to caecilians and enriched for functions in olfaction and detection of chemical signals. There are 379 orthogroups with signatures of positive selection on caecilian lineages with roles in organ development and morphogenesis, sensory perception, and immunity amongst others. We discover that caecilian genomes are missing the ZRS enhancer of Sonic Hedgehog which is also mutated in snakes. In vivo deletions have shown ZRS is required for limb development in mice, thus revealing a shared molecular target implicated in the independent evolution of limblessness in snakes and caecilians.
|
12
|
An improved germline genome assembly for the sea lamprey Petromyzon marinus illuminates the evolution of germline-specific chromosomes. Cell Rep 2023; 42:112263. [PMID: 36930644 PMCID: PMC10166183 DOI: 10.1016/j.celrep.2023.112263] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 06/07/2022] [Revised: 10/17/2022] [Accepted: 02/28/2023] [Indexed: 03/17/2023]
Abstract
Programmed DNA loss is a gene silencing mechanism that is employed by several vertebrate and nonvertebrate lineages, including all living jawless vertebrates and songbirds. Reconstructing the evolution of somatically eliminated (germline-specific) sequences in these species has proven challenging due to a high content of repeats and gene duplications in eliminated sequences and a corresponding lack of highly accurate and contiguous assemblies for these regions. Here, we present an improved assembly of the sea lamprey (Petromyzon marinus) genome that was generated using recently standardized methods that increase the contiguity and accuracy of vertebrate genome assemblies. This assembly resolves highly contiguous, somatically retained chromosomes and at least one germline-specific chromosome, permitting new analyses that reconstruct the timing, mode, and repercussions of recruitment of genes to the germline-specific fraction. These analyses reveal major roles of interchromosomal segmental duplication, intrachromosomal duplication, and positive selection for germline functions in the long-term evolution of germline-specific chromosomes.
|
13
|
Chromosome-level Genome Assembly of Acanthopagrus latus Provides Insights into Salinity Stress Adaptation of Sparidae. MARINE BIOTECHNOLOGY (NEW YORK, N.Y.) 2022; 24:655-660. [PMID: 35394576 DOI: 10.1007/s10126-022-10119-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/22/2022] [Accepted: 03/18/2022] [Indexed: 06/14/2023]
Abstract
The yellowfin seabream, Acanthopagrus latus, is widely distributed throughout the Indo-West Pacific. As a euryhaline sparid, it inhabits coastal environments with large and frequent salinity fluctuations, making it an ideal species for elucidating the evolutionary mechanisms of salinity stress adaptation in teleost fishes. Here, a chromosome-scale assembly of A. latus was obtained with a hybrid PacBio and Hi-C sequencing strategy. The final genome assembly of A. latus is 685.14 Mbp, with a contig N50 of 14.88 Mbp and a scaffold N50 of 30.72 Mbp. In total, 29,227 genes were predicted. Comparative genomic and phylogenetic analyses across multiple whole genomes of Sparidae species were then used to investigate the different osmoregulation strategies underlying salinity stress adaptation. The highly accurate chromosomal information provides important genomic resources for understanding the evolution of osmoregulation in euryhaline Sparidae species.
|
14
|
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
|
15
|
The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900. Wellcome Open Res 2022; 7:287. [PMID: 36874567 PMCID: PMC9975407.2 DOI: 10.12688/wellcomeopenres.18445.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/15/2023] [Indexed: 03/29/2023]
Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
|
16
|
The genome sequence of the malaria mosquito, Anopheles funestus, Giles, 1900. Wellcome Open Res 2022; 7:287. [PMID: 36874567 PMCID: PMC9975407 DOI: 10.12688/wellcomeopenres.18445.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/20/2022] [Indexed: 11/27/2022]
Abstract
We present a genome assembly from an individual female Anopheles funestus (the malaria mosquito; Arthropoda; Insecta; Diptera; Culicidae). The genome sequence is 251 megabases in span. The majority of the assembly is scaffolded into three chromosomal pseudomolecules with the X sex chromosome assembled. The complete mitochondrial genome was also assembled and is 15.4 kilobases in length.
|
17
|
The complete genome sequence of Eimeria tenella (Tyzzer 1929), a common gut parasite of chickens. Wellcome Open Res 2021; 6:225. [PMID: 34703904 PMCID: PMC8515493 DOI: 10.12688/wellcomeopenres.17100.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Accepted: 09/02/2021] [Indexed: 11/20/2022]
Abstract
We present a genome assembly from a clonal population of Eimeria tenella Houghton parasites (Apicomplexa; Conoidasida; Eucoccidiorida; Eimeriidae). The genome sequence is 53.25 megabases in span. The entire assembly is scaffolded into 15 chromosomal pseudomolecules, with complete mitochondrion and apicoplast organellar genomes also present.
|
18
|
Abstract
We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein coding genes.
|
19
|
Hi-C scaffolded short- and long-read genome assemblies of the California sea lion are broadly consistent for syntenic inference across 45 million years of evolution. Mol Ecol Resour 2021; 21:2455-2470. [PMID: 34097816 PMCID: PMC9732816 DOI: 10.1111/1755-0998.13443] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 12/13/2022]
Abstract
With the advent of chromatin-interaction maps, chromosome-level genome assemblies have become a reality for a wide range of organisms. Scaffolding quality is, however, difficult to judge. To explore this gap, we generated multiple chromosome-scale genome assemblies of an emerging wild animal model for carcinogenesis, the California sea lion (Zalophus californianus). Short-read assemblies were scaffolded with two independent chromatin interaction mapping data sets (Hi-C and Chicago), and long-read assemblies with three data types (Hi-C, optical maps and 10X linked reads) following the "Vertebrate Genomes Project (VGP)" pipeline. In both approaches, 18 major scaffolds recovered the karyotype (2n = 36), with scaffold N50s of 138 and 147 Mb, respectively. Synteny relationships at the chromosome level with other pinniped genomes (2n = 32-36), ferret (2n = 34), red panda (2n = 36) and domestic dog (2n = 78) were consistent across approaches and recovered known fissions and fusions. Comparative chromosome painting and multicolour chromosome tiling with a panel of 264 genome-integrated single-locus canine bacterial artificial chromosome probes provided independent evaluation of genome organization. Broad-scale discrepancies between the approaches were observed within chromosomes, most commonly in translocations centred around centromeres and telomeres, which were better resolved in the VGP assembly. Genomic and cytological approaches agreed on near-perfect synteny of the X chromosome, and in combination allowed detailed investigation of autosomal rearrangements between dog and sea lion. This study presents high-quality genomes of an emerging cancer model and highlights that even highly fragmented short-read assemblies scaffolded with Hi-C can yield reliable chromosome-level scaffolds suitable for comparative genomic analyses.
|
20
|
Abstract
We present a genome assembly from an individual female Streptopelia turtur (the European turtle dove; Chordata; Aves; Columbidae). The genome sequence is 1.18 gigabases in span. The majority of the assembly is scaffolded into 35 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled.
|
21
|
Abstract
We present a genome assembly from an individual female Erithacus rubecula (the European robin; Chordata; Aves; Passeriformes; Turdidae). The genome sequence is 1.09 gigabases in span. The majority of the assembly is scaffolded into 36 chromosomal pseudomolecules, with both W and Z sex chromosomes assembled.
|
22
|
Abstract
We present a genome assembly from an individual male Arvicola amphibius (the European water vole; Chordata; Mammalia; Rodentia; Cricetidae). The genome sequence is 2.30 gigabases in span. The majority of the assembly is scaffolded into 18 chromosomal pseudomolecules, including the X sex chromosome. Gene annotation of this assembly on Ensembl has identified 21,394 protein coding genes.
|
23
|
Abstract
We present a genome assembly from an individual male Rattus norvegicus (the Norway rat; Chordata; Mammalia; Rodentia; Muridae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled. This genome assembly, mRatBN7.2, represents the new reference genome for R. norvegicus and has been adopted by the Genome Reference Consortium.
|
24
|
Abstract
We present a genome assembly from an individual female Pipistrellus pipistrellus (the common pipistrelle; Chordata; Mammalia; Chiroptera; Vespertilionidae). The genome sequence is 1.76 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal pseudomolecules, with the X sex chromosome assembled.
|
25
|
A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3 (BETHESDA, MD.) 2021; 11:jkab085. [PMID: 33734373 PMCID: PMC8104945 DOI: 10.1093/g3journal/jkab085] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/10/2020] [Accepted: 03/09/2021] [Indexed: 01/15/2023]
Abstract
Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF) is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Bioscience, 10X Genomics linked read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.
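An inbreeding coefficient derived from runs of homozygosity is commonly computed as F_ROH, the fraction of the analysed genome covered by ROH segments. The abstract does not state which estimator was used, so the sketch below is only an illustration of that common definition; the function name `f_roh`, the segment coordinates, and the genome length are hypothetical.

```python
def f_roh(roh_segments, genome_length):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start, end) coordinates in bp, assumed
                  non-overlapping, with end > start.
    genome_length: total analysed genome length in bp.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(end - start for start, end in roh_segments)
    return covered / genome_length


# Hypothetical segments, not data from the study.
segments = [(0, 1_000_000), (5_000_000, 6_500_000)]
print(f_roh(segments, 100_000_000))  # 2.5 Mb of 100 Mb, prints 0.025
```

In practice the segments would come from an ROH caller and the denominator would usually be restricted to the callable autosomal genome.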
|
26
|
Genetic perturbation of PU.1 binding and chromatin looping at neutrophil enhancers associates with autoimmune disease. Nat Commun 2021; 12:2298. [PMID: 33863903 PMCID: PMC8052402 DOI: 10.1038/s41467-021-22548-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/13/2019] [Accepted: 03/17/2021] [Indexed: 02/06/2023]
Abstract
Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associated with differential PU.1 binding underlie genetically-driven differences in cell count and susceptibility to autoimmune and inflammatory diseases. We integrate these results with other multi-individual genomic readouts, revealing coordinated effects of PU.1 binding variants on the local chromatin state, enhancer-promoter contacts and downstream gene expression, and providing a functional interpretation for 27 genes underlying immune traits. Collectively, these results demonstrate the functional role of PU.1 and its target enhancers in neutrophil transcriptional control and immune disease susceptibility.
|
27
|
Abstract
Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution [1,2]. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.
|
28
|
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
|
29
|
Significantly improving the quality of genome assemblies through curation. Gigascience 2021; 10:giaa153. [PMID: 33420778 PMCID: PMC7794651 DOI: 10.1093/gigascience/giaa153] [Citation(s) in RCA: 386] [Impact Index Per Article: 128.7] [Received: 08/12/2020] [Revised: 11/17/2020] [Accepted: 11/30/2020] [Indexed: 11/29/2022] Open
Abstract
Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
|
30
|
Abstract
We present a genome assembly from an individual male Sciurus carolinensis (the eastern grey squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.82 gigabases in span. The majority of the assembly (92.3%) is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.
|
31
|
Abstract
BACKGROUND: The king scallop, Pecten maximus, is distributed in shallow waters along the Atlantic coast of Europe. It forms the basis of a valuable commercial fishery and plays a key role in coastal ecosystems and food webs. Like other filter-feeding bivalves it can accumulate potent phytotoxins, to which it has evolved some immunity. The molecular origins of this immunity are of interest to evolutionary biologists, pharmaceutical companies, and fisheries management. FINDINGS: Here we report the genome assembly of this species, conducted as part of the Wellcome Sanger 25 Genomes Project. This genome was assembled from PacBio reads and scaffolded with 10X Chromium and Hi-C data. Its 3,983 scaffolds have an N50 of 44.8 Mb (longest scaffold 60.1 Mb), with 92% of the assembly sequence contained in 19 scaffolds, corresponding to the 19 chromosomes found in this species. The total assembly spans 918.3 Mb and is the best-scaffolded marine bivalve genome published to date, exhibiting 95.5% recovery of the metazoan BUSCO set. Gene annotation resulted in 67,741 gene models. Analysis of gene content revealed large numbers of gene duplicates, as previously seen in bivalves, with little gene loss, in comparison with the sequenced genomes of other marine bivalve species. CONCLUSIONS: The genome assembly of P. maximus and its annotated gene set provide a high-quality platform for studies on such disparate topics as shell biomineralization, pigmentation, vision, and resistance to algal toxins. As a result of our findings we highlight the sodium channel gene Nav1, known to confer resistance to saxitoxin and tetrodotoxin, as a candidate for further studies investigating immunity to domoic acid.
|
32
|
Abstract
We present a genome assembly from an individual male Lutra lutra (the Eurasian river otter; Vertebrata; Mammalia; Eutheria; Carnivora; Mustelidae). The genome sequence is 2.44 gigabases in span. The majority of the assembly is scaffolded into 20 chromosomal pseudomolecules, with both X and Y sex chromosomes assembled.
|
33
|
Abstract
We present a genome assembly from an individual male Sciurus vulgaris (the Eurasian red squirrel; Vertebrata; Mammalia; Eutheria; Rodentia; Sciuridae). The genome sequence is 2.88 gigabases in span. The majority of the assembly is scaffolded into 21 chromosomal-level scaffolds, with both X and Y sex chromosomes assembled.
|
34
|
Multiple lineages of antigenically and genetically diverse influenza A virus co-circulate in the United States swine population. Virus Res 2004; 103:67-73. [PMID: 15163491 DOI: 10.1016/j.virusres.2004.02.015] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.1] [Indexed: 10/26/2022]
Abstract
Before the isolation of H3N2 viruses in 1998, swine influenza in the United States was an endemic disease caused exclusively by classical-swine H1N1 viruses. In this study we determined the antigenic and phylogenetic composition of a selection of currently circulating strains and revealed that, in contrast to the situation pre-1998, the swine population in the United States is now a dynamic viral reservoir containing multiple viral lineages. H3N2 viruses still circulate and representatives of each of two previously identified phylogenetic groups were isolated. H1N1 and H1N2 viruses were also identified. In addition to the genotypic diversity present, there was also considerable antigenic diversity seen. At least three antigenic profiles of H1 viruses were noted and all of the recent H3N2 viruses reacted poorly, if at all, to the index A/swine/Texas/4199-2/98 H3N2 antiserum in hemagglutination inhibition assays. The influenza reservoir in the United States swine population has thus gone from a stable single viral lineage to one where genetically and antigenically heterogenic viruses co-circulate. The growing complexity of influenza at this animal-human interface and the presence of viruses with a seemingly high affinity for reassortment makes the United States swine population an increasingly important reservoir of viruses with human pandemic potential.
|