1
Kyriacou RG, Mulhair PO, Holland PWH. GC Content Across Insect Genomes: Phylogenetic Patterns, Causes and Consequences. J Mol Evol 2024; 92:138-152. [PMID: 38491221 PMCID: PMC10978632 DOI: 10.1007/s00239-024-10160-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/06/2024] [Indexed: 03/18/2024]
Abstract
The proportions of A:T and G:C nucleotide pairs are often unequal and can vary greatly between animal species and along chromosomes. The causes and consequences of this variation are incompletely understood. The recent release of high-quality genome sequences from the Darwin Tree of Life and other large-scale genome projects provides an opportunity for GC heterogeneity to be compared across a large number of insect species. Here we analyse GC content along chromosomes, and within protein-coding genes and codons, of 150 insect species from four holometabolous orders: Coleoptera, Diptera, Hymenoptera, and Lepidoptera. We find that protein-coding sequences have higher GC content than the genome average, and that Lepidoptera generally have higher GC content than the other three insect orders examined. GC content is higher in small chromosomes in most Lepidoptera species, but this pattern is less consistent in other orders. GC content also increases towards subtelomeric regions within protein-coding genes in Diptera, Coleoptera and Lepidoptera. Two species of Diptera, Bombylius major and B. discolor, have very atypical genomes with ubiquitous increase in AT content, especially at third codon positions. Despite dramatic AT-biased codon usage, we find no evidence that this has driven divergent protein evolution. We argue that the GC landscape of Lepidoptera, Diptera and Coleoptera genomes is influenced by GC-biased gene conversion, strongest in Lepidoptera, with some outlier taxa affected drastically by counteracting processes.
Affiliation(s)
- Riccardo G Kyriacou
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Peter O Mulhair
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Peter W H Holland
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK.
2
Boyes D, Mulhair PO. The genome sequence of the Water Veneer, Acentria ephemerella (Denis & Schiffermüller, 1775). Wellcome Open Res 2024; 9:134. [PMID: 38779149 PMCID: PMC11109561 DOI: 10.12688/wellcomeopenres.21099.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/19/2024] [Indexed: 05/25/2024] Open
Abstract
We present a genome assembly from an individual male Acentria ephemerella (the Water Veneer; Arthropoda; Insecta; Lepidoptera; Crambidae). The genome sequence is 340.8 megabases in span. Most of the assembly is scaffolded into 31 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.35 kilobases in length. Gene annotation of this assembly on Ensembl identified 17,748 protein coding genes.
Affiliation(s)
- Douglas Boyes
- UK Centre for Ecology & Hydrology, Wallingford, England, UK
3
Mulhair PO, Holland PWH. Evolution of the insect Hox gene cluster: Comparative analysis across 243 species. Semin Cell Dev Biol 2024; 152-153:4-15. [PMID: 36526530 PMCID: PMC10914929 DOI: 10.1016/j.semcdb.2022.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/29/2022] [Revised: 11/28/2022] [Accepted: 11/30/2022] [Indexed: 12/23/2022]
Abstract
The Hox gene cluster is an iconic example of evolutionary conservation between divergent animal lineages, providing evidence for ancient similarities in the genetic control of embryonic development. However, there are differences between taxa in gene order, gene number and genomic organisation implying conservation is not absolute. There are also examples of radical functional change of Hox genes; for example, the ftz, zen and bcd genes in insects play roles in segmentation, extraembryonic membrane formation and body polarity, rather than specification of anteroposterior position. There have been detailed descriptions of Hox genes and Hox gene clusters in several insect species, including important model systems, but a large-scale overview has been lacking. Here we extend these studies using the publicly-available complete genome sequences of 243 insect species from 13 orders. We show that the insect Hox cluster is characterised by large intergenic distances, consistently extreme in Odonata, Orthoptera, Hemiptera and Trichoptera, and always larger between the 'posterior' Hox genes. We find duplications of ftz and zen in many species and multiple independent cluster breaks, although certain modules of neighbouring genes are rarely broken apart suggesting some organisational constraints. As more high-quality genomes are obtained, a challenge will be to relate structural genomic changes to phenotypic change across insect phylogeny.
Affiliation(s)
- Peter O Mulhair
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford OX1 3SZ, UK.
- Peter W H Holland
- Department of Biology, University of Oxford, 11a Mansfield Road, Oxford OX1 3SZ, UK.
4
Mulhair PO, Crowley L, Boyes DH, Lewis OT, Holland PWH. Opsin Gene Duplication in Lepidoptera: Retrotransposition, Sex Linkage, and Gene Expression. Mol Biol Evol 2023; 40:msad241. [PMID: 37935057 PMCID: PMC10642689 DOI: 10.1093/molbev/msad241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 10/20/2023] [Accepted: 10/26/2023] [Indexed: 11/09/2023] Open
Abstract
Color vision in insects is determined by signaling cascades, central to which are opsin proteins, resulting in sensitivity to light at different wavelengths. In certain insect groups, lineage-specific evolution of opsin genes, in terms of copy number, shifts in expression patterns, and functional amino acid substitutions, has resulted in changes in color vision with subsequent behavioral and niche adaptations. Lepidoptera are a fascinating model to address whether evolutionary change in opsin content and sequence evolution are associated with changes in vision phenotype. Until recently, the lack of high-quality genome data representing broad sampling across the lepidopteran phylogeny has greatly limited our ability to accurately address this question. Here, we annotate opsin genes in 219 lepidopteran genomes representing 33 families, reconstruct their evolutionary history, and analyze shifts in selective pressures and expression between genes and species. We discover 44 duplication events in opsin genes across ∼300 million years of lepidopteran evolution. While many duplication events are species or family specific, we find retention of an ancient long-wavelength-sensitive (LW) opsin duplication derived by retrotransposition within the speciose superfamily Noctuoidea (in the families Nolidae, Erebidae, and Noctuidae). This conserved LW retrogene shows life stage-specific expression suggesting visual sensitivities or other sensory functions specific to the early larval stage. This study provides a comprehensive order-wide view of opsin evolution across Lepidoptera, showcasing high rates of opsin duplications and changes in expression patterns.
Affiliation(s)
- Peter O Mulhair
- Department of Biology, University of Oxford, Oxford OX1 3SZ, UK
- Liam Crowley
- Department of Biology, University of Oxford, Oxford OX1 3SZ, UK
- Owen T Lewis
- Department of Biology, University of Oxford, Oxford OX1 3SZ, UK
5
Boyes D, Mulhair PO. The genome sequence of the Scarce Umber, Agriopis aurantiaria (Hübner, 1799). Wellcome Open Res 2023; 8:463. [PMID: 38779060 PMCID: PMC11109570 DOI: 10.12688/wellcomeopenres.19922.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/17/2023] [Indexed: 05/25/2024] Open
Abstract
We present a genome assembly from an individual male Agriopis aurantiaria (the Scarce Umber; Arthropoda; Insecta; Lepidoptera; Geometridae). The genome sequence is 485.4 megabases in span. The whole assembly is scaffolded into 30 chromosomal pseudomolecules, including the Z sex chromosome. The mitochondrial genome has also been assembled and is 15.44 kilobases in length. Gene annotation of this assembly on Ensembl identified 16,963 protein coding genes.
Affiliation(s)
- Douglas Boyes
- UK Centre for Ecology & Hydrology, Wallingford, England, UK
6
Crowley L, Allen H, Barnes I, Boyes D, Broad GR, Fletcher C, Holland PW, Januszczak I, Lawniczak M, Lewis OT, Macadam CR, Mulhair PO, Pereira da Conceicoa L, Price BW, Raper C, Sivell O, Sivess L. A sampling strategy for genome sequencing the British terrestrial arthropod fauna. Wellcome Open Res 2023. [DOI: 10.12688/wellcomeopenres.18925.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023] Open
Abstract
The Darwin Tree of Life (DToL) project aims to sequence and assemble high-quality genomes from all eukaryote species in Britain and Ireland, with the first phase of the project concentrating on family-level coverage plus species of particular ecological, biomedical or evolutionary interest. We summarise the processes involved in (1) assessing the UK arthropod fauna and the status of individual species on UK lists; (2) prioritising and collecting species for initial genome sequencing; (3) handling methods to ensure that high-quality genomic DNA is preserved; and (4) compiling standard operating procedures for processing specimens for genome sequencing, identification verification and voucher specimen curation. We briefly explore some lessons learned from the pilot phase of DToL and the impact of the Covid-19 pandemic.
7
McCarthy CGP, Mulhair PO, Siu-Ting K, Creevey CJ, O’Connell MJ. Improving Orthologous Signal and Model Fit in Datasets Addressing the Root of the Animal Phylogeny. Mol Biol Evol 2023; 40:6989790. [PMID: 36649189 PMCID: PMC9848061 DOI: 10.1093/molbev/msac276] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 12/19/2022] [Accepted: 12/23/2022] [Indexed: 01/18/2023] Open
Abstract
There is conflicting evidence as to whether Porifera (sponges) or Ctenophora (comb jellies) comprise the root of the animal phylogeny. Support for either a Porifera-sister or Ctenophore-sister tree has been extensively examined in the context of model selection, taxon sampling, and outgroup selection. The influence of dataset construction is comparatively understudied. We re-examine five animal phylogeny datasets that have supported either root hypothesis using an approach designed to enrich orthologous signal in phylogenomic datasets. We find that many component orthogroups in animal datasets fail to recover major lineages as monophyletic with the exception of Ctenophora, regardless of the supported root. Enriching these datasets to retain orthogroups recovering ≥3 major lineages reduces dataset size by up to 50% while retaining underlying phylogenetic information and taxon sampling. Site-heterogeneous phylogenomic analysis of these enriched datasets recovers both Porifera-sister and Ctenophora-sister positions, even with additional constraints on outgroup sampling. Two datasets which previously supported Ctenophora-sister support Porifera-sister upon enrichment. All enriched datasets display improved model fitness under posterior predictive analysis. While not conclusively rooting animals at either Porifera or Ctenophora, we do see an increase in signal for Porifera-sister and a decrease in signal for Ctenophore-sister when data are filtered for orthologous signal. Our results indicate that dataset size and construction as well as model fit influence animal root inference.
Affiliation(s)
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, United Kingdom
8
Mulhair PO, Crowley L, Boyes DH, Harper A, Lewis OT, Holland PW. Diversity, duplication, and genomic organization of homeobox genes in Lepidoptera. Genome Res 2023; 33:32-44. [PMID: 36617663 PMCID: PMC9977156 DOI: 10.1101/gr.277118.122] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/12/2022] [Accepted: 11/29/2022] [Indexed: 12/14/2022]
Abstract
Homeobox genes encode transcription factors with essential roles in patterning and cell fate in developing animal embryos. Many homeobox genes, including Hox and NK genes, are arranged in gene clusters, a feature likely related to transcriptional control. Sparse taxon sampling and fragmentary genome assemblies mean that little is known about the dynamics of homeobox gene evolution across Lepidoptera or about how changes in homeobox gene number and organization relate to diversity in this large order of insects. Here we analyze an extensive data set of high-quality genomes to characterize the number and organization of all homeobox genes in 123 species of Lepidoptera from 23 taxonomic families. We find most Lepidoptera have around 100 homeobox loci, including an unusual Hox gene cluster in which the lab gene is repositioned and the ro gene is next to pb. A topologically associating domain spans much of the gene cluster, suggesting deep regulatory conservation of the Hox cluster arrangement in this insect order. Most Lepidoptera have four Shx genes, divergent zen-derived loci, but these loci underwent dramatic duplication in several lineages, with some moths having over 165 homeobox loci in the Hox gene cluster; this expansion is associated with local LINE element density. In contrast, the NK gene cluster content is more stable, although there are differences in organization compared with other insects, as well as major rearrangements within butterflies. Our analysis represents the first description of homeobox gene content across the order Lepidoptera, exemplifying the potential of newly generated genome assemblies for understanding genome and gene family evolution.
Affiliation(s)
- Peter O. Mulhair
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Liam Crowley
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Douglas H. Boyes
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom; UK Centre for Ecology and Hydrology, Wallingford OX10 8BB, United Kingdom
- Amber Harper
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Owen T. Lewis
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Peter W.H. Holland
- Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
9
Mulhair PO, McCarthy CGP, Siu-Ting K, Creevey CJ, O'Connell MJ. Filtering artifactual signal increases support for Xenacoelomorpha and Ambulacraria sister relationship in the animal tree of life. Curr Biol 2022; 32:5180-5188.e3. [PMID: 36356574 DOI: 10.1016/j.cub.2022.10.036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 08/09/2022] [Accepted: 10/18/2022] [Indexed: 11/10/2022]
Abstract
Conflicting studies place a group of bilaterian invertebrates containing xenoturbellids and acoelomorphs, the Xenacoelomorpha, as either the primary emerging bilaterian phylum [1-6] or within Deuterostomia, sister to Ambulacraria [7-11]. Although their placement as sister to the rest of Bilateria supports relatively simple morphology in the ancestral bilaterian, their alternative placement within Deuterostomia suggests a morphologically complex ancestral bilaterian along with extensive loss of major phenotypic traits in the Xenacoelomorpha. Recent studies have questioned whether Deuterostomia should be considered monophyletic at all [10,12,13]. Hidden paralogy and poor phylogenetic signal present a major challenge for reconstructing species phylogenies [14-18]. Here, we assess whether these issues have contributed to the conflict over the placement of Xenacoelomorpha. We reanalyzed published datasets, enriching for orthogroups whose gene trees support well-resolved clans elsewhere in the animal tree [16]. We find that most genes in previously published datasets violate incontestable clans, suggesting that hidden paralogy and low phylogenetic signal affect the ability to reconstruct branching patterns at deep nodes in the animal tree. We demonstrate that removing orthogroups that cannot recapitulate incontestable relationships alters the final topology that is inferred, while simultaneously improving the fit of the model to the data. We discover increased, but ultimately not conclusive, support for the existence of Xenambulacraria in our set of filtered orthogroups. At a time when we are progressing toward sequencing all life on the planet, we argue that long-standing contentious issues in the tree of life will be resolved using smaller amounts of better quality data that can be modeled adequately [19].
Affiliation(s)
- Peter O Mulhair
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Charley G P McCarthy
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast BT9 5DL, UK
- Mary J O'Connell
- Computational and Molecular Evolutionary Biology Research Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham NG7 2RD, UK; Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK.