1
Powell A, Heckenhauer J, Pauls SU, Ríos-Touma B, Kuranishi RB, Holzenthal RW, Razuri-Gonzales E, Bybee S, Frandsen PB. Evolution of Opsin Genes in Caddisflies (Insecta: Trichoptera). Genome Biol Evol 2024; 16:evae185. [PMID: 39176990 PMCID: PMC11381090 DOI: 10.1093/gbe/evae185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 08/12/2024] [Accepted: 08/19/2024] [Indexed: 08/24/2024]
Abstract
Insects have evolved complex and diverse visual systems in which light-sensing protein molecules called "opsins" couple with a chromophore to form photopigments. Insect photopigments group into three major gene families based on wavelength sensitivity: long wavelength (LW), short wavelength (SW), and ultraviolet wavelength (UV). In this study, we identified 123 opsin sequences from whole-genome assemblies across 25 caddisfly species (Insecta: Trichoptera). We discovered the LW opsins have the most diversity across species and form two separate clades in the opsin gene tree. Conversely, we observed a loss of the SW opsin in half of the trichopteran species in this study, which might be associated with the fact that caddisflies are active during low-light conditions. Lastly, we found a single copy of the UV opsin in all the species in this study, with one exception: Athripsodes cinereus has two copies of the UV opsin and resides within a clade of caddisflies with colorful wing patterns.
Affiliation(s)
- Ashlyn Powell
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Steffen U Pauls
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Blanca Ríos-Touma
- Facultad de Ingenierías y Ciencias Aplicadas, Ingeniería Ambiental, Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud, Universidad de Las Américas, Quito, Ecuador
- Ryoichi B Kuranishi
- Graduate School of Science, Chiba University, Chiba, Japan
- Kanagawa Institute of Technology, Kanagawa, Japan
- Seth Bybee
- Department of Biology, Brigham Young University, Provo, UT, USA
- Paul B Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
2
Bhowmik O, Rahman T, Kalyanaraman A. Maptcha: an efficient parallel workflow for hybrid genome scaffolding. BMC Bioinformatics 2024; 25:263. [PMID: 39118013 PMCID: PMC11313021 DOI: 10.1186/s12859-024-05878-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 07/22/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long-read sequencing technologies toward more accurate long reads, coupled with the continued use of short-read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated due to scale, sequencing technology diversity (e.g., short vs. long reads, contigs or partial assemblies), and repetitive regions within a target genome. RESULTS In this paper, we present a new parallel workflow for hybrid genome scaffolding that would allow combining pre-constructed partial assemblies with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called Maptcha, is aimed at generating long scaffolds of a target genome from two sets of input sequences: an already constructed partial assembly of contigs, and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a ⟨contig, contig⟩ graph using long reads as linking information. Subsequently, this graph is used to generate scaffolds. We present and evaluate a graph-theoretic "wiring" heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows the use of any standalone assembler for generating the final scaffolds. CONCLUSIONS Our experiments with Maptcha on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders, demonstrate that Maptcha is able to generate longer and more accurate scaffolds substantially faster.
In almost all cases, the scaffolds produced by Maptcha are at least an order of magnitude longer (in some cases two orders) than the scaffolds produced by state-of-the-art tools. Maptcha runs significantly faster too, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth for long reads, which demonstrated the potential of Maptcha to generate significantly longer scaffolds in low coverage settings (1×-10×).
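The graph-building and batching steps described in this abstract can be illustrated with a small, generic sketch. It is not Maptcha's implementation and does not reproduce its "wiring" heuristic. It assumes each long read has already been mapped to an ordered list of contig IDs (the alignment-free mapping itself is out of scope), links consecutive contigs hit by the same read, keeps links supported by at least `min_support` reads (a hypothetical threshold), and treats each connected component as one independent batch. All function names are illustrative.

```python
from collections import defaultdict


def build_contig_graph(read_hits, min_support=2):
    """Build an undirected <contig, contig> graph from long-read hits.

    read_hits maps a read ID to the ordered list of contig IDs that the
    read was mapped to (the mapping step is assumed to be done already).
    Returns {(a, b): support} with a < b, keeping only edges seen in at
    least min_support reads.
    """
    support = defaultdict(int)
    for contigs in read_hits.values():
        # Link each pair of consecutive, distinct contigs on the same read.
        for a, b in zip(contigs, contigs[1:]):
            if a != b:
                support[tuple(sorted((a, b)))] += 1
    return {edge: n for edge, n in support.items() if n >= min_support}


def connected_batches(edges):
    """Group contigs into connected components, one candidate batch each."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, batches = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Iterative depth-first search over one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        batches.append(sorted(component))
    return batches


if __name__ == "__main__":
    hits = {
        "r1": ["c1", "c2", "c3"],
        "r2": ["c2", "c3"],
        "r3": ["c1", "c2"],
        "r4": ["c4", "c5"],
        "r5": ["c5", "c4"],
        "r6": ["c6", "c7"],  # supported by a single read, so filtered out
    }
    graph = build_contig_graph(hits, min_support=2)
    print(graph)
    print(connected_batches(graph))
```

Because components share no edges, each batch could be handed to a separate worker. A real scaffolder would also have to track contig orientation, order within a component, and gap sizes, all of which this sketch ignores.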
Affiliation(s)
- Oieswarya Bhowmik
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Tazin Rahman
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Ananth Kalyanaraman
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
3
Feldmeyer B, Bornberg-Bauer E, Dohmen E, Fouks B, Heckenhauer J, Huylmans AK, Jones ARC, Stolle E, Harrison MC. Comparative Evolutionary Genomics in Insects. Methods Mol Biol 2024; 2802:473-514. [PMID: 38819569 DOI: 10.1007/978-1-0716-3838-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome sequencing quality, in terms of both read length and accuracy, is constantly improving. By combining long-read sequencing technologies with various scaffolding techniques, chromosome-level genome assemblies are now achievable at an affordable price for non-model organisms. Insects represent an exciting taxon for studying the genomic underpinnings of evolutionary innovations, due to ancient origins, immense species-richness, and broad phenotypic diversity. Here we summarize some of the most important methods for carrying out a comparative genomics study on insects. We describe available tools and offer concrete tips on all stages of such an endeavor from DNA extraction through genome sequencing, annotation, and several evolutionary analyses. Along the way we describe important insect-specific aspects, such as DNA extraction difficulties or gene families that are particularly difficult to annotate, and offer solutions. We describe results from several examples of comparative genomics analyses on insects to illustrate the fascinating questions that can now be addressed in this new age of genomics research.
Affiliation(s)
- Barbara Feldmeyer
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Molecular Ecology, Frankfurt, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Bertrand Fouks
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Ann Kathrin Huylmans
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
- Alun R C Jones
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Eckart Stolle
- Museum Koenig, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Bonn, Germany
- Mark C Harrison
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
4
Sproul JS, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley JL, Pauls SU, Frandsen PB. Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res 2023; 33:1708-1717. [PMID: 37739812 PMCID: PMC10691545 DOI: 10.1101/gr.277387.122] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/06/2022] [Accepted: 09/20/2023] [Indexed: 09/24/2023]
Abstract
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%-85% of repetitive sequences were "unclassified" following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Affiliation(s)
- John S Sproul
- Department of Biology, Brigham Young University, Provo, Utah 84602, USA
- Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- Scott Hotaling
- School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
- Department of Watershed Sciences, Utah State University, Logan, Utah 84322, USA
- Jacqueline Heckenhauer
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Ashlyn Powell
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
- Dez Marshall
- Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
- Joanna L Kelley
- School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Steffen U Pauls
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Department of Insect Biotechnology, Justus-Liebig-University Gießen, 35392 Gießen, Germany
- Paul B Frandsen
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
- Data Science Lab, Smithsonian Institution, Washington, District of Columbia 20560, USA
5
Hotaling S, Wilcox ER, Heckenhauer J, Stewart RJ, Frandsen PB. Highly accurate long reads are crucial for realizing the potential of biodiversity genomics. BMC Genomics 2023; 24:117. [PMID: 36927511 PMCID: PMC10018877 DOI: 10.1186/s12864-023-09193-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 03/18/2023]
Abstract
BACKGROUND Generating the most contiguous, accurate genome assemblies given available sequencing technologies is a long-standing challenge in genome science. With the rise of long-read sequencing, assembly challenges have shifted from merely increasing contiguity to correctly assembling complex, repetitive regions of interest, ideally in a phased manner. At present, researchers largely choose between two types of long-read data: longer but less accurate sequences, or highly accurate but shorter reads (i.e., >Q20 or 99% accurate). To better understand how these types of long-read data, as well as the scale of data (i.e., mean length and sequencing depth), influence genome assembly outcomes, we compared genome assemblies for a caddisfly, Hesperophylax magnus, generated with longer but less accurate Oxford Nanopore (ONT) R9.4.1 data and with highly accurate PacBio HiFi (HiFi) data. Next, we expanded this comparison to consider the influence of highly accurate long-read sequence data on genome assemblies across 6750 plant and animal genomes. For this broader comparison, we used HiFi data as a surrogate for highly accurate long reads in general, because the use of HiFi reads could be identified from GenBank metadata. RESULTS HiFi reads outperformed ONT reads in all assembly metrics tested for the caddisfly data set and allowed for accurate assembly of the repetitive ~20 kb H-fibroin gene. Across plants and animals, genome assemblies that incorporated HiFi reads were also more contiguous. For plants, the average HiFi assembly was 501% more contiguous (mean contig N50 = 20.5 Mb) than those generated with any other long-read data (mean contig N50 = 4.1 Mb). For animals, HiFi assemblies were 226% more contiguous (mean contig N50 = 20.9 Mb) than other long-read assemblies (mean contig N50 = 9.3 Mb). In plants, we also found limited evidence that HiFi may offer a unique solution for overcoming genomic complexity that scales with assembly size.
CONCLUSIONS Highly accurate long-reads generated with HiFi or analogous technologies represent a key tool for maximizing genome assembly quality for a wide swath of plants and animals. This finding is particularly important when resources only allow for one type of sequencing data to be generated. Ultimately, to realize the promise of biodiversity genomics, we call for greater uptake of highly accurate long-reads in future studies.
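The abstract equates >Q20 with 99% accuracy. Under the standard Phred definition, Q = -10·log10(P_error), so Q20 corresponds to an expected error probability of 0.01 per base. A minimal Python sketch of the conversion in both directions (the function names are illustrative, not from any particular toolkit):

```python
import math


def phred_to_accuracy(q):
    """Expected per-base accuracy for a Phred quality score q."""
    error_prob = 10 ** (-q / 10)  # Phred: q = -10 * log10(P_error)
    return 1 - error_prob


def accuracy_to_phred(accuracy):
    """Phred quality score for a per-base accuracy strictly between 0 and 1."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return -10 * math.log10(1 - accuracy)


for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.3%} accurate")
print(f"99% accurate: Q{accuracy_to_phred(0.99):.1f}")
```

Each additional 10 Phred units corresponds to a tenfold drop in the expected error rate.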
Affiliation(s)
- Scott Hotaling
- Department of Watershed Sciences, Utah State University, Logan, UT, USA
- Edward R Wilcox
- DNA Sequencing Center, Department of Biology, Brigham Young University, Provo, UT, USA
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325, Frankfurt, Germany
- Russell J Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Data Science Lab, Smithsonian Institution, Washington, DC, USA
6
Deng X, Frandsen PB, Dikow RB, Favre A, Shah DN, Shah RDT, Schneider JV, Heckenhauer J, Pauls SU. The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche). Ecol Evol 2022; 12:e9583. [PMID: 36523526 PMCID: PMC9745013 DOI: 10.1002/ece3.9583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/16/2022] [Indexed: 12/15/2022]
Abstract
Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise FST, and genome-wide distribution of FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
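The depth effect reported above can be made concrete with an idealized calculation that is not part of the study, which worked from genotype likelihoods. Assuming Poisson-distributed read depth with mean λ, error-free reads, and that each read carries either allele of a heterozygous site with probability 1/2, the chance of observing both alleles is the sum over depths d ≥ 1 of P(d)·(1 - 2·0.5^d), which simplifies to 1 + e^(-λ) - 2e^(-λ/2). The sketch below evaluates both forms at the three depths used in the study; the function names are illustrative.

```python
import math


def prob_both_alleles_seen(mean_depth, max_depth=200):
    """Probability that a heterozygous site shows both alleles in the reads.

    Assumes Poisson-distributed depth, error-free reads, and that each read
    carries either allele with probability 1/2. Depths above max_depth are
    ignored (negligible for the depths considered here).
    """
    pmf = math.exp(-mean_depth)  # P(depth == 0); a site with no reads adds nothing
    total = 0.0
    for d in range(1, max_depth + 1):
        pmf *= mean_depth / d  # Poisson recurrence: P(d) = P(d - 1) * mean / d
        # At depth d, all reads carry the same allele with probability 2 * 0.5**d.
        total += pmf * (1 - 2 * 0.5 ** d)
    return total


def prob_both_alleles_closed_form(mean_depth):
    """Closed form of the same sum: 1 + exp(-mean) - 2 * exp(-mean / 2)."""
    return 1 + math.exp(-mean_depth) - 2 * math.exp(-mean_depth / 2)


for depth in (3.5, 7.5, 12.0):
    print(f"{depth}x: summed {prob_both_alleles_seen(depth):.4f}, "
          f"closed form {prob_both_alleles_closed_form(depth):.4f}")
```

Under these assumptions, roughly 68%, 95%, and 99.5% of heterozygous sites would show both alleles at 3.5×, 7.5×, and 12×, respectively. Real data add sequencing error, mapping bias, and uneven coverage, so the idealized figures are an upper bound on what low-depth data can resolve.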
Affiliation(s)
- Xi-Ling Deng
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus-Liebig-University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt/Main, Germany
- Paul B. Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt/Main, Germany
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, Utah, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
- Adrien Favre
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Regional Nature Park of the Trient Valley, Salvan, Switzerland
- Deep Narayan Shah
- Central Department of Environmental Science, Tribhuvan University, Kirtipur, Nepal
- Ram Devi Tachamo Shah
- Aquatic Ecology Centre, School of Science, Kathmandu University, Dhulikhel, Nepal
- Department of Life Sciences, School of Science, Kathmandu University, Dhulikhel, Nepal
- Julio V. Schneider
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Jacqueline Heckenhauer
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt/Main, Germany
- Steffen U. Pauls
- Senckenberg Research Institute and Natural History Museum, Frankfurt/Main, Germany
- Institute of Insect Biotechnology, Justus-Liebig-University Gießen, Gießen, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt/Main, Germany
7
Kawahara AY, Storer CG, Markee A, Heckenhauer J, Powell A, Plotkin D, Hotaling S, Cleland TP, Dikow RB, Dikow T, Kuranishi RB, Messcher R, Pauls SU, Stewart RJ, Tojo K, Frandsen PB. Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes. GIGABYTE 2022; 2022:gigabyte64. [PMID: 36824508 PMCID: PMC9693786 DOI: 10.46471/gigabyte.64] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/07/2022] [Accepted: 06/24/2022] [Indexed: 11/09/2022]
Abstract
Insect silk is a versatile biomaterial. Lepidoptera and Trichoptera display some of the most diverse uses of silk, with varying strength, adhesive qualities, and elastic properties. Silk fibroin genes are long (>20 Kbp), with many repetitive motifs that make them challenging to sequence. Most research thus far has focused on conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (N50 = 9.7 Mbp/32.4 Mbp, L50 = 13/11) and complete (BUSCO complete = 99.3%/95.2%), with complete and contiguous recovery of silk heavy fibroin gene sequences. We show that HiFi long-read sequencing is helpful for understanding genes with long, repetitive regions.
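N50 and L50, as reported above, summarize assembly contiguity: sort contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is covered; N50 is the length of the contig at which that threshold is reached, and L50 is the number of contigs needed to reach it. A minimal Python sketch (the function name is illustrative):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50 is the length of the shortest contig in the smallest set of the
    longest contigs that together cover at least half of the total length;
    L50 is the number of contigs in that set.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


print(n50_l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # (8, 3)
```

In the toy example the total is 54, so the three longest contigs (10 + 9 + 8 = 27) are the first to reach half of it. A higher N50 and a lower L50 both indicate a more contiguous assembly.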
Affiliation(s)
- Akito Y. Kawahara
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Caroline G. Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Pacific Biosciences, 1305 O’Brien Dr., Menlo Park, CA 94025, USA
- Amanda Markee
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- School of Natural Resources and the Environment, University of Florida, Gainesville, FL 32611, USA
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Ashlyn Powell
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
- David Plotkin
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Scott Hotaling
- School of Biological Sciences, Washington State University, Pullman, WA, USA
- Timothy P. Cleland
- Museum Conservation Institute, Smithsonian Institution, Suitland, MD 20746, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20002, USA
- Torsten Dikow
- Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Ryoichi B. Kuranishi
- Graduate School of Science, Chiba University, Chiba 263-8522, Japan
- Kanagawa Institute of Technology, Kanagawa 243-0292, Japan
- Rebeccah Messcher
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Steffen U. Pauls
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Institute for Insect Biotechnology, Justus-Liebig-University, Gießen 35390, Germany
- Russell J. Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
- Koji Tojo
- Department of Biology, Shinshu University, Matsumoto, Nagano 390-8621, Japan
- Paul B. Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20002, USA
8
Heckenhauer J, Frandsen PB, Sproul JS, Li Z, Paule J, Larracuente AM, Maughan PJ, Barker MS, Schneider JV, Stewart RJ, Pauls SU. Genome size evolution in the diverse insect order Trichoptera. Gigascience 2022; 11:giac011. [PMID: 35217860 PMCID: PMC8881205 DOI: 10.1093/gigascience/giac011] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/18/2021] [Revised: 11/25/2021] [Accepted: 01/21/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND Genome size is implicated in the form, function, and ecological success of a species. Two principally different mechanisms are proposed as major drivers of eukaryotic genome evolution and diversity: polyploidy (i.e., whole-genome duplication) or smaller duplication events and bursts in the activity of repetitive elements. Here, we generated de novo genome assemblies of 17 caddisflies covering all major lineages of Trichoptera. Using these and previously sequenced genomes, we use caddisflies as a model for understanding genome size evolution in diverse insect lineages. RESULTS We detect a ∼14-fold variation in genome size across the order Trichoptera. We find strong evidence that repetitive element expansions, particularly those of transposable elements (TEs), are important drivers of large caddisfly genome sizes. Using an innovative method to examine TEs associated with universal single-copy orthologs (i.e., BUSCO genes), we find that TE expansions have a major impact on protein-coding gene regions, with TE-gene associations showing a linear relationship with increasing genome size. Intriguingly, we find that expanded genomes preferentially evolved in caddisfly clades with a higher ecological diversity (i.e., various feeding modes, diversification in variable, less stable environments). CONCLUSION Our findings provide a platform to test hypotheses about the potential evolutionary roles of TE activity and TE-gene associations, particularly in groups with high species, ecological, and functional diversities.
Affiliation(s)
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
- Data Science Lab, Smithsonian Institution, Washington, DC 20560, USA
- John S Sproul
- Department of Biology, University of Rochester, Rochester, NY 14620, USA
- Department of Biology, University of Nebraska Omaha, Omaha, NE 68182, USA
- Zheng Li
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Juraj Paule
- Department of Botany and Molecular Evolution, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Peter J Maughan
- Department of Plant & Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
- Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Julio V Schneider
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Russell J Stewart
- Department of Biomedical Engineering, University of Utah, Salt Lake City, UT 84112, USA
- Steffen U Pauls
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
- Institute for Insect Biotechnology, Justus-Liebig-University, Gießen 35390, Germany
9
Ríos-Touma B, Holzenthal RW, Rázuri-Gonzales E, Heckenhauer J, Pauls SU, Storer CG, Frandsen PB. De Novo Genome Assembly and Annotation of an Andean Caddisfly, Atopsyche davidsoni Sykora, 1991, a Model for Genome Research of High-Elevation Adaptations. Genome Biol Evol 2022; 14:evab286. [PMID: 34962985 PMCID: PMC8767365 DOI: 10.1093/gbe/evab286] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 12/20/2021] [Indexed: 11/13/2022]
Abstract
We sequence, assemble, and annotate the genome of Atopsyche davidsoni Sykora, 1991, the first whole-genome assembly for the caddisfly family Hydrobiosidae. This free-living and predatory caddisfly inhabits streams in the high-elevation Andes and is separated by more than 200 Myr of evolutionary history from the most closely related caddisfly species with genome assemblies available. We demonstrate the promise of PacBio HiFi reads by assembling the most contiguous caddisfly genome assembly to date with a contig N50 of 14 Mb, which is more than 6× more contiguous than the current most contiguous assembly for a caddisfly (Hydropsyche tenuis). We recover 98.8% of insect BUSCO genes indicating a high level of gene completeness. We also provide a genome annotation of 12,232 annotated proteins. This new genome assembly provides an important new resource for studying genomic adaptation of aquatic insects to harsh, high-altitude environments.
Affiliation(s)
- Blanca Ríos-Touma
- Facultad de Ingenierías y Ciencias Aplicadas, Ingeniería Ambiental, Grupo de Investigación en Biodiversidad, Medio Ambiente y Salud (BIOMAS), Universidad de las Américas, Quito, Ecuador
- Ralph W Holzenthal
- Department of Entomology, University of Minnesota, St. Paul, Minnesota, USA
- Ernesto Rázuri-Gonzales
- Department of Entomology, University of Minnesota, St. Paul, Minnesota, USA
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
- Jacqueline Heckenhauer
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Steffen U Pauls
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Institute of Insect Biotechnology, Justus-Liebig University, Gießen, Germany
- Caroline G Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah, USA
10
Li X, Ellis E, Plotkin D, Imada Y, Yago M, Heckenhauer J, Cleland TP, Dikow RB, Dikow T, Storer CG, Kawahara AY, Frandsen PB. First Annotated Genome of a Mandibulate Moth, Neomicropteryx cornuta, Generated Using PacBio HiFi Sequencing. Genome Biol Evol 2021; 13:6380144. [PMID: 34599325 PMCID: PMC8557830 DOI: 10.1093/gbe/evab229] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 09/27/2021] [Indexed: 11/14/2022]
Abstract
We provide a new, annotated genome assembly of Neomicropteryx cornuta, a species of the so-called mandibulate archaic moths (Lepidoptera: Micropterigidae). These moths belong to a lineage that is thought to have split from all other Lepidoptera more than 300 Ma and are consequently vital to understanding the early evolution of superorder Amphiesmenoptera, which contains the order Lepidoptera (butterflies and moths) and its sister order Trichoptera (caddisflies). Using PacBio HiFi sequencing reads, we assembled a highly contiguous genome with a contig N50 of nearly 17 Mb. The assembled genome length of 541,115,538 bp is about half the length of the largest published Amphiesmenoptera genome (Limnephilus lunatus, Trichoptera) and double the length of the smallest (Papilio polytes, Lepidoptera). We find high recovery of universal single copy orthologs with 98.1% of BUSCO genes present and provide a genome annotation of 15,643 genes aided by resolved isoforms from PacBio IsoSeq data. This high-quality genome assembly provides an important resource for studying ecological and evolutionary transitions in the early evolution of Amphiesmenoptera.
Affiliation(s)
- Xuankun Li
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
- Emily Ellis
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
- David Plotkin
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
- Yume Imada
- Graduate School of Science and Engineering, Ehime University, Matsuyama, Japan
- Masaya Yago
- The University Museum, The University of Tokyo, Hongo, Bunkyo-ku, Japan
- Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Entomology III, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Timothy P Cleland
- Museum Conservation Institute, Smithsonian Institution, Suitland, Maryland, USA
- Rebecca B Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Torsten Dikow
- Department of Entomology, National Museum of Natural History (USNM), Smithsonian Institution, Washington, District of Columbia, USA
- Caroline G Storer
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
- Akito Y Kawahara
- McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, USA
- Paul B Frandsen
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Department of Plant and Wildlife Sciences, Brigham Young University, USA