1
Blanco-Melo D, Campbell MA, Zhu H, Dennis TPW, Modha S, Lytras S, Hughes J, Gatseva A, Gifford RJ. A novel approach to exploring the dark genome and its application to mapping of the vertebrate virus fossil record. Genome Biol 2024; 25:120. [PMID: 38741126 DOI: 10.1186/s13059-024-03258-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 04/22/2024] [Indexed: 05/16/2024]
Abstract
BACKGROUND: Genomic regions that remain poorly understood, often referred to as the dark genome, contain a variety of functionally relevant and biologically informative features. These include endogenous viral elements (EVEs), virus-derived sequences that can dramatically impact host biology and serve as a virus fossil record. In this study, we introduce a database-integrated genome screening (DIGS) approach to investigate the dark genome in silico, focusing on EVEs found within vertebrate genomes.
RESULTS: Using DIGS on 874 vertebrate genomes, we uncover approximately 1.1 million EVE sequences, with over 99% originating from endogenous retroviruses or transposable elements that contain EVE DNA. We show that the remaining 6038 sequences represent over a thousand distinct horizontal gene transfer events across 10 virus families, including some that have not previously been reported as EVEs. We explore the genomic and phylogenetic characteristics of non-retroviral EVEs and determine their rates of acquisition during vertebrate evolution. Our study uncovers novel virus diversity, broadens knowledge of virus distribution among vertebrate hosts, and provides new insights into the ecology and evolution of vertebrate viruses.
CONCLUSIONS: We comprehensively catalog and analyze EVEs within 874 vertebrate genomes, shedding light on the distribution, diversity, and long-term evolution of viruses and revealing their extensive impact on vertebrate genome evolution. Our results demonstrate the power of linking a relational database management system to a similarity search-based screening pipeline for in silico exploration of the dark genome.
Affiliation(s)
- Daniel Blanco-Melo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA, 98109, USA
- Herbold Computational Biology Program, Public Health Sciences Division, Fred Hutchinson Cancer Center, 1100 Fairview Ave N, Seattle, WA, 98109, USA
- Henan Zhu
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Tristan P W Dennis
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Sejal Modha
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Spyros Lytras
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Anna Gatseva
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Robert J Gifford
- MRC-University of Glasgow Centre for Virus Research, 464 Bearsden Rd, Bearsden, Glasgow, G61 1QH, UK
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
2
Brait N, Hackl T, Morel C, Exbrayat A, Gutierrez S, Lequime S. A tale of caution: How endogenous viral elements affect virus discovery in transcriptomic data. Virus Evol 2023; 10:vead088. [PMID: 38516656 PMCID: PMC10956553 DOI: 10.1093/ve/vead088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 11/24/2023] [Accepted: 12/22/2023] [Indexed: 03/23/2024]
Abstract
Large-scale metagenomic and -transcriptomic studies have revolutionized our understanding of viral diversity and abundance. In contrast, endogenous viral elements (EVEs), remnants of viral sequences integrated into host genomes, have received limited attention in the context of virus discovery, especially in RNA-Seq data. EVEs resemble their original viruses, which makes it difficult to distinguish active infections from integrated remnants, affects virus classification, and biases downstream analyses. Here, we systematically assess the effects of EVEs on a prototypical virus discovery pipeline, evaluate their impact on data integrity and classification accuracy, and provide some recommendations for better practices. We examined EVEs and exogenous viral sequences linked to Orthomyxoviridae, a diverse family of negative-sense segmented RNA viruses, in 13 genomic and 538 transcriptomic datasets of Culicinae mosquitoes. Our analysis revealed a substantial number of viral sequences in transcriptomic datasets. However, a significant portion appeared not to be exogenous viruses but transcripts derived from EVEs. Distinguishing between transcribed EVEs and exogenous virus sequences was especially difficult in samples with low viral abundance. For example, three transcribed EVEs showed full-length segments, devoid of frameshift and nonsense mutations, exhibiting sufficient mean read depths that qualify them as exogenous virus hits. Mapping reads on a host genome containing EVEs before assembly somewhat alleviated the EVE burden, but it led to a drastic reduction of viral hits and reduced quality of assemblies, especially in regions of the viral genome relatively similar to EVEs. Our study highlights that our knowledge of the genetic diversity of viruses can be altered by the underestimated presence of EVEs in transcriptomic datasets, leading to false positives and altered or missing sequence information. Thus, recognizing and addressing the influence of EVEs in virus discovery pipelines will be key in enhancing our ability to capture the full spectrum of viral diversity.
Affiliation(s)
- Nadja Brait
- Cluster of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen 9747 AG, The Netherlands
- Côme Morel
- ASTRE research unit, Cirad, INRAe, Université de Montpellier, Montpellier 34398, France
- Antoni Exbrayat
- ASTRE research unit, Cirad, INRAe, Université de Montpellier, Montpellier 34398, France
- Serafin Gutierrez
- ASTRE research unit, Cirad, INRAe, Université de Montpellier, Montpellier 34398, France
- Sebastian Lequime
- Cluster of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen 9747 AG, The Netherlands
3
Parry RH, Slonchak A, Campbell LJ, Newton ND, Debat HJ, Gifford RJ, Khromykh AA. A novel tamanavirus (Flaviviridae) of the European common frog (Rana temporaria) from the UK. J Gen Virol 2023; 104:001927. [PMID: 38059479 PMCID: PMC10770923 DOI: 10.1099/jgv.0.001927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 11/19/2023] [Indexed: 12/08/2023]
Abstract
Flavivirids are small, enveloped, positive-sense RNA viruses from the family Flaviviridae with genomes of ~9-13 kb. Metatranscriptomic analyses of metazoan organisms have revealed a diversity of flavivirus-like or flavivirid viral sequences in fish and marine invertebrate groups. However, no flavivirus-like virus has been identified in amphibians. To remedy this, we investigated the virome of the European common frog (Rana temporaria) in the UK, utilizing high-throughput sequencing at six catch locations. De novo assembly revealed a coding-complete virus contig of a novel flavivirid ~11.2 kb in length. The virus encodes a single ORF of 3456 aa and 5' and 3' untranslated regions (UTRs) of 227 and 666 nt, respectively. We named this virus Rana tamanavirus (RaTV), as BLASTp analysis of the polyprotein showed the closest relationships to Tamana bat virus (TABV) and Cyclopterus lumpus virus from Pteronotus parnellii and Cyclopterus lumpus, respectively. Phylogenetic analysis of the RaTV polyprotein compared to Flavivirus and Flavivirus-like members indicated that RaTV was sufficiently divergent and basal to the vertebrate Tamanavirus clade. In addition to the Mitcham strain, partial but divergent RaTV, sharing 95.64-97.39 % pairwise nucleotide identity, were also obtained from the Poole and Deal samples, indicating that RaTV is widespread in UK frog samples. Bioinformatic analyses of predicted secondary structures in the 3'UTR of RaTV showed the presence of an exoribonuclease-resistant RNA (xrRNA) structure standard in flaviviruses and TABV. To examine this biochemically, we conducted an in vitro Xrn1 digestion assay showing that RaTV probably forms a functional Xrn1-resistant xrRNA.
Affiliation(s)
- Rhys H. Parry
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Andrii Slonchak
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Australian Infectious Diseases Research Centre (AIDRC), Brisbane, QLD, Australia
- Lewis J. Campbell
- Department of Pathobiological Sciences, University of Wisconsin-Madison, Madison, WI, USA
- Institute of Zoology, Zoological Society of London, London, UK
- Environment and Sustainability Institute, University of Exeter, Penryn, UK
- Natalee D. Newton
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Australian Infectious Diseases Research Centre (AIDRC), Brisbane, QLD, Australia
- Humberto J. Debat
- Instituto de Patología Vegetal, Centro de Investigaciones Agropecuarias, Instituto Nacional de Tecnología Agropecuaria (IPAVE-CIAP-INTA), Córdoba X5020ICA, Argentina
- Unidad de Fitopatología y Modelización Agrícola (UFYMA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Córdoba X5020ICA, Argentina
- Alexander A. Khromykh
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Australian Infectious Diseases Research Centre (AIDRC), Brisbane, QLD, Australia
- AIDRC Global Virus Network Centre of Excellence, Brisbane, QLD, Australia
4
Mifsud JCO, Costa VA, Petrone ME, Marzinelli EM, Holmes EC, Harvey E. Transcriptome mining extends the host range of the Flaviviridae to non-bilaterians. Virus Evol 2022; 9:veac124. [PMID: 36694816 PMCID: PMC9854234 DOI: 10.1093/ve/veac124] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/25/2022] [Revised: 12/20/2022] [Accepted: 12/26/2022] [Indexed: 12/27/2022]
Abstract
The flavivirids (family Flaviviridae) are a group of positive-sense RNA viruses that include well-documented agents of human disease. Despite their importance and ubiquity, the timescale of flavivirid evolution is uncertain. An ancient origin, spanning millions of years, is supported by their presence in both vertebrates and invertebrates and by the identification of a flavivirus-derived endogenous viral element in the peach blossom jellyfish genome (Craspedacusta sowerbii, phylum Cnidaria), implying that the flaviviruses arose early in the evolution of the Metazoa. To date, however, no exogenous flavivirid sequences have been identified in these hosts. To help resolve the antiquity of the Flaviviridae, we mined publicly available transcriptome data across the Metazoa. From this, we expanded the diversity within the family through the identification of 32 novel viral sequences and extended the host range of the pestiviruses to include amphibians, reptiles, and ray-finned fish. Through co-phylogenetic analysis we found cross-species transmission to be the predominant macroevolutionary event across the non-vectored flavivirid genera (median, 68 per cent), including a cross-species transmission event between bats and rodents, although long-term virus-host co-divergence was still a regular occurrence (median, 23 per cent). Notably, we discovered flavivirus-like sequences in basal metazoan species, including the first associated with Cnidaria. This sequence formed a basal lineage to the genus Flavivirus and was closer to arthropod and crustacean flaviviruses than those in the tamanavirus group, which includes a variety of invertebrate and vertebrate viruses. Combined, these data attest to an ancient origin of the flaviviruses, likely close to the emergence of the metazoans 750-800 million years ago.
Affiliation(s)
- Vincenzo A Costa
- Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney NSW 2006, Australia
- Mary E Petrone
- Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney NSW 2006, Australia
- Ezequiel M Marzinelli
- School of Life and Environmental Sciences, The University of Sydney, Sydney NSW 2006, Australia
- Sydney Institute of Marine Science, 19 Chowder Bay Rd, Mosman, NSW 2088, Australia
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore 637551, Singapore