1
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023]
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented in TABAJARA, a program that can be used either to detect all biological sequences of a group or to discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof of principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups with different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, as well as profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, as implemented in TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
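To illustrate what scoring an alignment position by position can look like, the Python sketch below does two things. It computes a per-column Shannon information content (log2 of the alphabet size minus the column entropy, with gaps ignored), and then picks the window with the highest mean score as a candidate motif. This is an assumption about what a "position-specific information score" might be. TABAJARA's actual scoring, gap handling, and window selection may differ. The function names, the gap symbols, and the window width are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "."}  # assumed gap symbols; skipped when counting residues


def column_information(alignment, alphabet_size=20):
    """Return log2(alphabet_size) - H(column) for each alignment column.

    Gaps are skipped and an all-gap column scores 0. No small-sample
    correction is applied (a simplifying assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(alphabet_size)
    scores = []
    for i in range(length):
        residues = [seq[i].upper() for seq in alignment if seq[i] not in GAP_CHARS]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(max_bits - entropy)
    return scores


def best_window(scores, width):
    """Return (start, mean_score) of the highest-scoring run of `width` columns."""
    if not 0 < width <= len(scores):
        raise ValueError("width must be between 1 and the alignment length")
    best_start = 0
    best_sum = window_sum = sum(scores[:width])
    for start in range(1, len(scores) - width + 1):
        # slide the window one column to the right
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width


if __name__ == "__main__":
    msa = ["ACGTAC", "ACGTTG", "ACGAGC", "ACGCCA"]  # toy DNA alignment
    scores = column_information(msa, alphabet_size=4)
    print([round(s, 2) for s in scores])  # [2.0, 2.0, 2.0, 0.5, 0.0, 0.5]
    print(best_window(scores, 3))         # (0, 2.0)
```

The fully conserved first three columns reach the maximum of 2 bits for a four-letter alphabet, so they are selected as the most informative block. Under this sketch's assumptions, that block is the kind of region a profile HMM would then be built from.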
Affiliation(s)
- Liliane S. Oliveira
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Bas E. Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Arthur Gruber
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
2
Munagala NVTS, Amanchi PK, Balasubramanian K, Panicker A, Nagaraj N. Compression-Complexity Measures for Analysis and Classification of Coronaviruses. ENTROPY (BASEL, SWITZERLAND) 2022; 25:81. [PMID: 36673224 PMCID: PMC9857615 DOI: 10.3390/e25010081] [Received: 10/05/2022] [Revised: 12/10/2022] [Accepted: 12/18/2022] [Indexed: 06/17/2023]
Abstract
Finding a vaccine or specific antiviral treatment for a global pandemic of viral disease (such as the ongoing COVID-19 pandemic) requires rapid analysis, annotation, and evaluation of metagenomic libraries to enable quick and efficient screening of nucleotide sequences. Traditional sequence alignment methods are not well suited to this task, and fast, alignment-free techniques for sequence analysis are needed. Information theory and data compression algorithms provide a rich set of mathematical and computational tools for capturing essential patterns in biological sequences. In this study, we investigate distance measures based on compression complexity (Effort-to-Compress, ETC, and Lempel-Ziv, LZ, complexity) for analyzing genomic sequences. The proposed distance measure successfully reproduces the phylogenetic trees for three datasets: a mammalian dataset consisting of eight species clusters; a set of coronaviruses belonging to group I, group II, group III, and SARS-CoV-1; and a set of coronaviruses that cause COVID-19 (SARS-CoV-2) or do not. Having demonstrated the usefulness of these compression-complexity measures, we employ them for the automatic classification of COVID-19-causing genome sequences using machine learning techniques. Two flavors of SVM (linear and quadratic), along with linear discriminant analysis and a fine k-nearest neighbors (KNN) classifier, are used for classification. On a dataset of 1001 coronavirus sequences (causing and not causing COVID-19), a classification accuracy of 98% is achieved, with a sensitivity of 95% and a specificity of 99.8%. This work could be extended further to enable medical practitioners to identify and characterize coronavirus strains and their rapidly growing mutants automatically, quickly, and efficiently.
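To make the idea of a compression-complexity distance concrete, the Python sketch below computes Lempel-Ziv (LZ76) complexity by direct parsing. It then derives a normalized distance from the complexities of the two sequences and of their concatenations. The normalization used here is d(S, Q) = max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(SQ), c(QS)). It is one commonly used LZ-based form and is an assumption; the exact distance in the paper, and its Effort-to-Compress (ETC) variant, may differ. The function names are hypothetical, and the quadratic substring search favors readability over speed on whole genomes.

```python
def lz76_complexity(seq):
    """Number of components in the Lempel-Ziv (1976) parsing of `seq`.

    Each new component is the shortest substring starting at `pos` that
    cannot be copied from the history seq[:pos + length - 1]; a trailing
    remainder that is still copyable counts as one final component.
    """
    n = len(seq)
    components, pos = 0, 0
    while pos < n:
        length = 1
        # extend the component while it can still be reproduced from the history
        while pos + length <= n and seq[pos:pos + length] in seq[:pos + length - 1]:
            length += 1
        components += 1
        pos += length
    return components


def lz_distance(s, q):
    """Normalized LZ distance (assumed form): 0 for identical sequences, symmetric."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)


if __name__ == "__main__":
    print(lz76_complexity("0001101001000101"))  # 6: 0|001|10|100|1000|101
    x, y = "ACGT" * 10, "G" * 8
    print(lz_distance(x, x), lz_distance(x, y))  # 0.0 0.5
```

Concatenating a sequence with itself adds no new LZ components, so its self-distance is 0. Appending an unrelated sequence forces new components and raises the distance. In a phylogenetic use, a pairwise matrix of such distances could be passed to any distance-based tree method.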
Affiliation(s)
- Naga Venkata Trinath Sai Munagala
- Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Ettimadai 641112, Tamil Nadu, India
- Prem Kumar Amanchi
- Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Ettimadai 641112, Tamil Nadu, India
- Karthi Balasubramanian
- Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Ettimadai 641112, Tamil Nadu, India
- Athira Panicker
- Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Ettimadai 641112, Tamil Nadu, India
- Nithin Nagaraj
- Consciousness Studies Programme, National Institute of Advanced Studies, Bengaluru 560012, Karnataka, India
3
Lu M, Schneider D, Daniel R. Metagenomic Screening for Lipolytic Genes Reveals an Ecology-Clustered Distribution Pattern. Front Microbiol 2022; 13:851969. [PMID: 35756004 PMCID: PMC9226776 DOI: 10.3389/fmicb.2022.851969] [Received: 01/10/2022] [Accepted: 04/28/2022] [Indexed: 12/02/2022]
Abstract
Lipolytic enzymes are among the most important enzyme types for application in various industrial processes. Despite continuously increasing demand, only a small portion of the lipolytic enzymes encountered so far exhibit adequate stability and activity for biotechnological applications. To explore novel and/or extremophilic lipolytic enzymes, microbial consortia in two composts at the thermophilic stage were analyzed using function-driven and sequence-based metagenomic approaches. Community composition was analyzed by amplicon-based sequencing of 16S rRNA genes and transcripts and by direct metagenome sequencing. This analysis revealed that the communities of the compost samples were dominated by members of the phyla Actinobacteria, Proteobacteria, Firmicutes, Bacteroidetes, and Chloroflexi. Function-driven screening of the metagenomic libraries constructed from the two samples yielded 115 unique lipolytic enzymes. These enzymes were assigned to families by analyzing their phylogenetic relationships and by generating a protein sequence similarity network according to an integrated classification system. The sequence-based screening was performed using a newly developed database containing a set of profile hidden Markov models that are highly sensitive and specific for the detection of lipolytic enzymes. By comparing the lipolytic enzymes identified through both approaches, we demonstrated that activity-directed screening complements sequence-based detection, and vice versa. Sequence-based comparative analysis of the diversity, function, and taxonomic origin of lipolytic genes derived from 175 metagenomes indicated significant differences between habitats. Analysis of the prevalent and distinct microbial groups providing the lipolytic genes revealed characteristic patterns and groups driven by ecological factors. The data presented here suggest that the diversity and distribution of lipolytic genes in metagenomes from various habitats are largely constrained by ecological factors.
Affiliation(s)
- Rolf Daniel
- Department of Genomic and Applied Microbiology, Institute of Microbiology and Genetics, Georg August University of Göttingen, Göttingen, Germany
4
Mazur FG, Morinisi LM, Martins JO, Guerra PPB, Freire CCM. Exploring Virome Diversity in Public Data in South America as an Approach for Detecting Viral Sources From Potentially Emerging Viruses. Front Genet 2022; 12:722857. [PMID: 35126446 PMCID: PMC8814814 DOI: 10.3389/fgene.2021.722857] [Received: 06/09/2021] [Accepted: 11/29/2021] [Indexed: 11/13/2022]
Abstract
The South American continent presents a great diversity of biomes, whose ecosystems are constantly threatened by the expansion of human activity. The emergence and re-emergence of viral populations affecting humans and ecosystems have increased over the last decades. Given the growing accumulation of genomic data, we explore the potential of South American-related public databases to detect signals that contribute to virosphere research. Our study therefore investigates public databases with an emphasis on the surveillance of viruses of medical and ecological relevance. Herein, we profiled 120 Sequence Read Archive (SRA) metagenomes from 19 independent projects from the last decade. In a coarse view, our analyses classified only 0.38% of all sequences as viral, with a higher proportion of RNA viruses. Among the environmental models analyzed, the metagenomes with the most important viral sequences were 1) aquatic samples from the Amazon River, 2) sewage from Brasília, and 3) soil from the state of São Paulo. Among the models of animal transmission, viral signals were detected in mosquitoes from Rio de Janeiro and bats from Amazonia. In addition, classifying viral signals into operational taxonomic units (OTUs) at the family level allowed us to use the metadata to infer a probable host range for the virome detected in each sample. Further, several motifs and viral sequences are related to specific viruses with emergence potential from the Togaviridae, Arenaviridae, and Flaviviridae families. In this context, exploring public databases allowed us to evaluate the scope and informative capacity of sequences from third-party public databases. It also allowed us to detect signals related to viruses of clinical or environmental importance, from which we inferred traits associated with probable transmission routes or signals of ecological disequilibrium.
The evaluation of our results showed that, in most cases, the size and type of the reference database, the guanine-cytosine (GC) content, and the length of the query sequences greatly influence the taxonomic classification of the sequences. In sum, our findings describe how public genomic data can be exploited as an approach to epidemiological surveillance and to understanding the virosphere.
Affiliation(s)
- Caio C. M. Freire
- Department of Genetics and Evolution, Federal University of São Carlos (UFSCar), São Carlos, Brazil
5
Nelson DR, Hazzouri KM, Lauersen KJ, Jaiswal A, Chaiboonchoe A, Mystikou A, Fu W, Daakour S, Dohai B, Alzahmi A, Nobles D, Hurd M, Sexton J, Preston MJ, Blanchette J, Lomas MW, Amiri KMA, Salehi-Ashtiani K. Large-scale genome sequencing reveals the driving forces of viruses in microalgal evolution. Cell Host Microbe 2021; 29:250-266.e8. [PMID: 33434515 DOI: 10.1016/j.chom.2020.12.005] [Received: 06/22/2020] [Revised: 10/08/2020] [Accepted: 11/18/2020] [Indexed: 01/08/2023]
Abstract
Because microalgae are integral primary producers in diverse ecosystems, their genomes could be mined for ecological insights, but representative genome sequences are lacking for many phyla. We cultured and sequenced 107 microalgae species from 11 different phyla indigenous to varied geographies and climates. This collection was used to resolve genomic differences between saltwater and freshwater microalgae. Freshwater species showed domain-centric ontology enrichment for nuclear and nuclear membrane functions, while saltwater species were enriched in organellar and cellular membrane functions. Further, marine species contained significantly more viral families in their genomes (p = 8e-4). Sequences from Chlorovirus, Coccolithovirus, Pandoravirus, Marseillevirus, Tupanvirus, and other viruses were found integrated into the genomes of algae from marine environments. These viral-origin sequences were found to be expressed and to code for a wide variety of functions. Together, this study comprehensively defines the expanse of protein-coding and viral elements in microalgal genomes and posits a unified adaptive strategy for algal halotolerance.
Affiliation(s)
- David R Nelson
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE.
- Khaled M Hazzouri
- Khalifa Center for Genetic Engineering and Biotechnology (KCGEB), UAE University, Al Ain, Abu Dhabi, UAE; Biology Department, College of Science, UAE University, Al Ain, Abu Dhabi, UAE
- Kyle J Lauersen
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia
- Ashish Jaiswal
- Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE
- Alexandra Mystikou
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- Weiqi Fu
- Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE
- Sarah Daakour
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- Bushra Dohai
- Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE
- Amnah Alzahmi
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- David Nobles
- UTEX Culture Collection of Algae at the University of Texas at Austin, Austin, TX, USA
- Mark Hurd
- National Center for Marine Algae and Microbiota, East Boothbay, ME, USA
- Julie Sexton
- National Center for Marine Algae and Microbiota, East Boothbay, ME, USA
- Michael J Preston
- National Center for Marine Algae and Microbiota, East Boothbay, ME, USA
- Joan Blanchette
- National Center for Marine Algae and Microbiota, East Boothbay, ME, USA
- Michael W Lomas
- National Center for Marine Algae and Microbiota, East Boothbay, ME, USA
- Khaled M A Amiri
- Khalifa Center for Genetic Engineering and Biotechnology (KCGEB), UAE University, Al Ain, Abu Dhabi, UAE; Biology Department, College of Science, UAE University, Al Ain, Abu Dhabi, UAE
- Kourosh Salehi-Ashtiani
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE; Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE.
6
Dasari CM, Bhukya R. Explainable deep neural networks for novel viral genome prediction. APPL INTELL 2021; 52:3002-3017. [PMID: 34764607 PMCID: PMC8232563 DOI: 10.1007/s10489-021-02572-3] [Accepted: 05/26/2021] [Indexed: 11/27/2022]
Abstract
Viral infection causes a wide variety of human diseases, including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting normal host function and leading to fatal diseases. Novel viral genome prediction is crucial for understanding complex viral diseases such as AIDS and Ebola. Most existing computational techniques classify viral genomes, but the efficiency of the classification depends solely on the structural features extracted. State-of-the-art deep neural network (DNN) models achieve excellent performance by extracting classification features automatically, but their explainability is relatively poor. The proposed CNN- and CNN-LSTM-based methods (EdeepVPP and EdeepVPP-hybrid) extract features automatically during model training for viral prediction. EdeepVPP also provides model interpretability: its learned filters extract the most important patterns that characterize viral genomes. It is an interpretable CNN model that extracts vital, biologically relevant patterns (features) from the feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all existing methods, achieving a mean AUC-ROC of 0.992 and an AUC-PR of 0.990 on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns based on high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. The model can serve as a recommendation system for further analysis of raw sequences labeled as 'unknown' by alignment-based methods. We show that our interpretable model can extract, through its learned filters, the patterns considered most important for predicting viral sequences.
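Leave-one-experiment-out cross-validation holds out all contigs from one experiment at a time, so a model is never evaluated on an experiment it saw during training. The Python sketch below generates such splits from a per-sample list of experiment labels. The function name and the labels are hypothetical, and this is not the EdeepVPP code. For comparison, scikit-learn's `LeaveOneGroupOut` splitter produces the same kind of grouping.

```python
from collections import defaultdict


def leave_one_experiment_out(experiment_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct experiment.

    `experiment_ids[i]` names the experiment that sample i came from.
    Folds follow the order in which experiments first appear.
    """
    by_experiment = defaultdict(list)
    for idx, exp in enumerate(experiment_ids):
        by_experiment[exp].append(idx)
    for held_out, test_idx in by_experiment.items():
        test_set = set(test_idx)
        # every sample from the other experiments goes into the training split
        train_idx = [i for i in range(len(experiment_ids)) if i not in test_set]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    experiments = ["exp1", "exp1", "exp2", "exp3", "exp3"]
    for held_out, train_idx, test_idx in leave_one_experiment_out(experiments):
        print(held_out, train_idx, test_idx)
    # exp1 [2, 3, 4] [0, 1]
    # exp2 [0, 1, 3, 4] [2]
    # exp3 [0, 1, 2] [3, 4]
```

In such a setup, a model would be fitted on each training split and scored (for example, by AUC-ROC) on the held-out experiment. The spread of scores across folds then indicates how well performance transfers between experiments.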
Affiliation(s)
- Raju Bhukya
- National Institute of Technology, Warangal, Telangana 506004 India
7
Sahmi-Bounsiar D, Rolland C, Aherfi S, Boudjemaa H, Levasseur A, La Scola B, Colson P. Marseilleviruses: An Update in 2021. Front Microbiol 2021; 12:648731. [PMID: 34149639 PMCID: PMC8208085 DOI: 10.3389/fmicb.2021.648731] [Received: 01/01/2021] [Accepted: 04/12/2021] [Indexed: 01/19/2023]
Abstract
The family Marseilleviridae, described in 2013, was the second family of giant viruses to be described, after the family Mimiviridae. Marseillevirus marseillevirus, isolated in 2007 by coculture on Acanthamoeba polyphaga, is the prototype member of this family. Subsequently, the worldwide distribution of marseilleviruses was revealed through their isolation from samples of various types and sources. In total, 62 were isolated from environmental water, one from soil, one from a dipteran, one from mussels, and two from asymptomatic humans, which led to the description of 67 marseillevirus isolates, including 21 by the IHU Méditerranée Infection in France. Recently, five marseillevirus genomes were assembled from deep-sea sediment in Norway. Isolated marseilleviruses have icosahedral capsids of ≈250 nm and mosaic genomes of 348–404 kilobases that encode 386–545 predicted proteins. Comparative genomic analyses indicate that the family Marseilleviridae includes five lineages and possesses a pangenome composed of 3,082 clusters of genes. Marseilleviruses have been detected in stool, blood, and lymph nodes of both symptomatic and asymptomatic humans, and they persist for up to 30 days in rats and mice. These findings raise questions about their possible clinical significance that are still under investigation.
Affiliation(s)
- Dehia Sahmi-Bounsiar
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
- Clara Rolland
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
- Sarah Aherfi
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
- Hadjer Boudjemaa
- IHU Méditerranée Infection, Marseille, France; Department of Biology, Faculty of Natural Science and Life, Hassiba Benbouali University of Chlef, Chlef, Algeria
- Anthony Levasseur
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
- Bernard La Scola
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
- Philippe Colson
- IHU Méditerranée Infection, Marseille, France; Institut de Recherche pour le Développement (IRD), Assistance Publique-Hôpitaux de Marseille (AP-HM), MEPHI, Aix-Marseille Université, Marseille, France
8
Kutnjak D, Tamisier L, Adams I, Boonham N, Candresse T, Chiumenti M, De Jonghe K, Kreuze JF, Lefebvre M, Silva G, Malapi-Wight M, Margaria P, Mavrič Pleško I, McGreig S, Miozzi L, Remenant B, Reynard JS, Rollin J, Rott M, Schumpp O, Massart S, Haegeman A. A Primer on the Analysis of High-Throughput Sequencing Data for Detection of Plant Viruses. Microorganisms 2021; 9:841. [PMID: 33920047 PMCID: PMC8071028 DOI: 10.3390/microorganisms9040841] [Received: 03/14/2021] [Revised: 04/09/2021] [Accepted: 04/10/2021] [Indexed: 12/12/2022]
Abstract
High-throughput sequencing (HTS) technologies have become indispensable tools for plant virus diagnostics and research, thanks to their ability to detect any plant virus in a sample without prior knowledge. HTS technologies rely heavily on bioinformatic analysis of the huge number of sequences generated. It is therefore of utmost importance that researchers can rely on efficient and reliable bioinformatic tools and understand the principles, advantages, and disadvantages of the tools used. Here, we present a critical overview of the steps involved in HTS as employed for plant virus detection and virome characterization. We start with sample preparation and nucleic acid extraction appropriate to the chosen HTS strategy. These are followed by basic data analysis requirements, an extensive overview of in-depth data processing options, and taxonomic classification of the viral sequences detected. The paper presents the bioinformatic tools and a detailed overview of the consecutive steps that can be used to implement a well-structured HTS data analysis in an easy and accessible way. It is targeted at both beginners and expert scientists engaging in HTS plant virome projects.
Affiliation(s)
- Denis Kutnjak
- Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, 1000 Ljubljana, Slovenia
- Lucie Tamisier
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium
- Ian Adams
- Fera Science Limited, York YO41 1LZ, UK
- Neil Boonham
- Institute for Agri-Food Research and Innovation, Newcastle University, King’s Rd, Newcastle Upon Tyne NE1 7RU, UK
- Thierry Candresse
- UMR 1332 Biologie du Fruit et Pathologie, INRA, University of Bordeaux, 33140 Villenave d’Ornon, France
- Michela Chiumenti
- Institute for Sustainable Plant Protection, National Research Council, Via Amendola, 122/D, 70126 Bari, Italy
- Kris De Jonghe
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food, Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
- Jan F. Kreuze
- International Potato Center (CIP), Avenida la Molina 1895, La Molina, Lima 15023, Peru
- Marie Lefebvre
- UMR 1332 Biologie du Fruit et Pathologie, INRA, University of Bordeaux, 33140 Villenave d’Ornon, France
- Gonçalo Silva
- Natural Resources Institute, University of Greenwich, Central Avenue, Chatham Maritime, Kent ME4 4TB, UK
- Martha Malapi-Wight
- Biotechnology Risk Analysis Programs, Biotechnology Regulatory Services, Animal and Plant Health Inspection Service, U.S. Department of Agriculture, Riverdale, MD 20737, USA
- Paolo Margaria
- Leibniz Institute-DSMZ, Inhoffenstrasse 7b, 38124 Braunschweig, Germany
- Irena Mavrič Pleško
- Agricultural Institute of Slovenia, Hacquetova Ulica 17, 1000 Ljubljana, Slovenia
- Sam McGreig
- Fera Science Limited, York YO41 1LZ, UK
- Laura Miozzi
- Institute for Sustainable Plant Protection, National Research Council of Italy (IPSP-CNR), Strada delle Cacce 73, 10135 Torino, Italy
- Benoit Remenant
- ANSES Plant Health Laboratory, 7 Rue Jean Dixméras, CEDEX 01, 49044 Angers, France
- Johan Rollin
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium
- DNAVision, 6041 Charleroi, Belgium
- Mike Rott
- Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Rd, North Saanich, BC V8L 1H3, Canada
- Olivier Schumpp
- Agroscope, Route de Duillier 50, 1260 Nyon, Switzerland
- Sébastien Massart
- Plant Pathology Laboratory, Université de Liège, Gembloux Agro-Bio Tech, TERRA, Passage des Déportés, 2, 5030 Gembloux, Belgium
- Annelies Haegeman
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food, Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
9
Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch9]
10
Liu W, Mao Y, Ci L, Zhang F. A new approach of user-level intrusion detection with command sequence-to-sequence model. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-179659] [Indexed: 11/15/2022]
Affiliation(s)
- Wei Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Yu Mao
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Linlin Ci
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Fuquan Zhang
- Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, China
11
Macera L, Spezia PG, Focosi D, Mazzetti P, Antonelli G, Pistello M, Maggi F. Lack of Marseillevirus DNA in immunocompetent and immunocompromised Italian patients. J Med Virol 2019; 92:187-190. [PMID: 31498443 DOI: 10.1002/jmv.25592] [Received: 06/07/2019] [Accepted: 09/04/2019] [Indexed: 01/21/2023]
Abstract
Marseilleviridae is a family of viruses that have only been propagated in Acanthamoeba. Marseillevirus sequences have recently been detected in different human matrices by viral metagenomics. Single-center studies worldwide have estimated a low prevalence of marseillevirus in both symptomatic patients and healthy donors, but, to date, no information is available on the prevalence of this giant virus in Italy. Using a polymerase chain reaction (PCR) targeting the viral ORF152 sequence, we tested sera from 197 immunosuppressed patients and 285 healthy donors. We also tested 63 respiratory and 30 cerebrospinal fluid samples from patients with various clinical conditions who were referred to the Virology Division for diagnostic purposes. We found no evidence of Marseillevirus DNA in any of the 575 samples tested. Marseillevirus probably does not cause infection in humans.
Affiliation(s)
- Lisa Macera
- Retrovirus Center and Virology Section, Department of Translational Research, University of Pisa, Pisa, Italy; Virology Division, Pisa University Hospital, Pisa, Italy
- Pietro Giorgio Spezia
- Retrovirus Center and Virology Section, Department of Translational Research, University of Pisa, Pisa, Italy
- Daniele Focosi
- North-Western Tuscany Blood Bank, Pisa University Hospital, Pisa, Italy
- Paola Mazzetti
- Retrovirus Center and Virology Section, Department of Translational Research, University of Pisa, Pisa, Italy; Virology Division, Pisa University Hospital, Pisa, Italy
- Guido Antonelli
- Department of Molecular Medicine, Laboratory of Virology and Pasteur Institute-Cenci Bolognetti Foundation, Sapienza University of Rome, Rome, Italy
- Mauro Pistello
- Retrovirus Center and Virology Section, Department of Translational Research, University of Pisa, Pisa, Italy; Virology Division, Pisa University Hospital, Pisa, Italy
- Fabrizio Maggi
- Retrovirus Center and Virology Section, Department of Translational Research, University of Pisa, Pisa, Italy; Virology Division, Pisa University Hospital, Pisa, Italy
12
Tampuu A, Bzhalava Z, Dillner J, Vicente R. ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples. PLoS One 2019; 14:e0222271. [PMID: 31509583 PMCID: PMC6738585 DOI: 10.1371/journal.pone.0222271] [Received: 06/07/2019] [Accepted: 08/22/2019] [Indexed: 11/23/2022]
Abstract
Despite its clinical importance, the detection of highly divergent or as-yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignment-based methods classify many assembled contigs as "unknown" because many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of convolutional neural networks designed to detect both patterns and pattern frequencies in raw metagenomic contigs. The training dataset included sequences obtained from 19 metagenomic experiments, which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared with other machine learning methods for viral genome classification. On 300 bp contigs, ViraMiner achieves an area under the ROC curve of 0.923. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information about genome composition and can be used as a recommendation system to further investigate sequences labeled as "unknown" by conventional alignment methods. Exploring these highly divergent viruses can, in turn, enhance our knowledge of the infectious causes of disease.
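One way to picture the difference between detecting "patterns" and "pattern frequencies" is through pooling. A convolutional filter scanned along a one-hot-encoded contig gives one activation per position. The maximum activation indicates whether a pattern occurs anywhere, while the mean activation reflects how often it occurs. The pure-Python sketch below shows that distinction for a single hand-set filter. It is an illustrative assumption, not ViraMiner's architecture or trained weights. The filter, the ReLU step, and the function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element vectors in A, C, G, T order; other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(encoded, kernel):
    """Slide `kernel` (a list of 4-element weight vectors) along the sequence.

    Returns one ReLU-rectified activation per valid window position.
    """
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            w * x
            for weights, column in zip(kernel, encoded[start:start + width])
            for w, x in zip(weights, column)
        )
        activations.append(max(0.0, total))
    return activations


def pooled_features(seq, kernel):
    """Return (max-pooled, mean-pooled) activations for one filter."""
    acts = conv1d_relu(one_hot(seq), kernel)
    if not acts:
        raise ValueError("sequence is shorter than the filter")
    return max(acts), sum(acts) / len(acts)


if __name__ == "__main__":
    acg_filter = one_hot("ACG")  # hypothetical filter that rewards an exact "ACG" match
    print(pooled_features("ACGTTACG", acg_filter))  # (3.0, 1.0)
    print(pooled_features("ACGTTTTT", acg_filter))  # (3.0, 0.5)
```

Both toy contigs contain the motif, so their max-pooled values are equal. The mean-pooled value doubles when the motif appears twice. In a trained network, many filters would be learned from data, and pooled outputs like these would feed a classifier.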
Affiliation(s)
- Ardi Tampuu
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden
- Karolinska University Laboratory, Karolinska University Hospital, Stockholm, Sweden
- Raul Vicente
- Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia
13
Rolland C, Andreani J, Louazani AC, Aherfi S, Francis R, Rodrigues R, Silva LS, Sahmi D, Mougari S, Chelkha N, Bekliz M, Silva L, Assis F, Dornas F, Khalil JYB, Pagnier I, Desnues C, Levasseur A, Colson P, Abrahão J, La Scola B. Discovery and Further Studies on Giant Viruses at the IHU Mediterranee Infection That Modified the Perception of the Virosphere. Viruses 2019; 11:E312. [PMID: 30935049 PMCID: PMC6520786 DOI: 10.3390/v11040312] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/25/2019] [Revised: 03/25/2019] [Accepted: 03/27/2019] [Indexed: 12/17/2022]
Abstract
The history of giant viruses began in 2003 with the identification of Acanthamoeba polyphaga mimivirus. Since then, giant viruses of amoebae have shed light on an unknown part of the viral world, and every discovery and characterization of a new giant virus modifies our perception of the virosphere. This notably includes their exceptional virion sizes, from 200 nm to 2 µm, and their genomic complexity in terms of genome length, number of genes, and functions, such as translational components never seen before. Even more surprisingly, Mimivirus possesses a unique mobilome composed of virophages, transpovirons, and a defense system against virophages named the Mimivirus virophage resistance element (MIMIVIRE). From the discovery and isolation of new giant viruses to their possible roles in humans, this review shows the active contribution of the University Hospital Institute (IHU) Mediterranee Infection to the growing knowledge of the giant virus field.
Affiliation(s)
- Clara Rolland
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Julien Andreani
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Amina Cherif Louazani
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Sarah Aherfi
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- IHU-Méditerranée Infection, 13005 Marseille, France.
- Rania Francis
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Rodrigo Rodrigues
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Laboratório de Vírus, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, Brazil.
- Ludmila Santos Silva
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Dehia Sahmi
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Said Mougari
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Nisrine Chelkha
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Meriem Bekliz
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Lorena Silva
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Laboratório de Vírus, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, Brazil.
- Felipe Assis
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Fábio Dornas
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Isabelle Pagnier
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- IHU-Méditerranée Infection, 13005 Marseille, France.
- Christelle Desnues
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Anthony Levasseur
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- IHU-Méditerranée Infection, 13005 Marseille, France.
- Philippe Colson
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- IHU-Méditerranée Infection, 13005 Marseille, France.
- Jônatas Abrahão
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- Laboratório de Vírus, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, Brazil.
- Bernard La Scola
- MEPHI, APHM, IRD 198, Aix Marseille Univ, Department of Medicine, IHU-Méditerranée Infection, 13005 Marseille, France.
- IHU-Méditerranée Infection, 13005 Marseille, France.
14
Machine Learning for detection of viral sequences in human metagenomic datasets. BMC Bioinformatics 2018; 19:336. [PMID: 30249176 PMCID: PMC6154907 DOI: 10.1186/s12859-018-2340-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 09/26/2017] [Accepted: 08/28/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Detection of highly divergent or yet unknown viruses from metagenomic sequencing datasets is a major bioinformatics challenge. When human samples are sequenced, a large proportion of assembled contigs are classified as "unknown", as conventional methods find no similarity to known sequences. We wished to explore whether machine learning algorithms using Relative Synonymous Codon Usage (RSCU) frequency could improve the detection of viral sequences in metagenomic sequencing data. RESULTS We trained Random Forest and Artificial Neural Network classifiers on metagenomic sequences taxonomically classified into virus and non-virus classes. The algorithms achieved accuracies well beyond chance level, with an area under the ROC curve of 0.79. Two codons (TCG and CGC) were found to have a particularly strong discriminative capacity. CONCLUSION RSCU-based machine learning techniques applied to metagenomic sequencing data can help identify a large number of putative viral sequences and complement conventional methods for taxonomic classification.
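RSCU compares how often each codon is used with the count expected if all synonymous codons for its amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for amino acid i. Below is a minimal Python sketch of that computation, assuming an in-frame coding sequence and the standard genetic code (NCBI translation table 1). How the study handled reading frames in raw contigs is not stated in the abstract, and the function name `rscu` is our own.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families for sense codons (stop codons '*' are excluded).
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(seq):
    """Return RSCU values for codons of amino acids present in an in-frame sequence.

    RSCU = observed count / (family total / family size), so a codon used exactly
    as often as expected under uniform synonymous usage scores 1.0.
    """
    seq = seq.upper().replace("U", "T")
    # Split into consecutive triplets, dropping any trailing partial codon.
    codons = [seq[p:p + 3] for p in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)  # skips codons with N etc.
    result = {}
    for aa, members in FAMILIES.items():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for its codons
        for c in members:
            result[c] = counts[c] * len(members) / total
    return result


values = rscu("TCGTCGTCATCT")
print(values["TCG"], values["TCA"], values["AGT"])
```

A value above 1 means a codon is used more often than uniform synonymous usage would predict. TCG (serine) and CGC (arginine), the two codons the abstract highlights, each receive one such value per sequence, and the resulting vector of values could serve as classifier input.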
15
Obbard DJ. Expansion of the metazoan virosphere: progress, pitfalls, and prospects. Curr Opin Virol 2018; 31:17-23. [PMID: 30237139 DOI: 10.1016/j.coviro.2018.08.008] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/15/2018] [Revised: 07/15/2018] [Accepted: 08/17/2018] [Indexed: 12/22/2022]
Abstract
Metagenomic sequencing has led to a recent and rapid expansion of the animal virome. It has uncovered a multitude of new virus lineages from under-sampled host groups, including many that break up long branches in the virus tree, and many that display unexpected genome sizes and structures. Although there are challenges to inferring the existence of a virus from a "virus-like sequence", in the absence of an isolate the analysis of nucleic acid (including small RNAs) and sequence data can provide considerable confidence. As a consequence, this period of molecular natural history is helping to reshape our views of deep virus evolution.
Affiliation(s)
- Darren J Obbard
- Institute of Evolutionary Biology, and Centre for Immunity, Infection and Evolution, The University of Edinburgh, Charlotte Auerbach Road, Edinburgh EH9 3FL, United Kingdom.
16
Hultin E, Mühr LSA, Bzhalava Z, Hortlund M, Lagheden C, Sundström P, Dillner J. Viremia preceding multiple sclerosis: Two nested case-control studies. Virology 2018; 520:21-29. [PMID: 29772404 DOI: 10.1016/j.virol.2018.04.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/08/2017] [Revised: 03/26/2018] [Accepted: 04/10/2018] [Indexed: 11/28/2022]
Abstract
Infections have been suggested to be involved in multiple sclerosis (MS). We used metagenomic sequencing to detect both known and yet unknown microorganisms in two nested case-control studies of MS. Two different cohorts were followed for MS using registry linkages. Serum samples taken before diagnosis, as well as samples from matched control subjects, were selected. In cohort 1, with 75 cases and 75 controls, most viral reads were Anelloviridae-related and >95% were detected among the cases. Among samples taken up to 2 years before MS diagnosis, the Anellovirus species TTMV1, TTMV6 and TTV27 were significantly more common among cases. In cohort 2, 93 cases and 93 controls were tested under the pre-specified hypothesis that the same association would be found. Although most viral reads were again related to Anelloviridae, no significant case-control differences were seen. We conclude that the Anelloviridae-MS association may be due to multiple hypothesis testing, but other explanations are possible.
Affiliation(s)
- Emilie Hultin
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Zurab Bzhalava
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Maria Hortlund
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Camilla Lagheden
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden
- Peter Sundström
- Department of Pharmacology and Clinical Neuroscience, Umeå University, Umeå SE-901 87, Sweden
- Joakim Dillner
- Department of Laboratory Medicine, Karolinska Institutet, Huddinge SE-141 86, Sweden.