1. Alberts F, Berke O, Maboni G, Petukhova T, Poljak Z. Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses. Prev Vet Med 2024; 233:106351. [PMID: 39353303 DOI: 10.1016/j.prevetmed.2024.106351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 08/16/2024] [Accepted: 09/25/2024] [Indexed: 10/04/2024]
Abstract
Influenza is a disease that represents both a public health and agricultural risk with pandemic potential. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore a virus of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers for the identification of the most likely host species using the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploration of the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy with a 98.0 % (95 % CI [97.01, 98.73]) correct classification rate on an independent validation dataset. The classifications were further analyzed using the predicted probability score which highlighted sequences of particular interest. These sequences were both correctly and incorrectly classified sequences that showed considerable predicted probability for multiple hosts. This showed the potential of using these classifiers for rapid sequence classification and highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from 1998 to 2003 from the United States of America, and a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were utilized to look at the applications of predicted probability and host convergence over time. Lastly, the classifiers were used on an independent dataset of environmental sequences to explore the host identification of environmental sequences. The results of these classifiers show the potential for machine learning to be used as a host identification technique for viruses of unknown origin on a species-specific level.
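The use of predicted probabilities to highlight sequences of interest, as described in this abstract, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `flag_ambiguous`, the host labels, the probability values, and the 0.20 cut-off for "considerable" probability are all hypothetical.

```python
def flag_ambiguous(probs, threshold=0.20):
    """Return (most likely host, flag) for one sequence.

    probs     : dict mapping host label -> predicted probability
    threshold : hypothetical cut-off; the sequence is flagged when the
                second-ranked host reaches at least this probability
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_host = ranked[0][0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_host, runner_up >= threshold


# Made-up classifier outputs for two sequences (illustrative only).
predictions = {
    "seq_A": {"avian": 0.97, "swine": 0.02, "human": 0.01},
    "seq_B": {"avian": 0.55, "swine": 0.40, "human": 0.05},
}

for name, probs in predictions.items():
    host, flagged = flag_ambiguous(probs)
    print(name, host, "review" if flagged else "ok")
```

A sequence can be flagged whether or not its top-ranked host is correct, which matches the abstract's point that both correctly and incorrectly classified sequences may be of interest.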
Affiliation(s)
- Famke Alberts
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Olaf Berke
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Advancing Responsible and Ethical Artificial Intelligence, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Grazieli Maboni
  - Athens Veterinary Diagnostic Laboratory, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, 501 D.W. Brooks Drive, Athens, GA, USA.
- Tatiana Petukhova
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
- Zvonimir Poljak
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada; Centre for Public Health and Zoonoses, Ontario Veterinary College, University of Guelph, 50 Stone Road East, Guelph, Ontario, Canada.
2. Costa P, Pereira C, Romalde JL, Almeida A. A game of resistance: War between bacteria and phages and how phage cocktails can be the solution. Virology 2024; 599:110209. [PMID: 39186863 DOI: 10.1016/j.virol.2024.110209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/12/2024] [Accepted: 08/14/2024] [Indexed: 08/28/2024]
Abstract
While phages hold promise as an antibiotic alternative, they encounter significant challenges in combating bacterial infections, primarily due to the emergence of phage-resistant bacteria. Bacterial defence mechanisms like superinfection exclusion, CRISPR, and restriction-modification systems can hinder phage effectiveness. Innovative strategies, such as combining different phages into cocktails, have been explored to address these challenges. This review delves into these defence mechanisms and their impact at each stage of the infection cycle, their challenges, and the strategies phages have developed to counteract them. Additionally, we examine the role of phage cocktails in the evolving landscape of antibacterial treatments and discuss recent studies that highlight the effectiveness of diverse phage cocktails in targeting essential bacterial receptors and combating resistant strains.
Affiliation(s)
- Pedro Costa
  - CESAM, Department of Biology, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
- Carla Pereira
  - CESAM, Department of Biology, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
- Jesús L Romalde
  - Department of Microbiology and Parasitology, CRETUS & CIBUS - Faculty of Biology, University of Santiago de Compostela, CP 15782 Santiago de Compostela, Spain.
- Adelaide Almeida
  - CESAM, Department of Biology, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.
3. Alberts F, Berke O, Rocha L, Keay S, Maboni G, Poljak Z. Predicting host species susceptibility to influenza viruses and coronaviruses using genome data and machine learning: a scoping review. Front Vet Sci 2024; 11:1358028. [PMID: 39386249 PMCID: PMC11462629 DOI: 10.3389/fvets.2024.1358028] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/19/2023] [Accepted: 08/28/2024] [Indexed: 10/12/2024] Open
Abstract
Introduction: Predicting which species are susceptible to viruses (i.e., host range) is important for understanding and developing effective strategies to control viral outbreaks in both humans and animals. The use of machine learning and bioinformatic approaches to predict viral hosts has been expanded with advancements in in-silico techniques. We conducted a scoping review to identify the breadth of machine learning methods applied to influenza and coronavirus genome data for the identification of susceptible host species. Methods: The protocol for this scoping review is available at https://hdl.handle.net/10214/26112. Five online databases were searched, and 1,217 citations, published between January 2000 and May 2022, were obtained, and screened in duplicate for English language and in-silico research, covering the use of machine learning to identify susceptible species to viruses. Results: Fifty-three relevant publications were identified for data charting. The breadth of research was extensive including 32 different machine learning algorithms used in combination with 29 different feature selection methods and 43 different genome data input formats. There were 20 different methods used by authors to assess accuracy. Authors mostly used influenza viruses (n = 31/53 publications, 58.5%), however, more recent publications focused on coronaviruses and other viruses in combination with influenza viruses (n = 22/53, 41.5%). The susceptible animal groups authors most used were humans (n = 57/77 analyses, 74.0%), avian (n = 35/77, 45.4%), and swine (n = 28/77, 36.4%). In total, 53 different hosts were used and, in most publications, data from multiple hosts was used. Discussion: The main gaps in research were a lack of standardized reporting of methodology and the use of broad host categories for classification. Overall, approaches to viral host identification using machine learning were diverse and extensive.
Affiliation(s)
- Famke Alberts
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Olaf Berke
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
  - Centre for Public Health and Zoonoses, University of Guelph, Guelph, ON, Canada
  - Centre for Advancing Responsible and Ethical Artificial Intelligence, University of Guelph, Guelph, ON, Canada
- Leilani Rocha
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Sheila Keay
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Grazieli Maboni
  - Athens Veterinary Diagnostic Laboratory, Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, United States
- Zvonimir Poljak
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
  - Centre for Public Health and Zoonoses, University of Guelph, Guelph, ON, Canada
4. Keith M, Park de la Torriente A, Chalka A, Vallejo-Trujillo A, McAteer SP, Paterson GK, Low AS, Gally DL. Predictive phage therapy for Escherichia coli urinary tract infections: Cocktail selection for therapy based on machine learning models. Proc Natl Acad Sci U S A 2024; 121:e2313574121. [PMID: 38478693 PMCID: PMC10962980 DOI: 10.1073/pnas.2313574121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 02/04/2024] [Indexed: 03/27/2024] Open
Abstract
This study supports the development of predictive bacteriophage (phage) therapy: the concept of phage cocktail selection to treat a bacterial infection based on machine learning (ML) models. For this purpose, ML models were trained on thousands of measured interactions between a panel of phage and sequenced bacterial isolates. The concept was applied to Escherichia coli associated with urinary tract infections. This is an important common infection in humans and companion animals from which multidrug-resistant (MDR) bloodstream infections can originate. The global threat of MDR infection has reinvigorated international efforts into alternatives to antibiotics including phage therapy. E. coli exhibit extensive genome-level variation due to horizontal gene transfer via phage and plasmids. Associated with this, phage selection for E. coli is difficult as individual isolates can exhibit considerable variation in phage susceptibility due to differences in factors important to phage infection including phage receptor profiles and resistance mechanisms. The activity of 31 phage was measured on 314 isolates with growth curves in artificial urine. Random Forest models were built for each phage from bacterial genome features, and the more generalist phage, acting on over 20% of the bacterial population, exhibited F1 scores of >0.6 and could be used to predict phage cocktails effective against previously untested strains. The study demonstrates the potential of predictive ML models which integrate bacterial genomics with phage activity datasets allowing their use on data derived from direct sequencing of clinical samples to inform rapid and effective phage therapy.
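For readers unfamiliar with the F1 score quoted in this abstract, the following is a minimal sketch of how it is computed from a confusion matrix. The counts below are invented for illustration and are not taken from the study; `f1_score` here is a small hand-written helper, not a library call.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall.

    tp, fp, fn: counts of true positives, false positives, false negatives.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one phage model: 60 susceptible isolates predicted
# correctly, 30 predicted susceptible but resistant, 40 susceptible but missed.
print(f"{f1_score(tp=60, fp=30, fn=40):.3f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, isolates a phage can act on) is the minority.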
Affiliation(s)
- Marianne Keith
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Alba Park de la Torriente
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Antonia Chalka
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Adriana Vallejo-Trujillo
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Sean P. McAteer
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Gavin K. Paterson
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
  - Royal (Dick) School of Veterinary Studies, Easter Bush Pathology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- Alison S. Low
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
- David L. Gally
  - The Roslin Institute, Division of Bacteriology, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
5. Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
  - Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
  - Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
6. Nami Y, Imeni N, Panahi B. Application of machine learning in bacteriophage research. BMC Microbiol 2021; 21:193. [PMID: 34174831 PMCID: PMC8235560 DOI: 10.1186/s12866-021-02256-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/13/2021] [Accepted: 06/08/2021] [Indexed: 12/20/2022] Open
Abstract
Phages are one of the key components in the structure, dynamics, and interactions of microbial communities in different bins. They have a clear impact on human health and the food industry. Bacteriophage characterization using in vitro approaches is time-consuming, costly, and laborious. On the other hand, with the advent of new high-throughput sequencing technology, the development of a powerful computational framework to characterize the newly identified bacteriophages is inevitable for future research. Machine learning includes powerful techniques that enable the analysis of complex datasets for knowledge discovery and pattern recognition. In this study, we have conducted a comprehensive review of how machine learning methods using different types of features have been applied in various aspects of bacteriophage research, including automated curation, identification, classification, host species recognition, virion protein identification, and life cycle prediction. Moreover, potential limitations and advantages of the developed frameworks are discussed.
Affiliation(s)
- Yousef Nami
  - Department of Food Biotechnology, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Nazila Imeni
  - Young Researchers and Elite Club, Marand Branch, Islamic Azad University, Marand, Iran
- Bahman Panahi
  - Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran.
7. Olwenyi OA, Dyavar SR, Acharya A, Podany AT, Fletcher CV, Ng CL, Reid SP, Byrareddy SN. Immuno-epidemiology and pathophysiology of coronavirus disease 2019 (COVID-19). J Mol Med (Berl) 2020; 98:1369-1383. [PMID: 32808094 PMCID: PMC7431311 DOI: 10.1007/s00109-020-01961-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 05/31/2020] [Revised: 08/01/2020] [Accepted: 08/06/2020] [Indexed: 02/07/2023]
Abstract
Occasional zoonotic viral attacks on immunologically naive populations result in massive death tolls that are capable of threatening human survival. Currently, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the infectious agent that causes coronavirus disease (COVID-19), has spread from its epicenter in Wuhan China to all parts of the globe. Real-time mapping of new infections across the globe has revealed that variable transmission patterns and pathogenicity are associated with differences in SARS-CoV-2 lineages, clades, and strains. Thus, we reviewed how changes in the SARS-CoV-2 genome and its structural architecture affect viral replication, immune evasion, and transmission within different human populations. We also looked at which immune dominant regions of SARS-CoV-2 and other coronaviruses are recognized by Major Histocompatibility Complex (MHC)/Human Leukocyte Antigens (HLA) genes and how this could impact on subsequent disease pathogenesis. Efforts were also placed on understanding immunological changes that occur when exposed individuals either remain asymptomatic or fail to control the virus and later develop systemic complications. Published autopsy studies that reveal alterations in the lung immune microenvironment, morphological, and pathological changes are also explored within the context of the review. Understanding the true correlates of protection and determining how constant virus evolution impacts on host-pathogen interactions could help identify which populations are at high risk and later inform future vaccine and therapeutic interventions.
Affiliation(s)
- Omalla A Olwenyi
  - Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE, USA
  - Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, USA
- Shetty Ravi Dyavar
  - Antiviral Pharmacology Laboratory, Center for Drug Discovery, University of Nebraska Medical Center (UNMC), Omaha, NE, USA
- Arpan Acharya
  - Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE, USA
- Anthony T Podany
  - Antiviral Pharmacology Laboratory, Center for Drug Discovery, University of Nebraska Medical Center (UNMC), Omaha, NE, USA
- Courtney V Fletcher
  - Antiviral Pharmacology Laboratory, Center for Drug Discovery, University of Nebraska Medical Center (UNMC), Omaha, NE, USA
- Caroline L Ng
  - Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, USA
- St Patrick Reid
  - Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, USA
- Siddappa N Byrareddy
  - Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE, USA.
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA.
  - Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE, USA.
8. Lupolova N, Lycett SJ, Gally DL. A guide to machine learning for bacterial host attribution using genome sequence data. Microb Genom 2020; 5. [PMID: 31778355 PMCID: PMC6939162 DOI: 10.1099/mgen.0.000317] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/18/2022] Open
Abstract
With the ever-expanding number of available sequences from bacterial genomes, and the expectation that this data type will be the primary one generated from both diagnostic and research laboratories for the foreseeable future, then there is both an opportunity and a need to evaluate how effectively computational approaches can be used within bacterial genomics to predict and understand complex phenotypes, such as pathogenic potential and host source. This article applied various quantitative methods such as diversity indexes, pangenome-wide association studies (GWAS) and dimensionality reduction techniques to better understand the data and then compared how well unsupervised and supervised machine learning (ML) methods could predict the source host of the isolates. The study uses the example of the pangenomes of 1203 Salmonella enterica serovar Typhimurium isolates in order to predict 'host of isolation' using these different methods. The article is aimed as a review of recent applications of ML in infection biology, but also, by working through this specific dataset, it allows discussion of the advantages and drawbacks of the different techniques. As with all such sub-population studies, the biological relevance will be dependent on the quality and diversity of the input data. Given this major caveat, we show that supervised ML has the potential to add real value to interpretation of bacterial genomic data, as it can provide probabilistic outcomes for important phenotypes, something that is very difficult to achieve with the other methods.
Affiliation(s)
- Nadejda Lupolova
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Samantha J Lycett
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- David L Gally
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
9. Al-Maitah M. Analyzing genetic diseases using multimedia processing techniques associative decision tree-based learning and Hopfield dynamic neural networks from medical images. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-04004-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
10. Missaoui J, Saidane D, Mzoughi R, Minervini F. Fermented Seeds ("Zgougou") from Aleppo Pine as a Novel Source of Potentially Probiotic Lactic Acid Bacteria. Microorganisms 2019; 7:E709. [PMID: 31861080 PMCID: PMC6958562 DOI: 10.3390/microorganisms7120709] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/22/2019] [Revised: 12/06/2019] [Accepted: 12/15/2019] [Indexed: 12/17/2022] Open
Abstract
Microorganisms inhabiting fermented foods represent the main link between the consumption of this food and human health. Although some fermented food is a reservoir of potentially probiotic microorganisms, several foods are still unexplored. This study aimed at characterizing the probiotic potential of lactic acid bacteria isolated from zgougou, a fermented matrix consisting of a watery mixture of Aleppo pine's seeds. In vitro methods were used to characterize safety, survival under typical conditions of the gastrointestinal tract, adherence capacity to surfaces, and antimicrobial and antioxidant activities. Strains belonging to the Lactobacillus plantarum group and Enterococcus faecalis showed no DNase, hemolytic, or gelatinase activities. In addition, their susceptibility to most of the tested antibiotics satisfied some of the safety prerequisites for their potential use as probiotics. All the strains tolerated low pH, gastrointestinal enzymes, and bile salts. They displayed good antibacterial and antibiofilm activity against 10 reference bacterial pathogens, especially when used as a cell-free supernatant. Furthermore, the lactic acid bacteria (LAB) strains inhibited the growth of Aspergillus flavus and Aspergillus carbonarius. Finally, they had good antioxidant activity, although this depended on the strain. Overall, the results of this work highlight that zgougou represents an important reservoir of potentially probiotic LAB. Future studies should confirm the health benefits of the LAB strains.
Affiliation(s)
- Jihen Missaoui
  - Laboratory of Analysis, Treatment and Evaluation of Environmental Pollutants and Products, Faculty of Pharmacy, Monastir University, 5000 Monastir, Tunisia
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Dalila Saidane
  - Laboratory of Analysis, Treatment and Evaluation of Environmental Pollutants and Products, Faculty of Pharmacy, Monastir University, 5000 Monastir, Tunisia
- Ridha Mzoughi
  - Laboratory of Analysis, Treatment and Evaluation of Environmental Pollutants and Products, Faculty of Pharmacy, Monastir University, 5000 Monastir, Tunisia
- Fabio Minervini
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
11. Gałan W, Bąk M, Jakubowska M. Host Taxon Predictor - A Tool for Predicting Taxon of the Host of a Newly Discovered Virus. Sci Rep 2019; 9:3436. [PMID: 30837511 PMCID: PMC6400966 DOI: 10.1038/s41598-019-39847-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/17/2017] [Accepted: 01/30/2019] [Indexed: 12/04/2022] Open
Abstract
Recent advances in metagenomics provided a valuable alternative to culture-based approaches for better sampling viral diversity. However, some of newly identified viruses lack sequence similarity to any of previously sequenced ones, and cannot be easily assigned to their hosts. Here we present a bioinformatic approach to this problem. We developed classifiers capable of distinguishing eukaryotic viruses from the phages achieving almost 95% prediction accuracy. The classifiers are wrapped in Host Taxon Predictor (HTP) software written in Python which is freely available at https://github.com/wojciech-galan/viruses_classifier. HTP’s performance was later demonstrated on a collection of newly identified viral genomes and genome fragments. In summary, HTP is a culture- and alignment-free approach for distinction between phages and eukaryotic viruses. We have also shown that it is possible to further extend our method to go up the evolutionary tree and predict whether a virus can infect narrower taxa.
Affiliation(s)
- Wojciech Gałan
  - Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University in Kraków, ul. Gronostajowa 7, 30-387, Kraków, Poland.
- Maciej Bąk
  - Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University in Kraków, ul. Gronostajowa 7, 30-387, Kraków, Poland
- Małgorzata Jakubowska
  - AGH University of Science and Technology, Faculty of Materials Science and Ceramics, al. Mickiewicza 30, 30-059, Kraków, Poland
12. Existing Host Range Mutations Constrain Further Emergence of RNA Viruses. J Virol 2019; 93:JVI.01385-18. [PMID: 30463962 DOI: 10.1128/jvi.01385-18] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/16/2018] [Accepted: 11/06/2018] [Indexed: 02/07/2023] Open
Abstract
RNA viruses are capable of rapid host shifting, typically due to a point mutation that confers expanded host range. As additional point mutations are necessary for further expansions, epistasis among host range mutations can potentially affect the mutational neighborhood and frequency of niche expansion. We mapped the mutational neighborhood of host range expansion using three genotypes of the double-stranded RNA (dsRNA) bacteriophage φ6 (wild type and two isogenic host range mutants) on the novel host Pseudomonas syringae pv. atrofaciens. Both Sanger sequencing of 50 P. syringae pv. atrofaciens mutant clones for each genotype and population Illumina sequencing revealed the same high-frequency mutations allowing infection of P. syringae pv. atrofaciens. Wild-type φ6 had at least nine different ways of mutating to enter the novel host, eight of which are in p3 (host attachment protein gene), and 13/50 clones had unchanged p3 genes. However, the two isogenic mutants had dramatically restricted neighborhoods: only one or two mutations, all in p3. Deep sequencing revealed that wild-type clones without mutations in p3 likely had changes in p12 (morphogenic protein), a region that was not polymorphic for the two isogenic host range mutants. Sanger sequencing confirmed that 10/13 of the wild-type φ6 clones had nonsynonymous mutations in p12, and 2 others had point mutations in p9 and p5. None of these genes had previously been associated with host range expansion in φ6. We demonstrate, for the first time, epistatic constraint in an RNA virus due to host range mutations themselves, which has implications for models of serial host range expansion. IMPORTANCE: RNA viruses mutate rapidly and frequently expand their host ranges to infect novel hosts, leading to serial host shifts. Using an RNA bacteriophage model system (Pseudomonas phage φ6), we studied the impact of preexisting host range mutations on another host range expansion. Results from both clonal Sanger and Illumina sequencing show that extant host range mutations dramatically narrow the neighborhood of potential host range mutations compared to that of wild-type φ6. This research suggests that serial host-shifting viruses may follow a small number of molecular paths to enter additional novel hosts. We also identified new genes involved in φ6 host range expansion, expanding our knowledge of this important model system in experimental evolution.
|
13
|
Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences. Sci Rep 2018; 8:10032. [PMID: 29968780 PMCID: PMC6030160 DOI: 10.1038/s41598-018-28308-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/31/2018] [Accepted: 06/15/2018] [Indexed: 12/05/2022] Open
Abstract
Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.
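As a rough illustration of the alignment-free features named in this abstract, the sketch below computes mononucleotide frequencies and a dinucleotide bias for a nucleotide sequence. It assumes the common odds-ratio definition of bias (observed dinucleotide frequency divided by the product of the two mononucleotide frequencies); the study's exact definition may differ, and the function names are hypothetical. The resulting 20-element vector is the kind of input that could be passed to a support vector machine.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def mononucleotide_frequency(seq):
    """Return the relative frequency of each base, ignoring ambiguous symbols."""
    seq = seq.upper()
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BASES}


def dinucleotide_bias(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumption: bias is the odds ratio f(xy) / (f(x) * f(y)).
    """
    seq = seq.upper()
    mono = mononucleotide_frequency(seq)
    # Overlapping pairs; skip any pair that contains an ambiguous symbol.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    counts = Counter(pairs)
    total = len(pairs)
    bias = {}
    for x, y in product(BASES, repeat=2):
        expected = mono[x] * mono[y]
        observed = counts[x + y] / total if total else 0.0
        bias[x + y] = observed / expected if expected else 0.0
    return bias


def feature_vector(seq):
    """Concatenate 4 mononucleotide frequencies and 16 dinucleotide biases."""
    mono = mononucleotide_frequency(seq)
    bias = dinucleotide_bias(seq)
    return [mono[b] for b in BASES] + [bias[x + y] for x, y in product(BASES, repeat=2)]
```

For conserved genes, the abstract notes that longer k-mers are needed; the same counting approach extends to k > 2 by widening the window.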
|
14
|
McLeish MJ, Fraile A, García-Arenal F. Ecological Complexity in Plant Virus Host Range Evolution. Adv Virus Res 2018; 101:293-339. [PMID: 29908592 DOI: 10.1016/bs.aivir.2018.02.009] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 11/30/2022]
Abstract
The host range of a plant virus is the number of species in which it can reproduce. Most studies of plant virus host range evolution have focused on the genetics of host-pathogen interactions. However, the distribution and abundance of plant viruses and their hosts do not always overlap, and these spatial and temporal discontinuities in plant virus-host interactions can result in various ecological processes that shape host range evolution. Recent work shows that the distributions of pathogenic and resistant genotypes, vectors, and other resources supporting transmission vary widely in the environment, producing both expected and unanticipated patterns. The distributions of all of these factors are influenced further by competitive effects, natural enemies, anthropogenic disturbance, the abiotic environment, and herbivory to mention some. We suggest the need for further development of approaches that (i) explicitly consider resource use and the abiotic and biotic factors that affect the strategies by which viruses exploit resources; and (ii) are sensitive across scales. Host range and habitat specificity will largely determine which phyla are most likely to be new hosts, but predicting which host and when it is likely to be infected is enormously challenging because it is unclear how environmental heterogeneity affects the interactions of viruses and hosts.
Affiliation(s)
- Michael J McLeish
- Centro de Biotecnología y Genómica de Plantas UPM-INIA, and E.T.S.I. Agrícola, Alimentaria y de Biosistemas, Campus de Montegancedo, Universidad Politécnica de Madrid, Madrid, Spain
- Aurora Fraile
- Centro de Biotecnología y Genómica de Plantas UPM-INIA, and E.T.S.I. Agrícola, Alimentaria y de Biosistemas, Campus de Montegancedo, Universidad Politécnica de Madrid, Madrid, Spain
- Fernando García-Arenal
- Centro de Biotecnología y Genómica de Plantas UPM-INIA, and E.T.S.I. Agrícola, Alimentaria y de Biosistemas, Campus de Montegancedo, Universidad Politécnica de Madrid, Madrid, Spain
|
15
|
Lourenço J, Watkins ER, Obolski U, Peacock SJ, Morris C, Maiden MCJ, Gupta S. Lineage structure of Streptococcus pneumoniae may be driven by immune selection on the groEL heat-shock protein. Sci Rep 2017; 7:9023. [PMID: 28831154 PMCID: PMC5567354 DOI: 10.1038/s41598-017-08990-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/07/2017] [Accepted: 07/20/2017] [Indexed: 12/29/2022] Open
Abstract
Populations of Streptococcus pneumoniae (SP) are typically structured into groups of closely related organisms or lineages, but it is not clear whether they are maintained by selection or neutral processes. Here, we attempt to address this question by applying a machine learning technique to SP whole genomes. Our results indicate that lineages evolved through immune selection on the groEL chaperone protein. The groEL protein is part of the groESL operon and enables a large range of proteins to fold correctly within the physical environment of the nasopharynx, thereby explaining why lineage structure is so stable within SP despite high levels of genetic transfer. SP is also antigenically diverse, exhibiting a variety of distinct capsular serotypes. Associations exist between lineage and capsular serotype but these can be easily perturbed, such as by vaccination. Overall, our analyses indicate that the evolution of SP can be conceptualized as the rearrangement of modular functional units occurring on several different timescales under different pressures: some patterns have locked in early (such as the epistatic interactions between groESL and a constellation of other genes) and preserve the differentiation of lineages, while others (such as the associations between capsular serotype and lineage) remain in continuous flux.
Affiliation(s)
- José Lourenço
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Uri Obolski
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Samuel J Peacock
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Sunetra Gupta
- Department of Zoology, University of Oxford, Oxford, United Kingdom
|
16
|
Mohammadi M, Sharifi Noghabi H, Abed Hodtani G, Rajabi Mashhadi H. Robust and stable gene selection via Maximum–Minimum Correntropy Criterion. Genomics 2016; 107:83-87. [DOI: 10.1016/j.ygeno.2015.12.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/14/2015] [Revised: 12/13/2015] [Accepted: 12/23/2015] [Indexed: 11/17/2022]
|
17
|
Pessia A, Grad Y, Cobey S, Puranen JS, Corander J. K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets. Microb Genom 2015; 1:e000025. [PMID: 28348810 PMCID: PMC5320600 DOI: 10.1099/mgen.0.000025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/08/2015] [Accepted: 06/08/2015] [Indexed: 12/21/2022] Open
Abstract
The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel using a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results show that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool to uncover patterns of protein evolution.
Affiliation(s)
- Alberto Pessia
- Department of Mathematics and Statistics, University of Helsinki, Finland
- Yonatan Grad
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Sarah Cobey
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, Finland
|