1
Naor-Hoffmann S, Svetlitsky D, Sal-Man N, Orenstein Y, Ziv-Ukelson M. Predicting the pathogenicity of bacterial genomes using widely spread protein families. BMC Bioinformatics 2022; 23:253. [PMID: 35751023 PMCID: PMC9233384 DOI: 10.1186/s12859-022-04777-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 04/13/2022] [Indexed: 11/15/2022] Open
Abstract
Background: The human body is inhabited by a diverse community of commensal non-pathogenic bacteria, many of which are essential for our health. By contrast, pathogenic bacteria have the ability to invade their hosts and cause disease. Characterizing the differences between pathogenic and commensal non-pathogenic bacteria is important for the detection of emerging pathogens and for the development of new treatments. Previous methods for classification of bacteria as pathogenic or non-pathogenic used either raw genomic reads or protein families as features. Using protein families instead of reads made the resulting model more interpretable. However, the accuracy of protein-families-based classifiers can still be improved.
Results: We developed a wide scope pathogenicity classifier (WSPC), a new protein-content-based machine-learning classification model. We trained WSPC on a newly curated dataset of 641 bacterial genomes, where each genome belongs to a different species. A comparative analysis we conducted shows that WSPC outperforms existing models on two benchmark test sets. We observed that the most discriminative protein-family features in WSPC are widely spread among bacterial species. These features correspond to proteins that are involved in the ability of bacteria to survive and replicate during an infection, rather than proteins that are directly involved in damaging or invading the host.
Affiliation(s)
- Shaked Naor-Hoffmann
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Dina Svetlitsky
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Neta Sal-Man
  - The Shraga Segal Department of Microbiology, Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Yaron Orenstein
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Michal Ziv-Ukelson
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
2
Allen JP, Snitkin E, Pincus NB, Hauser AR. Forest and Trees: Exploring Bacterial Virulence with Genome-wide Association Studies and Machine Learning. Trends Microbiol 2021; 29:621-633. [PMID: 33455849 PMCID: PMC8187264 DOI: 10.1016/j.tim.2020.12.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/28/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 12/15/2022]
Abstract
The advent of inexpensive and rapid sequencing technologies has allowed bacterial whole-genome sequences to be generated at an unprecedented pace. This wealth of information has revealed an unanticipated degree of strain-to-strain genetic diversity within many bacterial species. Awareness of this genetic heterogeneity has corresponded with a greater appreciation of intraspecies variation in virulence. A number of comparative genomic strategies have been developed to link these genotypic and pathogenic differences with the aim of discovering novel virulence factors. Here, we review recent advances in comparative genomic approaches to identify bacterial virulence determinants, with a focus on genome-wide association studies and machine learning.
Affiliation(s)
- Jonathan P Allen
  - Department of Microbiology and Immunology, Loyola University Chicago Stritch School of Medicine, Maywood, IL 60153, USA
- Evan Snitkin
  - Department of Microbiology and Immunology, Department of Internal Medicine/Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, USA
- Nathan B Pincus
  - Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Alan R Hauser
  - Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
  - Department of Medicine/Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
3
Barash E, Sal-Man N, Sabato S, Ziv-Ukelson M. BacPaCS-Bacterial Pathogenicity Classification via Sparse-SVM. Bioinformatics 2020; 35:2001-2008. [PMID: 30407484 DOI: 10.1093/bioinformatics/bty928] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/30/2018] [Revised: 08/30/2018] [Accepted: 11/07/2018] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections.
RESULTS: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously select a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool, 'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS), which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria in a clinically relevant dataset comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates whose involvement in bacterial virulence has never been reported.
AVAILABILITY AND IMPLEMENTATION: The code and the resulting model are available at: https://github.com/barashe/bacpacs.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eran Barash
  - Department of Computer Science, Faculty of Natural Sciences
- Neta Sal-Man
  - The Shraga Segal Department of Microbiology Immunology and Genetics, Faculty of Health Sciences, Ben-Gurion University of the Negev, BeerSheva, Israel
- Sivan Sabato
  - Department of Computer Science, Faculty of Natural Sciences
4
Abstract
With the ever-expanding number of available sequences from bacterial genomes, and the expectation that this data type will be the primary one generated from both diagnostic and research laboratories for the foreseeable future, there is both an opportunity and a need to evaluate how effectively computational approaches can be used within bacterial genomics to predict and understand complex phenotypes, such as pathogenic potential and host source. This article applied various quantitative methods such as diversity indexes, pangenome-wide association studies (GWAS) and dimensionality reduction techniques to better understand the data and then compared how well unsupervised and supervised machine learning (ML) methods could predict the source host of the isolates. The study uses the example of the pangenomes of 1203 Salmonella enterica serovar Typhimurium isolates in order to predict 'host of isolation' using these different methods. The article is intended as a review of recent applications of ML in infection biology, but also, by working through this specific dataset, it allows discussion of the advantages and drawbacks of the different techniques. As with all such sub-population studies, the biological relevance will be dependent on the quality and diversity of the input data. Given this major caveat, we show that supervised ML has the potential to add real value to interpretation of bacterial genomic data, as it can provide probabilistic outcomes for important phenotypes, something that is very difficult to achieve with the other methods.
Affiliation(s)
- Nadejda Lupolova
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Samantha J Lycett
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- David L Gally
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
5
Hyoju SK, Zaborin A, Keskey R, Sharma A, Arnold W, van den Berg F, Kim SM, Gottel N, Bethel C, Charnot-Katsikas A, Jianxin P, Adriaansens C, Papazian E, Gilbert JA, Zaborina O, Alverdy JC. Mice Fed an Obesogenic Western Diet, Administered Antibiotics, and Subjected to a Sterile Surgical Procedure Develop Lethal Septicemia with Multidrug-Resistant Pathobionts. mBio 2019; 10:e00903-19. [PMID: 31363025 DOI: 10.1128/mBio.00903-19] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/22/2022] Open
Abstract
Despite antibiotics and sterile technique, postoperative infections remain a real and present danger to patients. Recent estimates suggest that 50% of the pathogens associated with postoperative infections have become resistant to the standard antibiotics used for prophylaxis. Risk factors identified in such cases include obesity and antibiotic exposure. To study the combined effect of obesity and antibiotic exposure on postoperative infection, mice were allowed to gain weight on an obesogenic Western-type diet (WD), administered antibiotics and then subjected to an otherwise recoverable sterile surgical injury (30% hepatectomy). The feeding of a WD alone resulted in a major imbalance of the cecal microbiota characterized by a decrease in diversity, loss of Bacteroidetes, a bloom in Proteobacteria, and the emergence of antibiotic-resistant organisms among the cecal microbiota. When WD-fed mice were administered antibiotics and subjected to 30% liver resection, lethal sepsis, characterized by multiple-organ damage, developed. Notable was the emergence and systemic dissemination of multidrug-resistant (MDR) pathobionts, including carbapenem-resistant, extended-spectrum β-lactamase-producing Serratia marcescens, which expressed a virulent and immunosuppressive phenotype. Analysis of the distribution of exact sequence variants belonging to the genus Serratia suggested that these strains originated from the cecal mucosa. No mortality or MDR pathogens were observed in identically treated mice fed a standard chow diet. Taken together, these results suggest that consumption of a Western diet and exposure to certain antibiotics may predispose to life-threatening postoperative infection associated with MDR organisms present among the gut microbiota.
IMPORTANCE: Obesity remains a prevalent and independent risk factor for life-threatening infection following major surgery. Here, we demonstrate that when mice are fed an obesogenic Western diet (WD), they become susceptible to lethal sepsis with multiple organ damage after exposure to antibiotics and an otherwise-recoverable surgical injury. Analysis of the gut microbiota in this model demonstrates that WD alone leads to loss of Bacteroidetes, a bloom of Proteobacteria, and evidence of antibiotic resistance development even before antibiotics are administered. After antibiotics and surgery, lethal sepsis with organ damage developed in mice fed a WD, with the appearance of multidrug-resistant pathogens in the liver, spleen, and blood. The importance of these findings lies in exposing how the selective pressures of diet, antibiotic exposure, and surgical injury can converge on the microbiome, resulting in lethal sepsis and organ damage without the introduction of an exogenous pathogen.
6
Martínez-García PM, López-Solanilla E, Ramos C, Rodríguez-Palenzuela P. Prediction of bacterial associations with plants using a supervised machine-learning approach. Environ Microbiol 2016; 18:4847-4861. [PMID: 27234490 DOI: 10.1111/1462-2920.13389] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/26/2015] [Revised: 05/20/2016] [Accepted: 05/20/2016] [Indexed: 12/11/2022]
Abstract
Recent scenarios of fresh produce contamination by human enteric pathogens have resulted in severe food-borne outbreaks, and a new paradigm has emerged stating that some human-associated bacteria can use plants as secondary hosts. As a consequence, there has been growing concern in the scientific community about these interactions that have not yet been elucidated. Since this is a relatively new area, there is a lack of strategies to address the problem of food-borne illnesses due to the ingestion of fruits and vegetables. In the present study, we performed specific genome annotations to train a supervised machine-learning model that allows for the identification of plant-associated bacteria with a precision of ∼93%. The application of our method to approximately 9500 genomes predicted several unknown interactions between well-known human pathogens and plants, and it also confirmed several cases for which evidence has been reported. We observed that factors involved in adhesion, the deconstruction of the plant cell wall and detoxifying activities were highlighted as the most predictive features. The application of our strategy to sequenced strains that are involved in food poisoning can be used as a primary screening tool to determine the possible causes of contaminations.
Affiliation(s)
- Pedro Manuel Martínez-García
  - Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071, Spain
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
- Emilia López-Solanilla
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
  - Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040, Spain
- Cayo Ramos
  - Área de Genética, Facultad de Ciencias, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Málaga, E-29071, Spain
- Pablo Rodríguez-Palenzuela
  - Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Parque Científico y Tecnológico de la Universidad Politécnica de Madrid. Campus de Montegancedo, Pozuelo de Alarcón, Madrid, 28223, Spain
  - Departamento de Biología Vegetal. Escuela Técnica Superior de Ingenieros Agrónomos, Universidad Politécnica de Madrid, Avenida Complutense, 3, Madrid, 28040, Spain
7
Friis-Nielsen J, Kjartansdóttir KR, Mollerup S, Asplund M, Mourier T, Jensen RH, Hansen TA, Rey-Iglesia A, Richter SR, Nielsen IB, Alquezar-Planas DE, Olsen PVS, Vinner L, Fridholm H, Nielsen LP, Willerslev E, Sicheritz-Pontén T, Lund O, Hansen AJ, Izarzugaza JMG, Brunak S. Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers. Viruses 2016; 8:E53. [PMID: 26907326 PMCID: PMC4776208 DOI: 10.3390/v8020053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/30/2015] [Revised: 01/29/2016] [Accepted: 02/05/2016] [Indexed: 12/17/2022] Open
Abstract
Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified.
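The idea of flagging sequence clusters that recur across sequencing libraries can be illustrated with a minimal Python sketch. It assumes clustering has already been done and that each library is represented by a list of cluster identifiers; the function name `recurrent_clusters`, the `min_libraries` threshold, and the toy data are hypothetical and are not taken from the authors' pipeline, which also applies statistical association tests not shown here.

```python
from collections import Counter

def recurrent_clusters(libraries, min_libraries=2):
    """Count, for each cluster ID, the number of libraries it appears in.

    `libraries` maps a library name to the cluster IDs observed in it.
    Returns only clusters seen in at least `min_libraries` libraries.
    """
    counts = Counter()
    for cluster_ids in libraries.values():
        # Use a set so a cluster is counted once per library,
        # regardless of how many reads it has there.
        counts.update(set(cluster_ids))
    return {cid: n for cid, n in counts.items() if n >= min_libraries}

# Toy input (hypothetical library and cluster names).
toy_libraries = {
    "lib1": ["c1", "c2", "c2"],
    "lib2": ["c1", "c3"],
    "lib3": ["c1", "c2"],
}
recurrent = recurrent_clusters(toy_libraries)
print(sorted(recurrent.items()))  # [('c1', 3), ('c2', 2)]
```

Counting per-library presence rather than read abundance keeps a single deeply sequenced library from dominating the recurrence signal.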
Affiliation(s)
- Jens Friis-Nielsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Kristín Rós Kjartansdóttir
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Sarah Mollerup
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Maria Asplund
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Tobias Mourier
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Randi Holm Jensen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Thomas Arn Hansen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Alba Rey-Iglesia
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Stine Raith Richter
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Ida Broman Nielsen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- David E Alquezar-Planas
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Pernille V S Olsen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Lasse Vinner
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Helena Fridholm
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Lars Peter Nielsen
  - Department of Autoimmunology and Biomarkers, Statens Serum Institut, DK-2300 Copenhagen S, Denmark
- Eske Willerslev
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Thomas Sicheritz-Pontén
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Ole Lund
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Anders Johannes Hansen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, DK-1350 Copenhagen, Denmark
- Jose M G Izarzugaza
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Søren Brunak
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
  - NNF Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen, Denmark
8
Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 2015; 15:141-61. [PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4] [Citation(s) in RCA: 391] [Impact Index Per Article: 43.4] [Received: 01/19/2015] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 12/18/2022]
Abstract
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
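The pan- and core-genome calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name `pan_and_core`, the strain names, and the family labels are invented for illustration and do not come from the review.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets.

    `genomes` maps a genome name to the set of gene families it carries.
    The pan-genome is the union of all families; the core genome is the
    intersection, i.e. families present in every genome.
    """
    family_sets = list(genomes.values())
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core

# Toy input (hypothetical strains and gene families).
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5"},
}
pan, core = pan_and_core(toy_genomes)
print(sorted(core))  # ['fam1', 'fam2']
print(len(pan))      # 5
```

As more genomes are added, the core set can only shrink or stay the same while the pan set can only grow, which is consistent with the large gap the abstract reports between the E. coli core and total gene-family counts.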
Affiliation(s)
- Miriam Land
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Loren Hauser
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
  - Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
  - Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA
- Se-Ran Jun
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Intawat Nookaew
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Michael R. Leuze
  - Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tae-Hyuk Ahn
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
  - Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tatiana Karpinets
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Ole Lund
  - Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Guruprased Kora
  - Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Trudy Wassenaar
  - Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany
- Suresh Poudel
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
  - Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
- David W. Ussery
  - Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
  - Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
  - Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
  - Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
9
Barbosa E, Röttger R, Hauschild AC, Azevedo V, Baumbach J. On the limits of computational functional genomics for bacterial lifestyle prediction. Brief Funct Genomics 2014; 13:398-408. [PMID: 24855068 DOI: 10.1093/bfgp/elu014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023] Open
Abstract
We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogenic (NP). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum and exclusively HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role to survive the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs may also not be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might as well target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.
10
Herrero-Fresno A, Leekitcharoenphon P, Hendriksen RS, Olsen JE, Aarestrup FM. Analysis of the contribution of bacteriophage ST64B to in vitro virulence traits of Salmonella enterica serovar Typhimurium. J Med Microbiol 2013; 63:331-342. [PMID: 24324031 DOI: 10.1099/jmm.0.068221-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open
Abstract
Comparison of the publicly available genomes of the virulent Salmonella enterica serovar Typhimurium (S. Typhimurium) strains SL1344, 14028s and D23580 to that of the virulence-attenuated isolate LT2 revealed the absence of a full sequence of bacteriophage ST64B in the latter. Four selected ST64B regions of unknown function (sb7-sb11, sb46, sb49-sb50 and sb54) were mapped by PCR in two strain collections: (i) 310 isolates of S. Typhimurium from human blood or stool samples, and from food, animal and environmental reservoirs; and (ii) 90 isolates belonging to other serovars. The region sb49-sb50 was found to be unique to S. Typhimurium and was strongly associated with strains isolated from blood samples (100 and 28.4 % of the blood and non-blood isolates, respectively). The region was cloned into LT2 and knocked out in SL1344, and these strains were compared to wild-type isogenic strains in in vitro assays used to predict virulence association. No difference in invasion of the Int407 human cell line was observed between the wild-type and mutated strains, but the isolate carrying the whole ST64B prophage was found to have a slightly better survival in blood. The study showed a high prevalence and a strong association between the prophage ST64B and isolates of S. Typhimurium collected from blood, and may indicate that such strains constitute a selected subpopulation within this serovar. Further studies are indicated to determine whether the slight increase in blood survival observed in the strain carrying ST64B genes is of paramount importance for systemic infections.
Affiliation(s)
- Ana Herrero-Fresno
  - Department of Veterinary Disease Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
  - WHO Collaborating Centre for Antimicrobial Resistance in Food-borne Pathogens and EU Reference Laboratory for Antimicrobial Resistance, National Food Institute, Technical University of Denmark, Kgs Lyngby, Denmark
- Pimlapas Leekitcharoenphon
  - WHO Collaborating Centre for Antimicrobial Resistance in Food-borne Pathogens and EU Reference Laboratory for Antimicrobial Resistance, National Food Institute, Technical University of Denmark, Kgs Lyngby, Denmark
- Rene S Hendriksen
  - WHO Collaborating Centre for Antimicrobial Resistance in Food-borne Pathogens and EU Reference Laboratory for Antimicrobial Resistance, National Food Institute, Technical University of Denmark, Kgs Lyngby, Denmark
- John E Olsen
  - Department of Veterinary Disease Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Frank M Aarestrup
  - WHO Collaborating Centre for Antimicrobial Resistance in Food-borne Pathogens and EU Reference Laboratory for Antimicrobial Resistance, National Food Institute, Technical University of Denmark, Kgs Lyngby, Denmark
11
Cosentino S, Voldby Larsen M, Møller Aarestrup F, Lund O. PathogenFinder--distinguishing friend from foe using bacterial whole genome sequence data. PLoS One 2013; 8:e77302. [PMID: 24204795 PMCID: PMC3810466 DOI: 10.1371/journal.pone.0077302] [Citation(s) in RCA: 276] [Impact Index Per Article: 25.1] [Received: 08/02/2013] [Accepted: 09/09/2013] [Indexed: 01/01/2023] Open
Abstract
Although the majority of bacteria are harmless or even beneficial to their host, others are highly virulent and can cause serious diseases, and even death. Due to the constantly decreasing cost of high-throughput sequencing there are now many completely sequenced genomes available from both human pathogenic and innocuous strains. The data can be used to identify gene families that correlate with pathogenicity and to develop tools to predict the pathogenicity of newly sequenced strains, investigations that previously were mainly done by means of more expensive and time consuming experimental approaches. We describe PathogenFinder (http://cge.cbs.dtu.dk/services/PathogenFinder/), a web-server for the prediction of bacterial pathogenicity by analysing the input proteome, genome, or raw reads provided by the user. The method relies on groups of proteins, created without regard to their annotated function or known involvement in pathogenicity. The method has been built to work with all taxonomic groups of bacteria and using the entire training-set, achieved an accuracy of 88.6% on an independent test-set, by correctly classifying 398 out of 449 completely sequenced bacteria. The approach here proposed is not biased on sets of genes known to be associated with pathogenicity, thus the approach could aid the discovery of novel pathogenicity factors. Furthermore the pathogenicity prediction web-server could be used to isolate the potential pathogenic features of both known and unknown strains.
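The reported accuracy follows directly from the counts given in the abstract (398 correct out of 449 test genomes). A minimal Python check is shown below; the helper name `accuracy` is ours, and the number of misclassified genomes is derived by subtraction rather than quoted from the paper.

```python
def accuracy(correct, total):
    """Fraction of correctly classified items."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

correct, total = 398, 449           # counts stated in the abstract
acc = accuracy(correct, total)
misclassified = total - correct     # derived, not quoted from the paper

print(f"{acc:.1%}")    # 88.6%
print(misclassified)   # 51
```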
Affiliation(s)
- Salvatore Cosentino
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Mette Voldby Larsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Ole Lund
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark