1
Dotan E, Jaschek G, Pupko T, Belinkov Y. Effect of tokenization on transformers for biological sequences. Bioinformatics 2024:btae196. [PMID: 38608190 DOI: 10.1093/bioinformatics/btae196] [Received: 08/23/2023] [Revised: 02/20/2024] [Accepted: 04/11/2024] [Indexed: 04/14/2024]
Abstract
MOTIVATION Deep-learning models are transforming biological research, including many bioinformatics and comparative genomics algorithms, such as sequence alignment, phylogenetic tree inference, and automatic classification of protein functions. Among these deep-learning algorithms, models developed in the natural language processing (NLP) community for processing natural languages have recently been applied to biological sequences. However, biological sequences differ from natural languages such as English and French, in which segmenting text into separate words is relatively straightforward. Moreover, biological sequences are characterized by extremely long "sentences", which hamper their processing by current machine-learning models, notably the transformer architecture. In NLP, one of the first processing steps is to transform the raw text into a list of tokens. Deep-learning applications to biological sequence data mostly segment proteins and DNA into single characters. In this work, we study the effect of alternative tokenization algorithms on eight different tasks in biology, from predicting the function of proteins and their stability, through nucleotide sequence alignment, to classifying proteins into specific families. RESULTS We demonstrate that applying alternative tokenization algorithms can increase accuracy and, at the same time, substantially reduce the input length compared to the trivial tokenizer in which each character is a token. Furthermore, applying these tokenization algorithms allows trained models to be interpreted while taking into account dependencies among positions. Finally, we trained these tokenizers on a large dataset of protein sequences containing more than 400 billion amino acids, which resulted in a more than three-fold decrease in the number of tokens. We then tested these large-scale tokenizers on the specific tasks above and showed that for some tasks it is highly beneficial to train database-specific tokenizers.
Our study suggests that tokenizers are likely to be a critical component in future deep-network analysis of biological sequence data. AVAILABILITY Code, data and trained tokenizers are available at https://github.com/technion-cs-nlp/BiologicalTokenizers.
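To make the contrast with the trivial one-character-per-token scheme concrete, the sketch below learns byte-pair-encoding (BPE) style merges from a few protein sequences and re-tokenizes a sequence into fewer, multi-residue tokens. It is a from-scratch toy assuming plain BPE; the function names (`train_bpe`, `tokenize`) are hypothetical, and this is not the code in the authors' repository, which should be consulted for the tokenizers actually evaluated.

```python
from collections import Counter


def _merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token (left to right)."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges, starting from one token per residue."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:  # no adjacent pair repeats, so merging would not compress
            break
        merges.append(best_pair)
        corpus = [_merge_pair(tokens, best_pair) for tokens in corpus]
    return merges


def tokenize(sequence, merges):
    """Apply learned merges, in training order, to a new sequence."""
    tokens = list(sequence)
    for pair in merges:
        tokens = _merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    train = ["MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
    merges = train_bpe(train, num_merges=10)
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    print(len(seq), "characters ->", len(tokenize(seq, merges)), "tokens")
```

The tokens always concatenate back to the input sequence; they are fewer than `len(seq)` when learned merges occur in it. Real tokenizers are trained on far larger corpora with explicit vocabulary-size targets.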
Affiliation(s)
- Edo Dotan
- The Henry and Marilyn Taub Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
- Gal Jaschek
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06510, USA
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
- Yonatan Belinkov
- The Henry and Marilyn Taub Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
2
Wygoda E, Loewenthal G, Moshe A, Alburquerque M, Mayrose I, Pupko T. Statistical framework to determine indel-length distribution. Bioinformatics 2024; 40:btae043. [PMID: 38269647 PMCID: PMC10868340 DOI: 10.1093/bioinformatics/btae043] [Received: 06/21/2023] [Revised: 01/10/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
MOTIVATION Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels have been shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of the indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted runs of contiguous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased by the assumption, made by alignment methods, that indel lengths follow a geometric distribution. RESULTS We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test whether current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available on GitHub, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
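For contrast with the paper's approach, the sketch below implements the naive gap-counting baseline that the abstract criticizes: it collects the lengths of runs of `-` in aligned rows and compares a geometric fit with a truncated Zipf fit by log-likelihood. It is illustrative only; the truncation length, the exponent grid, and all function names are assumptions, and it reproduces neither the ABC model selection nor the alignment-bias correction described above.

```python
import math
import re


def gap_block_lengths(alignment_rows):
    """Naive gap counting: lengths of maximal runs of '-' in each aligned row."""
    lengths = []
    for row in alignment_rows:
        lengths.extend(len(run) for run in re.findall(r"-+", row))
    return lengths


def geometric_loglik(lengths):
    """Log-likelihood of a geometric distribution on {1, 2, ...} at its MLE p = 1 / mean."""
    p = len(lengths) / sum(lengths)
    if p >= 1.0:  # every block has length 1, so the MLE puts all mass on k = 1
        return 0.0
    return sum(math.log(p) + (k - 1) * math.log(1.0 - p) for k in lengths)


def zipf_loglik(lengths, max_len=50):
    """Best log-likelihood of a Zipf law truncated at max_len, via a coarse grid over exponent a."""
    if max(lengths) > max_len:
        raise ValueError("max_len must be at least the longest gap block")
    best = -math.inf
    for step in range(1, 301):  # a = 1.01, 1.02, ..., 4.00
        a = 1.0 + 0.01 * step
        log_norm = math.log(sum(k ** -a for k in range(1, max_len + 1)))
        ll = sum(-a * math.log(k) - log_norm for k in lengths)
        best = max(best, ll)
    return best


if __name__ == "__main__":
    rows = ["AC--GT-A", "A---GTCA", "ACGTGTCA"]
    lengths = gap_block_lengths(rows)  # [2, 1, 3]
    print("geometric:", geometric_loglik(lengths), "zipf:", zipf_loglik(lengths))
```

Each model here has one free parameter, so the log-likelihoods can be compared directly. The abstract's caveat still applies: gap blocks in an imperfect alignment are not the same as true indels.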
Affiliation(s)
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
3
Polonsky K, Pupko T, Freund NT. Evaluation of the Ability of AlphaFold to Predict the Three-Dimensional Structures of Antibodies and Epitopes. J Immunol 2023; 211:1578-1588. [PMID: 37782047 DOI: 10.4049/jimmunol.2300150] [Received: 03/03/2023] [Accepted: 09/06/2023] [Indexed: 10/03/2023]
Abstract
Being able to accurately predict the three-dimensional structure of an Ab can facilitate Ab characterization and epitope prediction, with important diagnostic and clinical implications. In this study, we evaluated the ability of AlphaFold to predict the structures of 222 recently published, high-resolution Fab H and L chain structures of Abs from different species directed against different Ags. We show that although the overall Ab prediction quality is in line with the results of CASP14, regions prone to higher variation, such as the complementarity-determining regions (CDRs) of the H chain, are predicted less accurately. Moreover, we discovered that AlphaFold mispredicts the bending angles between the variable and constant domains. To evaluate the ability of AlphaFold to model Ab-Ag interactions based only on sequence, we used AlphaFold-Multimer in combination with ZDOCK to predict the structures of 26 known Ab-Ag complexes. ZDOCK, which was applied to the bound components of both the Ab and the Ag, succeeded in assembling 11 complexes, whereas AlphaFold succeeded in predicting only 2 of 26 models, with significant deviations in the docking contacts predicted for the remaining molecules. Of the 11 complexes successfully predicted by ZDOCK, 9 involved short-peptide Ags (18-mers or shorter), whereas only 2 were complexes of an Ab with a full-length protein. Docking of modeled unbound Ab and Ag was unsuccessful. In summary, our study provides important information about the abilities and limitations of using AlphaFold to predict Ab-Ag interactions and suggests areas for possible improvement.
Affiliation(s)
- Ksenia Polonsky
- Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Natalia T Freund
- Department of Clinical Microbiology and Immunology, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
4
Geraffi N, Gupta P, Wagner N, Barash I, Pupko T, Sessa G. Comparative sequence analysis of pPATH pathogenicity plasmids in Pantoea agglomerans gall-forming bacteria. Front Plant Sci 2023; 14:1198160. [PMID: 37583594 PMCID: PMC10425158 DOI: 10.3389/fpls.2023.1198160] [Received: 03/31/2023] [Accepted: 07/10/2023] [Indexed: 08/17/2023]
Abstract
Acquisition of the pathogenicity plasmid pPATH, which encodes a type III secretion system (T3SS) and effectors (T3Es), has likely led to the transition of a non-pathogenic bacterium into the tumorigenic pathogen Pantoea agglomerans. P. agglomerans pv. gypsophilae (Pag) forms galls on gypsophila (Gypsophila paniculata) and triggers immunity in sugar beet (Beta vulgaris), while P. agglomerans pv. betae (Pab) causes galls on both gypsophila and sugar beet. Draft sequences of the Pag and Pab genomes were previously generated using Illumina MiSeq technology and used to determine partial T3E inventories of Pab and Pag. Here, we fully assembled the Pab and Pag genomes following sequencing with PacBio technology and carried out a comparative sequence analysis of the Pab and Pag pathogenicity plasmids pPATHpab and pPATHpag. Assembly of the Pab and Pag genomes revealed a ~4 Mbp chromosome with a 55% GC content, and three and four plasmids in Pab and Pag, respectively. pPATHpag and pPATHpab share 97% identity over 74% coverage and have a similar GC content (51%); they are ~156 kb and ~131 kb in size and consist of 198 and 155 coding sequences (CDSs), respectively. In both plasmids, we confirmed the presence of highly similar gene clusters encoding a T3SS, as well as auxin and cytokinin biosynthetic enzymes. Three putative novel T3Es were identified in Pab and one in Pag. Among T3SS-associated proteins encoded by Pag and Pab, we identified two novel chaperones of the ShcV and CesT families that are present in both pathovars with high similarity. We also identified insertion sequences (ISs) and transposons (Tns) that may have contributed to the evolution of the two pathovars. These include seven shared IS elements, and three ISs and two transposons unique to Pab. Finally, comparative sequence analysis revealed plasmid regions and CDSs that are present only in pPATHpab or only in pPATHpag.
The high similarity and common features of the pPATH plasmids support the hypothesis that the two strains recently evolved into host-specific pathogens.
Affiliation(s)
- Naama Geraffi
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Priya Gupta
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Isaac Barash
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Guido Sessa
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
5
Nagar N, Tubiana J, Loewenthal G, Wolfson HJ, Ben Tal N, Pupko T. EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning. J Mol Biol 2023; 435:168155. [PMID: 37356902 DOI: 10.1016/j.jmb.2023.168155] [Received: 03/09/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 06/27/2023]
Abstract
Multiple sequence alignments (MSAs) are the workhorse of molecular evolution and structural biology research. From MSAs, the amino acids that are tolerated at each site during protein evolution can be inferred. However, little is known regarding the repertoire of tolerated amino acids in proteins when only a few or no sequence homologs are available, such as orphan and de novo designed proteins. Here we present EvoRator2, a deep-learning algorithm trained on over 15,000 protein structures that can predict which amino acids are tolerated at any given site, based exclusively on protein structural information mined from atomic coordinate files. We show that EvoRator2 obtained satisfactory results for the prediction of position-specific scoring matrices (PSSMs). We further show that EvoRator2 obtained near state-of-the-art performance on proteins with high-quality structures in predicting the effect of mutations in deep mutational scanning (DMS) experiments and that, for certain DMS targets, EvoRator2 outperformed state-of-the-art methods. We also show that by combining EvoRator2's predictions with those obtained by a state-of-the-art deep-learning method that accounts for the information in the MSA, the prediction of the effect of mutations in DMS experiments was improved in terms of both accuracy and stability. EvoRator2 is designed to predict which amino-acid substitutions are tolerated in proteins without many homologous sequences, including orphan or de novo designed proteins. We implemented our approach in the EvoRator web server (https://evorator.tau.ac.il).
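A PSSM, one of the targets EvoRator2 is evaluated on, is conventionally derived from an MSA when homologs exist. The sketch below computes a simple log-odds PSSM to show what such a matrix contains. The uniform background, flat pseudocount, gap handling, and function name are simplifying assumptions; this is not part of EvoRator2, which predicts tolerated amino acids from structure alone.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_msa(msa, pseudocount=1.0):
    """Log-odds PSSM (one dict per alignment column) from equal-length aligned sequences.

    Gaps and non-standard letters are ignored; the uniform background (1/20)
    and the flat pseudocount are simplifying assumptions.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            residue = seq[col].upper()
            if residue in counts:  # skip '-' and unknown characters
                counts[residue] += 1
        total = sum(counts.values())
        # log2(observed frequency / background frequency): > 0 means enriched
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


if __name__ == "__main__":
    pssm = pssm_from_msa(["MKV", "MRV", "MKI", "M-V"])
    print(round(pssm[0]["M"], 2), round(pssm[1]["K"], 2))
```

Positive scores mark residues seen more often than the background at that column; the pseudocount keeps unseen residues at a finite negative score.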
Affiliation(s)
- Natan Nagar
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jérôme Tubiana
- Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J Wolfson
- Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben Tal
- School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
6
Wagner N, Ben-Meir D, Teper D, Pupko T. Complete genome sequence of an Israeli isolate of Xanthomonas hortorum pv. pelargonii strain 305 and novel type III effectors identified in Xanthomonas. Front Plant Sci 2023; 14:1155341. [PMID: 37332699 PMCID: PMC10275491 DOI: 10.3389/fpls.2023.1155341] [Received: 01/31/2023] [Accepted: 05/10/2023] [Indexed: 06/20/2023]
Abstract
Xanthomonas hortorum pv. pelargonii is the causative agent of bacterial blight in geranium ornamental plants, the most threatening bacterial disease of this plant worldwide. Xanthomonas fragariae is the causative agent of angular leaf spot in strawberries, where it poses a significant threat to the strawberry industry. Both pathogens rely on the type III secretion system and the translocation of effector proteins into plant cells for their pathogenicity. Effectidor is a freely available web server that we previously developed for the prediction of type III effectors in bacterial genomes. Following complete genome sequencing and assembly of an Israeli isolate of Xanthomonas hortorum pv. pelargonii, strain 305, we used Effectidor to predict effector-encoding genes both in this newly sequenced genome and in X. fragariae strain Fap21, and validated its predictions experimentally. Four genes in X. hortorum and two in X. fragariae contained an active translocation signal that allowed translocation of the AvrBs2 reporter, which induced the hypersensitive response in pepper leaves; these genes are thus considered validated novel effectors. These newly validated effectors are XopBB, XopBC, XopBD, XopBE, XopBF, and XopBG.
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Daniella Ben-Meir
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Doron Teper
- Department of Plant Pathology and Weed Research, Institute of Plant Protection Agricultural Research Organization (ARO), Volcani Institute, Rishon LeZion, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
7
Dotan E, Alburquerque M, Wygoda E, Huchon D, Pupko T. GenomeFLTR: filtering reads made easy. Nucleic Acids Res 2023:7161531. [PMID: 37177997 DOI: 10.1093/nar/gkad410] [Received: 03/12/2023] [Revised: 04/20/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023]
Abstract
In the last decade, advances in sequencing technology have led to an exponential increase in genomic data. These new data have dramatically changed our understanding of the evolution and function of genes and genomes. Despite improvements in sequencing technologies, identifying contaminated reads remains a complex task for many research groups. Here, we introduce GenomeFLTR, a new web server to filter contaminated reads. Reads are compared against existing sequence databases from various representative organisms to detect potential contaminants. The main features implemented in GenomeFLTR are: (i) automated updating of the relevant databases; (ii) fast comparison of each read against the database; (iii) the ability to create user-specified databases; (iv) a user-friendly interactive dashboard to investigate the origin and frequency of the contaminations; (v) the generation of a contamination-free file. Availability: https://genomefltr.tau.ac.il/.
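The abstract does not describe how reads are compared against the databases. As a generic illustration only, the sketch below classifies reads by exact k-mer matches against a few reference sequences and drops reads assigned to organisms flagged as contaminants. The k-mer scheme, the `min_fraction` threshold, and all names are assumptions, not GenomeFLTR's implementation or interface.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence (upper-cased)."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference organisms whose sequence contains it."""
    index = {}
    for organism, genome in references.items():
        for kmer in kmers(genome, k):
            index.setdefault(kmer, set()).add(organism)
    return index


def classify_read(read, index, k, min_fraction=0.5):
    """Return the organism sharing the most k-mers with the read, or None.

    A read is assigned only if at least `min_fraction` of its k-mers hit that organism.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    hits = {}
    for kmer in read_kmers:
        for organism in index.get(kmer, ()):
            hits[organism] = hits.get(organism, 0) + 1
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return best if hits[best] / len(read_kmers) >= min_fraction else None


def filter_reads(reads, index, k, contaminants):
    """Keep reads that are not assigned to any organism listed in `contaminants`."""
    return [read for read in reads if classify_read(read, index, k) not in contaminants]
```

For example, with `index = build_index({"host": ..., "contaminant": ...}, k=5)`, `filter_reads(reads, index, 5, {"contaminant"})` returns a contamination-free read list; unassigned reads are kept. Production tools use much larger k, compact indexes, and taxonomy-aware assignment.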
Affiliation(s)
- Edo Dotan
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dorothée Huchon
- School of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- The Steinhardt Museum of Natural History, Israel National Center for Biodiversity Studies, Tel-Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
8
Yariv B, Yariv E, Kessel A, Masrati G, Chorin AB, Martz E, Mayrose I, Pupko T, Ben-Tal N. Using evolutionary data to make sense of macromolecules with a "face-lifted" ConSurf. Protein Sci 2023; 32:e4582. [PMID: 36718848 PMCID: PMC9942591 DOI: 10.1002/pro.4582] [Received: 09/12/2022] [Revised: 01/21/2023] [Accepted: 01/27/2023] [Indexed: 02/01/2023]
Abstract
The ConSurf web server for the analysis of proteins, RNA, and DNA provides a quick and accurate estimate of the per-site evolutionary rate among homologues. The analysis reveals functionally important regions, such as catalytic and ligand-binding sites, which often evolve slowly. Since the last report in 2016, ConSurf has been improved in multiple ways. It now has a user-friendly interface that makes it easier to perform the analysis and to visualize the results. Evolutionary rates are calculated based on a set of homologous sequences, collected using hidden Markov model-based search tools that were recently embedded in the pipeline. Using these tools, and following the removal of redundancy, ConSurf assembles a representative set of effective homologues for protein and nucleic acid queries to enable informative analysis of the evolutionary patterns. The analysis is particularly insightful when the evolutionary rates are mapped on the macromolecule structure. In this respect, the availability of AlphaFold model structures of essentially all UniProt proteins makes ConSurf particularly relevant to the research community. The UniProt ID of a query protein with an available AlphaFold model can now be used to start a calculation. Another important improvement is the Python re-implementation of the entire computational pipeline, making it easier to maintain. This Python pipeline is now available for download as a standalone version. We demonstrate some of ConSurf's key capabilities by analyzing caveolin-1, the main protein of the membrane invaginations called caveolae.
Affiliation(s)
- Barak Yariv
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
- Elon Yariv
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
- Amit Kessel
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
- Gal Masrati
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
- Adi Ben Chorin
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
- Eric Martz
- Department of Microbiology, University of Massachusetts, Amherst, Massachusetts, USA
- Itay Mayrose
- George S. Wise Faculty of Life Sciences, School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- George S. Wise Faculty of Life Sciences, The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv, Israel
- Nir Ben-Tal
- George S. Wise Faculty of Life Sciences, Department of Biochemistry and Molecular Biology, Tel Aviv University, Tel Aviv, Israel
9
Loewenthal G, Wygoda E, Nagar N, Glick L, Mayrose I, Pupko T. The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns. Open Biol 2022; 12:220223. [PMID: 36514983 PMCID: PMC9748784 DOI: 10.1098/rsob.220223] [Indexed: 12/15/2022]
Abstract
Insertions and deletions (indels) of short DNA segments are common evolutionary events. Numerous studies have shown that deletions occur more often than insertions in both prokaryotes and eukaryotes. This raises the question of why neutral sequences are not eradicated from the genome. We suggest that this is due to a phenomenon we term border-induced selection. In this setting, a neutral sequence is flanked by conserved regions. Deletions occurring near the borders occasionally protrude into the conserved region and are thereby subject to strong purifying selection. Thus, for short neutral sequences, an insertion bias is expected. Here, we develop a set of increasingly complex models of indel dynamics that incorporate border-induced selection. Furthermore, we show that short conserved sequences within the neutrally evolving sequence help explain: (i) the presence of very long sequences; (ii) the high variance of sequence lengths; and (iii) the possible emergence of multimodality in sequence length distributions. Finally, we fitted our models to the human intron length distribution, as introns are thought to be mostly neutral and bordered by conserved exons. We show that when accounting for the occurrence of short conserved sequences within introns, we reproduce the main features, including the presence of long introns and the multimodality of the intron length distribution.
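Border-induced selection can be illustrated with a toy simulation: deletions are more frequent than insertions, yet any deletion that protrudes past either end of the neutral segment is rejected, so short segments are protected from shrinking away. The sketch below assumes uniform indel lengths, a single segment, and lethal border hits; these choices and the function name are hypothetical and much simpler than the increasingly complex models developed in the paper.

```python
import random


def simulate_neutral_length(initial_length, n_events, deletion_prob=0.6,
                            max_indel=10, seed=0):
    """Toy indel dynamics for a neutral segment flanked by conserved borders.

    Each event is a deletion with probability `deletion_prob` (a deletion bias
    when > 0.5), otherwise an insertion; indel lengths are uniform on
    1..max_indel (an assumption). Insertions are always accepted. A deletion is
    placed uniformly among all start positions that overlap the segment, and is
    rejected (purged by selection) if it protrudes into a conserved border.
    Returns the trajectory of segment lengths.
    """
    rng = random.Random(seed)
    length = initial_length
    trajectory = [length]
    for _ in range(n_events):
        size = rng.randint(1, max_indel)
        if rng.random() < deletion_prob:
            if length > 0:
                # Starts from -(size - 1) to length - 1 all overlap the segment.
                start = rng.randint(-(size - 1), length - 1)
                if start >= 0 and start + size <= length:
                    length -= size  # fully inside the neutral segment: accepted
            # any other deletion touches a border and is rejected
        else:
            length += size
        trajectory.append(length)
    return trajectory


if __name__ == "__main__":
    traj = simulate_neutral_length(initial_length=5, n_events=10_000, seed=1)
    tail = traj[len(traj) // 2:]
    print("mean length over second half:", sum(tail) / len(tail))
```

Without the border check, the same deletion bias would make the expected length decrease on every event. With it, the fraction of accepted deletions falls as the segment gets shorter, so the length tends to settle where accepted deletions balance insertions.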
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Natan Nagar
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
- Lior Glick
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv 69978, Israel
10
Wagner N, Alburquerque M, Ecker N, Dotan E, Zerah B, Pena MM, Potnis N, Pupko T. Natural language processing approach to model the secretion signal of type III effectors. Front Plant Sci 2022; 13:1024405. [PMID: 36388586 PMCID: PMC9659976 DOI: 10.3389/fpls.2022.1024405] [Received: 08/21/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: either the protein should be translocated or it should not. It was previously shown that type III effectors have a secretion signal within their N-terminus; however, despite numerous efforts, the exact biochemical identity of this secretion signal is generally unknown. Computational characterization of the secretion signal is important for the identification of novel effectors and for a better understanding of the molecular translocation mechanism. In this work, we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in high-dimensional space using Facebook's protein language model. Classification algorithms were then used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. We showed that on this curated dataset, our novel approach yielded substantially better classification accuracy than previously developed methodologies. We have also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach into Effectidor, a web server for predicting type III effector proteins, leading to a more accurate classification of effectors from non-effectors.
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Edo Dotan
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ben Zerah
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michelle Mendonca Pena
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Neha Potnis
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
11
Moshe A, Wygoda E, Ecker N, Loewenthal G, Avram O, Israeli O, Hazkani-Covo E, Pe'er I, Pupko T. An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements. Mol Biol Evol 2022; 39:6772916. [DOI: 10.1093/molbev/msac231] [Indexed: 11/13/2022]
Abstract
The inference of genome rearrangement events has been extensively studied, as they play a major role in molecular evolution. However, probabilistic evolutionary models that explicitly imitate the evolutionary dynamics of such events, as well as methods to infer model parameters, are yet to be fully utilized. Here, we developed a probabilistic approach to infer genome rearrangement rate parameters using an Approximate Bayesian Computation (ABC) framework. We developed two genome rearrangement models: a basic model, which accounts for genomic changes in gene order, and a more sophisticated one, which also accounts for changes in chromosome number. We characterized the ABC inference accuracy using simulations and applied our methodology to both prokaryotic and eukaryotic empirical datasets. Knowledge of genome rearrangement rates can help elucidate their role in evolution as well as help simulate genomes with evolutionary dynamics that reflect empirical genomes.
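The core ABC idea can be sketched as rejection sampling: draw a rate from a prior, simulate rearrangements, and keep the rate when a summary statistic of the simulated genome is close to the observed one. In the sketch below the only event type is a random inversion on a single linear chromosome, the summary statistic is the breakpoint count relative to the ancestral order, and the uniform prior, tolerance, and function names are all assumptions; the models in the paper (including changes in chromosome number) are richer.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's method for a Poisson draw; adequate for the small means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def random_inversion(order, rng):
    """Reverse a random contiguous block (at least two genes) of the gene order."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def breakpoints(order):
    """Count adjacencies absent from the ancestral order 1..n (framed by 0 and n+1)."""
    framed = [0] + list(order) + [len(order) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def simulate(rate, n_genes, branch_length, rng):
    """Apply a Poisson number of inversions and return the breakpoint count."""
    order = list(range(1, n_genes + 1))
    for _ in range(poisson(rate * branch_length, rng)):
        order = random_inversion(order, rng)
    return breakpoints(order)


def abc_rejection(observed_bp, n_genes, branch_length, n_sims=20000,
                  max_rate=20.0, tolerance=1, seed=0):
    """Rejection ABC: draw rates from Uniform(0, max_rate) and keep those whose
    simulated breakpoint count is within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        rate = rng.uniform(0.0, max_rate)
        if abs(simulate(rate, n_genes, branch_length, rng) - observed_bp) <= tolerance:
            accepted.append(rate)
    return accepted
```

The accepted rates approximate the posterior distribution of the rate (for example, their mean serves as a point estimate); tightening `tolerance` improves the approximation at the cost of more simulations.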
Affiliation(s)
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Einat Hazkani-Covo
- Department of Natural and Life Sciences, Open University of Israel, Ra'anana, Israel
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, New York, USA
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
12
Liyanapathiranage P, Wagner N, Avram O, Pupko T, Potnis N. Phylogenetic Distribution and Evolution of Type VI Secretion System in the Genus Xanthomonas. Front Microbiol 2022; 13:840308. [PMID: 35495725 PMCID: PMC9048695 DOI: 10.3389/fmicb.2022.840308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Accepted: 02/10/2022] [Indexed: 11/13/2022]
Abstract
The type VI secretion system (T6SS) present in many Gram-negative bacteria is a contact-dependent apparatus that can directly deliver secreted effectors or toxins into diverse neighboring cellular targets including both prokaryotic and eukaryotic organisms. Recent reverse genetics studies with T6 core gene loci have indicated the importance of functional T6SS toward overall competitive fitness in various pathogenic Xanthomonas spp. To understand the contribution of T6SS toward ecology and evolution of Xanthomonas spp., we explored the distribution of the three distinguishable T6SS clusters, i3*, i3***, and i4, in approximately 1,740 Xanthomonas genomes, along with their conservation, genetic organization, and their evolutionary patterns in this genus. Screening genomes for core genes of each T6 cluster indicated that 40% of the sequenced strains possess two T6 clusters, with combinations of i3*** and i3* or i3*** and i4. A few strains of Xanthomonas citri, Xanthomonas phaseoli, and Xanthomonas cissicola were the exception, possessing a unique combination of i3* and i4. The findings also indicated clade-specific distribution of T6SS clusters. Phylogenetic analysis demonstrated that T6SS clusters i3* and i3*** were probably acquired by the ancestor of the genus Xanthomonas, followed by gain or loss of individual clusters upon diversification into subsequent clades. The T6 i4 cluster was acquired in recent independent events by group 2 xanthomonads, followed by its horizontal spread across distinct clades of group 1 and group 2 xanthomonads. We also noted reshuffling of the entire core T6 loci, as well as T6SS spike complex components, hcp and vgrG, among different species. Our findings indicate that gain or loss events of specific T6SS clusters across Xanthomonas phylogeny have not been random.
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Neha Potnis
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- *Correspondence: Neha Potnis
13
Labes S, Stupp D, Wagner N, Bloch I, Lotem M, L Lahad E, Polak P, Pupko T, Tabach Y. Machine-learning of complex evolutionary signals improves classification of SNVs. NAR Genom Bioinform 2022; 4:lqac025. [PMID: 35402908 PMCID: PMC8988715 DOI: 10.1093/nargab/lqac025] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/24/2021] [Revised: 02/08/2022] [Accepted: 03/28/2022] [Indexed: 12/12/2022]
Abstract
Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable to conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new, more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
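EvoDiagnostics treats conservation against each species as its own feature. A minimal sketch of how such a per-species feature matrix could be built, assuming one alignment column per variant and an encoding we chose for the example (1 = same residue as human, 0 = different, -1 = gap or missing); the conservation measure EvoDiagnostics actually uses may differ, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

GAP_OR_MISSING = {"-", "X", "?"}


def conservation_features(column: Dict[str, str], species: Sequence[str],
                          reference: str = "human") -> List[int]:
    """One feature per species: 1 = same residue as the reference, 0 = different, -1 = gap/missing."""
    ref_residue = column[reference]
    features = []
    for name in species:
        residue = column.get(name, "-")  # species absent from the alignment counts as missing
        if residue in GAP_OR_MISSING:
            features.append(-1)
        else:
            features.append(1 if residue == ref_residue else 0)
    return features


def feature_matrix(columns: Sequence[Dict[str, str]], species: Sequence[str]) -> List[List[int]]:
    """Stack per-variant feature vectors (rows = variants, columns = species)."""
    return [conservation_features(col, species) for col in columns]


if __name__ == "__main__":
    species = ["chimp", "mouse", "zebrafish"]
    columns = [
        {"human": "R", "chimp": "R", "mouse": "R", "zebrafish": "K"},
        {"human": "G", "chimp": "G", "mouse": "-"},
    ]
    print(feature_matrix(columns, species))  # [[1, 1, 0], [1, -1, -1]]
```

Together with pathogenic/benign labels, a matrix like this is the kind of input a random-forest classifier (for example scikit-learn's `RandomForestClassifier`) accepts, and per-feature importances then point to the species that help most for a given gene.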
Affiliation(s)
- Sapir Labes
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Michal Lotem
- Sharett Institute of Oncology, Hadassah University Medical Center, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ephrat L Lahad
- Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem 9103102, Israel
- Paz Polak
- Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
14
Wagner N, Avram O, Gold-Binshtok D, Zerah B, Teper D, Pupko T. Effectidor: an automated machine-learning-based web server for the prediction of type-III secretion system effectors. Bioinformatics 2022; 38:2341-2343. [PMID: 35157036 DOI: 10.1093/bioinformatics/btac087] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/20/2021] [Revised: 01/31/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Type-III secretion systems are utilized by many Gram-negative bacteria to inject type-3 effectors (T3Es) into eukaryotic cells. These effectors manipulate host processes for the benefit of the bacteria and thus promote disease. They can also function as host-specificity determinants through their recognition as avirulence proteins that elicit an immune response. Identifying the full effector repertoire within a set of bacterial genomes is of great importance for developing appropriate treatments against the associated pathogens. RESULTS We present Effectidor, a user-friendly web server that harnesses several machine-learning techniques to predict T3Es within bacterial genomes. We compared the performance of Effectidor to other available tools for the same task on three pathogenic bacteria. Effectidor outperformed these tools in terms of classification accuracy (area under the precision-recall curve above 0.98 in all cases). AVAILABILITY AND IMPLEMENTATION Effectidor is available at: https://effectidor.tau.ac.il, and the source code is available at: https://github.com/naamawagner/Effectidor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
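The accuracy above is reported as the area under the precision-recall curve. One common way to compute that area is average precision, the step-wise sum of (R_n - R_{n-1}) * P_n over score thresholds, which is the definition behind scikit-learn's `average_precision_score`. The sketch below implements that definition in plain Python; which estimator the Effectidor evaluation used is not stated here, so treat this as an illustration of the metric only.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: sum of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores form a single threshold, so consume them together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # 1 = known effector, 0 = non-effector; scores from any classifier.
    print(average_precision([1, 0, 1], [0.9, 0.8, 0.7]))  # 0.8333...
```

Unlike ROC AUC, this summary is sensitive to how few effectors there are among many non-effectors, which is the usual setting when scanning whole genomes.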
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dafna Gold-Binshtok
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ben Zerah
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Doron Teper
- Department of Plant Pathology and Weed Research, Institute of Plant Protection, Agricultural Research Organization (ARO), Volcani Center, Rishon LeZion 7505101, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
15
Nagar N, Ben Tal N, Pupko T. EvoRator: Prediction of residue-level evolutionary rates from protein structures using machine learning. J Mol Biol 2022; 434:167538. [DOI: 10.1016/j.jmb.2022.167538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/07/2022] [Accepted: 03/07/2022] [Indexed: 10/18/2022]
16
Abstract
Various Gram-negative bacteria use secretion systems to secrete effector proteins that manipulate host biochemical pathways to their benefit. We and others have previously developed machine-learning algorithms to predict novel effectors. Specifically, given a set of known effectors and a set of known non-effectors, the machine-learning algorithm extracts features that distinguish these two protein groups. In the training phase, the algorithm learns how best to combine the features to separate the two groups. The trained model is then applied to open reading frames (ORFs) with unknown functions, yielding a score for each ORF that reflects its likelihood of being an effector. We developed Effectidor, a web server for predicting type III effectors. In this book chapter, we provide a step-by-step introduction to the application of Effectidor, from selecting input data to analyzing the obtained predictions.
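The workflow described above (features for known effectors and non-effectors, training, then scoring ORFs of unknown function) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions: two composition features (Ser/Thr fractions) chosen only for the example, a from-scratch logistic regression, and invented toy sequences. Effectidor itself is described as harnessing several machine-learning techniques, and all names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def features(protein: str) -> List[float]:
    """Two illustrative composition features (hypothetical, chosen only for this sketch)."""
    n_term = protein[:50]
    st_n_term = sum(aa in "ST" for aa in n_term) / max(len(n_term), 1)
    st_overall = sum(aa in "ST" for aa in protein) / max(len(protein), 1)
    return [st_n_term, st_overall]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X: Sequence[List[float]], y: Sequence[int], lr: float = 1.0,
          epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model: Tuple[List[float], float], protein: str) -> float:
    """Score in (0, 1); higher means more effector-like under the toy model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(protein))) + b)


if __name__ == "__main__":
    effectors = ["M" + "ST" * 24 + "LKAE" * 20, "M" + "SSA" * 16 + "LKAE" * 20]
    non_effectors = ["M" + "LKAE" * 30, "M" + "GLVA" * 30]
    X = [features(p) for p in effectors + non_effectors]
    y = [1] * len(effectors) + [0] * len(non_effectors)
    model = train(X, y)
    # Rank ORFs of unknown function by their score.
    orfs = {"orf1": "M" + "SA" * 24 + "LKAE" * 10, "orf2": "M" + "KLAE" * 25}
    for name, seq in sorted(orfs.items(), key=lambda kv: -score(model, kv[1])):
        print(name, round(score(model, seq), 3))
```

The ranked scores are what a user would inspect: ORFs at the top of the list are the candidates worth validating experimentally.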
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv, Israel
- Doron Teper
- Department of Plant Pathology and Weed Research, Institute of Plant Protection, Agricultural Research Organization (ARO), Volcani Center, Rishon LeZion, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Tel Aviv, Israel
17
Ecker N, Azouri D, Bettisworth B, Stamatakis A, Mansour Y, Mayrose I, Pupko T. OUP accepted manuscript. Bioinformatics 2022; 38:i118-i124. [PMID: 35758778 PMCID: PMC9236582 DOI: 10.1093/bioinformatics/btac252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Motivation In recent years, full-genome sequences have become increasingly available and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require using a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in the alignment size and are claimed to have a negative effect on the accuracy of the obtained tree. Results Here, we propose an artificial-intelligence-based approach, which provides means to select the optimal subset of sites and a formula by which one can compute the log-likelihood of the entire data based on this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while putting a constraint on the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides accurate approximation of the tree likelihood based on the entire data. Furthermore, we show that using this Lasso-based approximation during a tree search decreased running-time substantially while retaining the same tree-search performance. Availability and implementation The code was implemented in Python version 3.8 and is available through GitHub (https://github.com/noaeker/lasso_positions_sampling). The datasets used in this paper were retrieved from Zhou et al. (2018) as described in section 3. Supplementary information Supplementary data are available at Bioinformatics online.
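The approximation can be written compactly. In notation introduced here (not taken from the paper), let $\ell_i(T)$ be the log-likelihood of site $i$ on tree $T$ for sites $i = 1,\dots,n$, and let $T_1,\dots,T_m$ be a set of training trees. Under the usual assumption that sites evolve independently, the full log-likelihood is a sum over sites, and a Lasso fit selects a sparse set of weights. The loss scaling and the choice of training trees used by the authors may differ from this sketch.

```latex
\ell(T) = \sum_{i=1}^{n} \ell_i(T),
\qquad
\hat{\beta} = \operatorname*{arg\,min}_{\beta_0,\,\beta}\;
  \sum_{t=1}^{m} \Bigl( \ell(T_t) - \beta_0 - \sum_{i=1}^{n} \beta_i\, \ell_i(T_t) \Bigr)^{2}
  + \lambda \sum_{i=1}^{n} \lvert \beta_i \rvert,
\qquad
\hat{\ell}(T) = \hat{\beta}_0 + \sum_{i \in S} \hat{\beta}_i\, \ell_i(T),
\quad S = \{\, i : \hat{\beta}_i \neq 0 \,\}.
```

A larger $\lambda$ drives more weights to exactly zero and shrinks $S$, which is how the penalty constrains the number of sites. Because $\hat{\ell}(T)$ needs per-site likelihoods only for $i \in S$, scoring a candidate tree touches, for example, 5% of the sites instead of all of them.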
Affiliation(s)
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ben Bettisworth
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Yishay Mansour
- The Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- To whom correspondence should be addressed.
- Tal Pupko
- To whom correspondence should be addressed.
18
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
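The central modeling change is giving insertions and deletions their own rates and length distributions. A minimal forward-simulation sketch of such a process, tracking only sequence length along one branch, under assumptions chosen for brevity: per-position event rates and geometric indel lengths. The published model's length distributions, its summary statistics, and the ABC fitting step are not reproduced, and all names are hypothetical.

```python
import random
from typing import Tuple


def geometric_length(mean: float, rng: random.Random) -> int:
    """Indel length >= 1 from a geometric distribution with the given mean (mean >= 1)."""
    p = 1.0 / mean
    length = 1
    while rng.random() > p:
        length += 1
    return length


def simulate_indels(length: int, ins_rate: float, del_rate: float, branch_length: float,
                    mean_ins_len: float, mean_del_len: float,
                    seed: int = 0) -> Tuple[int, int, int]:
    """Gillespie-style simulation of sequence length along one branch.

    Returns (final length, number of insertions, number of deletions).
    """
    rng = random.Random(seed)
    t, n_ins, n_del = 0.0, 0, 0
    while True:
        ins_total = ins_rate * (length + 1)  # insertions can occur in any of the length + 1 gaps
        del_total = del_rate * length        # deletions start at an existing position
        total = ins_total + del_total
        if total == 0:
            break
        t += rng.expovariate(total)          # waiting time to the next event
        if t > branch_length:
            break
        if rng.random() < ins_total / total:
            length += geometric_length(mean_ins_len, rng)
            n_ins += 1
        else:
            start = rng.randrange(length)
            # Deletions that run past the end of the sequence are truncated.
            length -= min(geometric_length(mean_del_len, rng), length - start)
            n_del += 1
    return length, n_ins, n_del


if __name__ == "__main__":
    for ins_r, del_r in [(0.05, 0.05), (0.03, 0.07)]:
        print(ins_r, del_r, simulate_indels(500, ins_r, del_r, 1.0, 3.0, 3.0, seed=1))
```

A simulator like this is what an ABC scheme calls repeatedly, comparing summary statistics of simulated and observed data to decide which parameter draws to keep.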
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Rapoport
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Alon Itzkovitch
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, Arizona, USA
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
19
Ashkenazy H, Avram O, Ryvkin A, Roitburd-Berman A, Weiss-Ottolenghi Y, Hada-Neeman S, Gershoni JM, Pupko T. Motifier: An IgOme Profiler Based on Peptide Motifs Using Machine Learning. J Mol Biol 2021; 433:167071. [PMID: 34052285 DOI: 10.1016/j.jmb.2021.167071] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/18/2020] [Revised: 04/26/2021] [Accepted: 05/22/2021] [Indexed: 11/26/2022]
Abstract
Antibodies provide a comprehensive record of the encounters with threats and insults to the immune system. The ability to examine the repertoire of antibodies in serum and discover those that best represent "discriminating features" characteristic of various clinical situations, is potentially very useful. Recently, phage display technologies combined with Next-Generation Sequencing (NGS) produced a powerful experimental methodology, coined "Deep-Panning", in which the spectrum of serum antibodies is probed. In order to extract meaningful biological insights from the tens of millions of affinity-selected peptides generated by Deep-Panning, advanced bioinformatics algorithms are a must. In this study, we describe Motifier, a computational pipeline comprised of a set of algorithms that systematically generates discriminatory peptide motifs based on the affinity-selected peptides identified by Deep-Panning. These motifs are shown to effectively characterize antibody binding activities and through the implementation of machine-learning protocols are shown to accurately classify complex antibody mixtures representing various biological conditions.
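Motifier turns each sample's affinity-selected peptides into motif-based features that a classifier can use. A minimal sketch of that feature step, assuming the motifs are already available as regular expressions and that each feature is the fraction of a sample's peptide reads matching one motif. Motifier derives its motifs computationally and its actual scoring may differ; the peptides, motifs, and function name are hypothetical.

```python
import re
from typing import Dict, List, Sequence


def motif_profile(peptide_counts: Dict[str, int], motifs: Sequence[str]) -> List[float]:
    """Fraction of a sample's peptide reads matching each motif (one feature per motif)."""
    total = sum(peptide_counts.values())
    profile = []
    for motif in motifs:
        pattern = re.compile(motif)
        hits = sum(count for peptide, count in peptide_counts.items() if pattern.search(peptide))
        profile.append(hits / total if total else 0.0)
    return profile


if __name__ == "__main__":
    motifs = ["DE.G", "^WW", "[KR]P[LI]"]  # hypothetical motifs written as regular expressions
    sample = {"ACDEFG": 3, "WWKLPQ": 1, "GKPLSS": 4}  # peptide -> read count
    print(motif_profile(sample, motifs))  # [0.375, 0.125, 0.5]
```

Normalizing by the sample's total reads keeps profiles comparable across sequencing runs of different depth; one profile per serum sample then forms a row of the classifier's input.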
Affiliation(s)
- Haim Ashkenazy
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Arie Ryvkin
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Anna Roitburd-Berman
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Yael Weiss-Ottolenghi
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Smadar Hada-Neeman
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jonathan M Gershoni
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
20
Abadi S, Avram O, Rosset S, Pupko T, Mayrose I. ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning. Mol Biol Evol 2021; 37:3338-3352. [PMID: 32585030 DOI: 10.1093/molbev/msaa154] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 02/03/2023]
Abstract
Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task are computationally demanding and time-consuming, are not always feasible, and sometimes depend on preliminary assumptions that do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. Because ModelTeller relies on a readily implemented machine-learning model, predicting the model from features extracted from the sequence data substantially decreases running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.
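The abstract separates features tied to sequence divergence from features tied to model-parameter estimates. A minimal sketch of how a few divergence-type features could be extracted from a nucleotide alignment before being handed to a learned model. The three features are our illustrative choice, not ModelTeller's documented feature set, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def divergence_features(alignment: Sequence[str]) -> Dict[str, float]:
    """Simple divergence summaries of a nucleotide alignment (illustrative feature set)."""
    dists = [p_distance(a, b) for a, b in combinations(alignment, 2)]
    dists = [d for d in dists if d == d]  # NaN != NaN, so this drops pairs with no comparable sites
    if not dists:
        raise ValueError("need at least two sequences with overlapping ungapped sites")
    n_sites = len(alignment[0])
    variable = sum(1 for column in zip(*alignment) if len(set(column) - {"-"}) > 1)
    return {
        "mean_p_distance": sum(dists) / len(dists),
        "max_p_distance": max(dists),
        "frac_variable_sites": variable / n_sites,
    }


if __name__ == "__main__":
    print(divergence_features(["ACGT", "ACGA", "ACTA"]))
```

Features like these are cheap to compute compared with fitting every candidate substitution model, which is where a trained predictor saves running time.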
Affiliation(s)
- Shiran Abadi
- School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel
- Oren Avram
- School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel
- Saharon Rosset
- Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Tal Pupko
- School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel
21
Abstract
Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
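The core idea is to spend cheap model predictions on every neighboring tree and expensive likelihood computations only on the most promising ones. A minimal sketch of one such search step, with the trained predictor and the likelihood function passed in as placeholders. The ranking rule, the fixed `keep_fraction`, and all names are assumptions for illustration, not the study's implementation.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Tree = TypeVar("Tree")


def guided_step(current_ll: float, neighbors: Sequence[Tree],
                predict_gain: Callable[[Tree], float],
                log_likelihood: Callable[[Tree], float],
                keep_fraction: float = 0.1) -> Tuple[Optional[Tree], float, int]:
    """One hill-climbing step that computes true likelihoods only for top-ranked neighbors.

    Returns (best improving neighbor or None, its log-likelihood,
    number of likelihood evaluations performed).
    """
    # Cheap: rank every neighbor with the learned predictor.
    ranked = sorted(neighbors, key=predict_gain, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    best, best_ll = None, current_ll
    # Expensive: evaluate the real likelihood only for the k most promising ones.
    for candidate in ranked[:k]:
        ll = log_likelihood(candidate)
        if ll > best_ll:
            best, best_ll = candidate, ll
    return best, best_ll, k


def toy_true_ll(x: int) -> float:
    # Stand-in "likelihood" whose optimum is at 73.
    return -abs(x - 73)


def toy_predictor(x: int) -> float:
    # Imperfect stand-in predictor that believes the optimum is at 70.
    return -abs(x - 70)


if __name__ == "__main__":
    print(guided_step(-50.0, list(range(100)), toy_predictor, toy_true_ll))  # (73, 0, 10)
```

Even with an imperfect predictor, the true optimum is found here after 10 likelihood evaluations instead of 100; the risk, which a real method must quantify, is that the best neighbor falls outside the kept fraction.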
Affiliation(s)
- Dana Azouri
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Shiran Abadi
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Yishay Mansour
- Blavatnik School of Computer Science, Tel-Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel
22
Ruano-Gallego D, Sanchez-Garrido J, Kozik Z, Núñez-Berrueco E, Cepeda-Molero M, Mullineaux-Sanders C, Naemi Baghshomali Y, Slater SL, Wagner N, Glegola-Madejska I, Roumeliotis TI, Pupko T, Fernández LÁ, Rodríguez-Patón A, Choudhary JS, Frankel G. Type III secretion system effectors form robust and flexible intracellular virulence networks. Science 2021; 371:eabc9531. [PMID: 33707240 DOI: 10.1126/science.abc9531] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 05/22/2020] [Revised: 12/15/2020] [Accepted: 01/15/2021] [Indexed: 12/14/2022]
Abstract
Infections with many Gram-negative pathogens, including Escherichia coli, Salmonella, Shigella, and Yersinia, rely on type III secretion system (T3SS) effectors. We hypothesized that while hijacking processes within mammalian cells, the effectors operate as a robust network that can tolerate substantial contractions. This was tested in vivo using the mouse pathogen Citrobacter rodentium (encoding 31 effectors). Sequential gene deletions showed that effector essentiality for infection was context dependent and that the network could tolerate 60% contraction while maintaining pathogenicity. Despite inducing very different colonic cytokine profiles (e.g., interleukin-22, interleukin-17, interferon-γ, or granulocyte-macrophage colony-stimulating factor), different networks induced protective immunity. Using data from >100 distinct mutant combinations, we built and trained a machine learning model able to predict colonization outcomes, which were confirmed experimentally. Furthermore, reproducing the human-restricted enteropathogenic E. coli effector repertoire in C. rodentium was not sufficient for efficient colonization, which implicates effector networks in host adaptation. These results unveil the extreme robustness of both T3SS effector networks and host responses.
Affiliation(s)
- David Ruano-Gallego
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Julia Sanchez-Garrido
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Zuzanna Kozik
- Functional Proteomics Group, Chester Beatty Laboratories, Institute of Cancer Research, London, UK
- Elena Núñez-Berrueco
- Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, Madrid, Spain
- Massiel Cepeda-Molero
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Yasaman Naemi Baghshomali
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Sabrina L Slater
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Izabela Glegola-Madejska
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
- Theodoros I Roumeliotis
- Functional Proteomics Group, Chester Beatty Laboratories, Institute of Cancer Research, London, UK
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Luis Ángel Fernández
- Centro Nacional de Biotecnología (CNB-CSIC), Department of Microbial Biotechnology, Madrid, Spain
- Alfonso Rodríguez-Patón
- Laboratorio de Inteligencia Artificial, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo, Boadilla del Monte, Madrid, Spain
- Jyoti S Choudhary
- Functional Proteomics Group, Chester Beatty Laboratories, Institute of Cancer Research, London, UK
- Gad Frankel
- Centre for Molecular Microbiology and Infection, Department of Life Sciences, Imperial College, London, UK
23
Hada-Neeman S, Weiss-Ottolenghi Y, Wagner N, Avram O, Ashkenazy H, Maor Y, Sklan EH, Shcherbakov D, Pupko T, Gershoni JM. Domain-Scan: Combinatorial Sero-Diagnosis of Infectious Diseases Using Machine Learning. Front Immunol 2021; 11:619896. [PMID: 33643301 PMCID: PMC7902724 DOI: 10.3389/fimmu.2020.619896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 12/29/2020] [Indexed: 12/30/2022]
Abstract
The presence of pathogen-specific antibodies in an individual's blood sample is used as an indication of previous exposure to, and infection by, that specific pathogen (e.g., a virus or bacterium). Diagnostic antibodies are routinely measured using solid-phase immunoassays such as ELISA tests and Western blots. Here, we describe a sero-diagnostic approach based on phage display of epitope arrays, which we term "Domain-Scan". We harness next-generation sequencing (NGS) to measure serum binding to dozens of epitopes derived from HIV-1 and HCV simultaneously. Distinguishing healthy individuals from those infected with either HIV-1 or HCV is modeled as a machine-learning classification problem, in which each determinant ("domain") is considered a feature, and its NGS read-out provides values that correspond to the level of determinant-specific antibodies in the sample. We show that, after training a machine-learning model on labeled examples, we can very accurately classify unlabeled samples and pinpoint the domains that contribute most to the classification. Our experimental/computational Domain-Scan approach is general and can be adapted to other pathogens, provided that sufficient training samples are available.
Affiliation(s)
- Smadar Hada-Neeman
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yael Weiss-Ottolenghi
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Haim Ashkenazy
- Max Planck Institute for Developmental Biology, Max Planck Society (MPG), Tübingen, Germany
- Yaakov Maor
- Institute of Gastroenterology and Hepatology, Kaplan Medical Center, Rehovot, Israel
- Ella H Sklan
- Department of Clinical Microbiology and Immunology, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Dmitry Shcherbakov
- Russian-American Anti-Cancer Center, Altai State University, Barnaul, Russia
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Jonathan M Gershoni
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
24
Avram O, Kigel A, Vaisman-Mentesh A, Kligsberg S, Rosenstein S, Dror Y, Pupko T, Wine Y. PASA: Proteomic analysis of serum antibodies web server. PLoS Comput Biol 2021; 17:e1008607. [PMID: 33493161 PMCID: PMC7861515 DOI: 10.1371/journal.pcbi.1008607] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2020] [Revised: 02/04/2021] [Accepted: 12/06/2020] [Indexed: 01/17/2023]
Abstract
MOTIVATION A comprehensive characterization of the humoral response towards a specific antigen requires quantification of the B-cell receptor repertoire by next-generation sequencing (BCR-Seq), as well as the analysis of serum antibodies against this antigen, using proteomics. The proteomic analysis is challenging since it necessitates the mapping of antigen-specific peptides to individual B-cell clones. RESULTS The PASA web server provides a robust computational platform for the analysis and integration of data obtained from proteomics of serum antibodies. PASA maps peptides derived from antibodies raised against a specific antigen to corresponding antibody sequences. It then analyzes and integrates proteomics and BCR-Seq data, thus providing a comprehensive characterization of the humoral response. The PASA web server is freely available at https://pasa.tau.ac.il and open to all users without a login requirement.
Affiliation(s)
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Aya Kigel
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Anna Vaisman-Mentesh
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Sharon Kligsberg
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Shai Rosenstein
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yael Dror
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yariv Wine
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
25
Loewenthal G, Abadi S, Avram O, Halabi K, Ecker N, Nagar N, Mayrose I, Pupko T. COVID-19 pandemic-related lockdown: response time is more important than its strictness. EMBO Mol Med 2020; 12:e13171. [PMID: 33073919 PMCID: PMC7645374 DOI: 10.15252/emmm.202013171] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/23/2020] [Revised: 09/21/2020] [Accepted: 09/23/2020] [Indexed: 01/20/2023]
Abstract
The rapid spread of SARS-CoV-2 and its threat to health systems worldwide have led governments to take acute actions to enforce social distancing. Previous studies used complex epidemiological models to quantify the effect of lockdown policies on infection rates. However, these rely on prior assumptions or on official regulations. Here, we use country-specific reports of daily mobility derived from people's mobile-phone usage to model social distancing. Our data-driven model enabled the extraction of lockdown characteristics, which were crossed with observed mortality rates to show that: (i) the time at which social distancing was initiated is highly correlated with the number of deaths (r² = 0.64), whereas the strictness and duration of the lockdown are less informative; (ii) a delay of 7.49 days in initiating social distancing would double the number of deaths; and (iii) the immediate response has a prolonged effect on the COVID-19 death toll.
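Finding (ii) can be read as a log-linear relationship between deaths and response delay. The sketch below assumes exactly that (an illustrative assumption, not a model stated in the abstract) and converts an additional delay into a multiplicative change in deaths; `death_multiplier`, `implied_rate` and `DOUBLING_DELAY_DAYS` are hypothetical names.

```python
import math

# Assumption for illustration only: deaths grow log-linearly with response delay,
# doubling for every DOUBLING_DELAY_DAYS of additional delay (finding (ii) above).
DOUBLING_DELAY_DAYS = 7.49


def death_multiplier(extra_delay_days: float,
                     doubling_delay: float = DOUBLING_DELAY_DAYS) -> float:
    """Factor by which deaths change for an extra delay, under the assumption above."""
    return 2.0 ** (extra_delay_days / doubling_delay)


def implied_rate(doubling_delay: float = DOUBLING_DELAY_DAYS) -> float:
    """Equivalent exponential rate r, with deaths proportional to exp(r * delay)."""
    return math.log(2.0) / doubling_delay


if __name__ == "__main__":
    for delay in (1.0, 7.49, 14.0):
        print(f"+{delay:g} days of delay -> x{death_multiplier(delay):.2f} deaths")
    print(f"implied rate: {implied_rate():.4f} per day of delay")
```

Under this assumption, a 14-day delay would correspond to 2^(14/7.49), i.e. roughly 3.65 times as many deaths; the extrapolation is only as valid as the log-linear reading itself.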
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Shiran Abadi
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Keren Halabi
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Natan Nagar
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
26
Avram O, Rapoport D, Portugez S, Pupko T. M1CR0B1AL1Z3R—a user-friendly web server for the analysis of large-scale microbial genomics data. Access Microbiol 2020. [DOI: 10.1099/acmi.ac2020.po1014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment, and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface (Avram et al., Nucleic Acids Res., 2019). Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence-absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.
27
Moshe A, Pupko T. Ancestral sequence reconstruction: accounting for structural information by averaging over replacement matrices. Bioinformatics 2019; 35:2562-2568. [PMID: 30590382 DOI: 10.1093/bioinformatics/bty1031] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/26/2018] [Revised: 12/03/2018] [Accepted: 12/16/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION Ancestral sequence reconstruction (ASR) is widely used to understand protein evolution, structure and function. Current ASR methodologies do not fully consider differences in evolutionary constraints among positions imposed by the three-dimensional (3D) structure of the protein. Here, we developed an ASR algorithm that allows different protein sites to evolve according to different mixtures of replacement matrices. We show that assigning replacement matrices to protein positions based on their solvent accessibility leads to ASR with higher log-likelihoods compared to naïve models that assume a single replacement matrix for all sites. Improved ASR log-likelihoods are also demonstrated when solvent accessibility is predicted from protein sequences rather than inferred from a known 3D structure. Finally, we show that using such structure-aware mixture models results in substantial differences in the inferred ancestral sequences. AVAILABILITY AND IMPLEMENTATION http://fastml.tau.ac.il. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
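To make "different mixtures of replacement matrices" concrete, the per-site likelihood can be written as a weighted sum over matrices. The notation here is assumed for illustration and is not taken from the paper: $D_i$ is the alignment column at site $i$, $T$ the tree with branch lengths, $Q_1, \dots, Q_K$ the candidate replacement matrices, and $w_{ik}$ the weight of matrix $k$ at site $i$, which could, for instance, be set from the site's solvent-accessibility class.

$$
L = \prod_{i=1}^{n} \sum_{k=1}^{K} w_{ik}\, P(D_i \mid T, Q_k), \qquad w_{ik} \ge 0, \quad \sum_{k=1}^{K} w_{ik} = 1
$$

$$
P(a_i = x \mid D_i) = \frac{\sum_{k=1}^{K} w_{ik}\, P(a_i = x, D_i \mid T, Q_k)}{\sum_{k=1}^{K} w_{ik}\, P(D_i \mid T, Q_k)}
$$

Here $a_i$ denotes the state of a given ancestral node at site $i$ (marginal reconstruction). Setting $K = 1$ recovers the single-matrix model that the abstract calls naïve.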
Affiliation(s)
- Asher Moshe
- Department of Cell Research and Immunology, School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- Department of Cell Research and Immunology, School of Molecular Cell Biology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
28
Avram O, Rapoport D, Portugez S, Pupko T. M1CR0B1AL1Z3R-a user-friendly web server for the analysis of large-scale microbial genomics data. Nucleic Acids Res 2019; 47:W88-W92. [PMID: 31114912 PMCID: PMC6602433 DOI: 10.1093/nar/gkz423] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 03/12/2019] [Revised: 04/29/2019] [Accepted: 05/06/2019] [Indexed: 11/21/2022]
Abstract
Large-scale mining and analysis of bacterial datasets contribute to the comprehensive characterization of complex microbial dynamics within a microbiome and among different bacterial strains, e.g., during disease outbreaks. The study of large-scale bacterial evolutionary dynamics poses many challenges. These include data-mining steps, such as gene annotation, ortholog detection, sequence alignment and phylogeny reconstruction. These steps require the use of multiple bioinformatics tools and ad-hoc programming scripts, making the entire process cumbersome, tedious and error-prone due to manual handling. This motivated us to develop the M1CR0B1AL1Z3R web server, a ‘one-stop shop’ for conducting microbial genomics data analyses via a simple graphical user interface. Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence–absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques, with the click of a button. M1CR0B1AL1Z3R is freely available at https://microbializer.tau.ac.il/.
Affiliation(s)
- Oren Avram
- The School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Rapoport
- The School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Shir Portugez
- The School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
29
Jiménez‐Guerrero I, Pérez‐Montaño F, Da Silva GM, Wagner N, Shkedy D, Zhao M, Pizarro L, Bar M, Walcott R, Sessa G, Pupko T, Burdman S. Show me your secret(ed) weapons: a multifaceted approach reveals a wide arsenal of type III-secreted effectors in the cucurbit pathogenic bacterium Acidovorax citrulli and novel effectors in the Acidovorax genus. Mol Plant Pathol 2020; 21:17-37. [PMID: 31643123 PMCID: PMC6913199 DOI: 10.1111/mpp.12877] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 05/21/2023]
Abstract
The cucurbit pathogenic bacterium Acidovorax citrulli requires a functional type III secretion system (T3SS) for pathogenicity. In this bacterium, as with Xanthomonas and Ralstonia spp., an AraC-type transcriptional regulator, HrpX, regulates expression of genes encoding T3SS components and type III-secreted effectors (T3Es). The annotation of a sequenced A. citrulli strain revealed 11 T3E genes. Assuming that this could be an underestimation, we aimed to uncover the T3E arsenal of the A. citrulli model strain, M6. Thorough sequence analysis revealed 51 M6 genes whose products are similar to known T3Es. Furthermore, we combined machine learning and transcriptomics to identify novel T3Es. The machine-learning approach ranked all A. citrulli M6 genes according to their propensity to encode T3Es. RNA-Seq revealed differential gene expression between wild-type M6 and a mutant defective in HrpX: 159 and 28 genes showed significantly reduced and increased expression in the mutant relative to wild-type M6, respectively. Data combined from these approaches led to the identification of seven novel T3E candidates that were further validated using a T3SS-dependent translocation assay. These T3E genes encode hypothetical proteins that seem to be restricted to plant pathogenic Acidovorax species. Transient expression in Nicotiana benthamiana revealed that two of these T3Es localize to the cell nucleus and one interacts with the endoplasmic reticulum. This study places A. citrulli among the 'richest' bacterial pathogens in terms of T3E cargo. It also revealed novel T3Es that appear to be involved in the pathoadaptive evolution of plant pathogenic Acidovorax species.
Affiliation(s)
- Irene Jiménez‐Guerrero
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Francisco Pérez‐Montaño
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Department of Microbiology, University of Seville, Seville, Spain
- Gustavo Mateus Da Silva
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Naama Wagner
- The School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Dafna Shkedy
- The School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Mei Zhao
- Department of Plant Pathology, University of Georgia, Athens, GA, USA
- Lorena Pizarro
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Bet Dagan, Israel
- Maya Bar
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Bet Dagan, Israel
- Ron Walcott
- Department of Plant Pathology, University of Georgia, Athens, GA, USA
- Guido Sessa
- School of Plant Sciences and Food Security, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The School of Molecular Cell Biology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Saul Burdman
- Department of Plant Pathology and Microbiology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
30
Sügis E, Dauvillier J, Leontjeva A, Adler P, Hindie V, Moncion T, Collura V, Daudin R, Loe-Mie Y, Herault Y, Lambert JC, Hermjakob H, Pupko T, Rain JC, Xenarios I, Vilo J, Simonneau M, Peterson H. HENA, heterogeneous network-based data set for Alzheimer's disease. Sci Data 2019; 6:151. [PMID: 31413325 PMCID: PMC6694132 DOI: 10.1038/s41597-019-0152-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 02/12/2019] [Accepted: 06/18/2019] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease and other types of dementia are the leading cause of disability in later life, and various types of experiments have been performed to understand the underlying mechanisms of the disease, with the aim of identifying potential drug targets. These experiments have been carried out by scientists working in different domains, such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in databases designed to collect data of similar types. However, to obtain a systematic view of the disease from these independent but complementary data sets, it is necessary to combine them. In this study, we describe a heterogeneous network-based data set for Alzheimer's disease (HENA). Additionally, we demonstrate the application of state-of-the-art graph convolutional networks, i.e., deep-learning methods, to the analysis of such large heterogeneous biological data sets. We expect HENA to allow scientists to explore and analyze their own results in the broader context of Alzheimer's disease research.
Affiliation(s)
- Elena Sügis
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Jerome Dauvillier
- Swiss Institute of Bioinformatics, Vital-IT group, Unil Quartier Sorge, Genopode building, CH-1015, Lausanne, Switzerland
- Anna Leontjeva
- CSIRO Data 61, 5/13 Garden St, Eveleigh, NSW, 2015, Australia
- Priit Adler
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Valerie Hindie
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Thomas Moncion
- Hybrigenics SA, 3-5 Impasse Reille, 75014, Paris, France
- Rachel Daudin
- Institut national de la santé et de la recherche médicale, INSERM U894, 2 ter rue d'Alésia, 75014, Paris, France
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France
- Yann Loe-Mie
- (Epi)genomics of Animal Development Unit, Institut Pasteur, CNRS UMR3738, Paris, 75015, France
- Yann Herault
- Centre Européen de Recherche en Biologie et Médecine, 1 rue Laurent Fries, 67404, Illkirch, France
- Jean-Charles Lambert
- Institut Pasteur de Lille, UMR 744, 1 rue du Pr. Calmette, BP 245, 59019, Lille cedex, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, United Kingdom
- Tal Pupko
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, P.O. Box 39040, 6997801, Tel Aviv, Israel
- Ioannis Xenarios
- Center for Integrative Genomics, University of Lausanne, Genopode, 1015, Lausanne, Switzerland
- Genome Center Health 2030, Analytical Platform Department, Chemin des Mines 9, 1202, Genève, Switzerland
- DFR CHUV, Rue du Bugnon 21, 1011, Lausanne, Switzerland
- Agora Center, LICR/Department of Oncology, Rue du Bugnon 25A, 1005, Lausanne, Switzerland
- Jaak Vilo
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
- Michel Simonneau
- Institut national de la santé et de la recherche médicale, INSERM U894, 2 ter rue d'Alésia, 75014, Paris, France
- Laboratoire Aimé Cotton, Centre National Recherche Scientifique, Université Paris-Sud, Ecole Normale Supérieure Paris-Saclay, Université Paris-Saclay, 91405, Orsay, France
- Hedi Peterson
- Quretec Ltd., Ülikooli 6a, 51003, Tartu, Estonia
- Institute of Computer Science, University of Tartu, J. Liivi 2, 50409, Tartu, Estonia
31
Ashkenazy H, Levy Karin E, Mertens Z, Cartwright RA, Pupko T. SpartaABC: a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm. Nucleic Acids Res 2017; 45:W453-W457. [PMID: 28460062 PMCID: PMC5570005 DOI: 10.1093/nar/gkx322] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/06/2017] [Accepted: 04/15/2017] [Indexed: 11/22/2022]
Abstract
Many analyses for detecting biological phenomena rely on a multiple sequence alignment as input. The results of such analyses are often further studied through parametric bootstrap procedures, using sequence simulators. One of the problems with conducting such simulation studies is that users currently have no means of deciding which insertion and deletion (indel) parameters to choose so that the resulting sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate-Bayesian-computation rejection algorithm to infer indel parameters from sequence data. It does so by extracting summary statistics from the input. It then performs numerous sequence simulations under randomly sampled indel parameters. By computing a distance between the summary statistics extracted from the input and those of each simulation, SpartaABC retains only the parameters whose simulations are close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows simulating sequences with the inferred indel parameters; to this end, the sequence simulators Dawg 2.0 and INDELible were integrated. Using SpartaABC, we demonstrate the differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
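The rejection scheme described above (sample parameters, simulate, summarize, keep the closest) can be sketched generically. Everything here is an assumption for illustration and not SpartaABC's implementation: the toy simulator stands in for Dawg 2.0 or INDELible, and the two summary statistics, the uniform priors and all function names (`simulate_gap_lengths`, `summary_stats`, `abc_rejection`) are hypothetical.

```python
import math
import random
import statistics


def simulate_gap_lengths(indel_rate, mean_len, n_sites, rng):
    """Toy stand-in for a sequence simulator (hypothetical, not Dawg or INDELible).
    Each site opens an indel with probability `indel_rate`; indel lengths are
    geometric on {1, 2, ...} with mean `mean_len`."""
    p = 1.0 / mean_len
    lengths = []
    for _ in range(n_sites):
        if rng.random() < indel_rate:
            length = 1
            while rng.random() > p:
                length += 1
            lengths.append(length)
    return lengths


def summary_stats(lengths, n_sites):
    """Hypothetical summary statistics: indels per site and mean indel length
    (0.0 when no indel was drawn)."""
    mean_len = statistics.fmean(lengths) if lengths else 0.0
    return (len(lengths) / n_sites, mean_len)


def abc_rejection(observed, n_sites, n_sims=2000, keep=0.02, seed=0):
    """Rejection ABC: draw parameters from uniform priors, simulate, and keep the
    draws whose summary statistics lie closest to `observed`."""
    rng = random.Random(seed)
    draws, stats = [], []
    for _ in range(n_sims):
        rate = rng.uniform(0.0, 0.1)       # assumed prior on the indel rate
        mean_len = rng.uniform(1.0, 10.0)  # assumed prior on the mean indel length
        draws.append((rate, mean_len))
        sim = simulate_gap_lengths(rate, mean_len, n_sites, rng)
        stats.append(summary_stats(sim, n_sites))
    # Scale each statistic by its spread across simulations so neither dominates.
    scales = [statistics.pstdev(col) or 1.0 for col in zip(*stats)]

    def dist(s):
        return math.sqrt(sum(((a - b) / sd) ** 2 for a, b, sd in zip(s, observed, scales)))

    order = sorted(range(n_sims), key=lambda i: dist(stats[i]))
    n_keep = max(1, round(keep * n_sims))
    return [draws[i] for i in order[:n_keep]]  # approximate posterior sample


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    observed = summary_stats(simulate_gap_lengths(0.02, 4.0, n, rng), n)
    posterior = abc_rejection(observed, n)
    print("posterior mean rate:", statistics.fmean(r for r, _ in posterior))
    print("posterior mean length:", statistics.fmean(m for _, m in posterior))
```

Scaling each statistic by its spread across simulations is a common choice that keeps statistics on different scales (here, indels per site and mean indel length) from dominating the distance; the accepted draws form an approximate posterior sample whose mean can serve as a point estimate.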
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Zach Mertens
- The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
- School of Life Sciences, Arizona State University, Tempe, AZ 85287-5301, USA
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
32
Abstract
Determining the most suitable model for phylogeny reconstruction is a fundamental step in numerous evolutionary studies. Over the years, various criteria for model selection have been proposed, leading to debate over which criterion is preferable. However, the necessity of this procedure has not been questioned to date. Here, we demonstrate that although the criteria frequently disagree on the selected model for both empirical and simulated data, all criteria lead to very similar inferences. When topologies and ancestral sequence reconstruction are the desired output, choosing one criterion over another is not crucial. Moreover, skipping model selection and instead using the most parameter-rich model, GTR+I+G, leads to similar inferences, thus rendering this time-consuming step nonessential, at least under current strategies of model selection.
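The abstract does not name the criteria it compares. As commonly used examples (an assumption, not a list taken from the paper), information criteria trade goodness of fit against the number of free parameters $k$, where $\hat{L}$ is the maximized likelihood and $n$ the sample size (for alignments, often taken as the number of columns):

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

The model with the lowest value is selected. For GTR+I+G, the substitution model itself contributes 10 free parameters (five relative exchange rates, three base frequencies, the proportion of invariable sites and the gamma shape parameter); the $2s - 3$ branch lengths of an unrooted tree with $s$ taxa are commonly added to $k$.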
Affiliation(s)
- Shiran Abadi
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Dana Azouri
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Tal Pupko
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, 69978, Israel
33
Ashkenazy H, Sela I, Levy Karin E, Landan G, Pupko T. Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction. Syst Biol 2018; 68:117-130. [PMID: 29771363 DOI: 10.1093/sysbio/syy036] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/12/2017] [Accepted: 05/09/2018] [Indexed: 01/11/2023]
Abstract
The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet, inferred MSAs have been shown to be inaccurate, and alignment errors reduce tree-inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work, we explore an approach in which, instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees compared to 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether the extra effort of computing a SuperMSA and inferring a tree from it is worthwhile. Based on these assessments, we expect our methodology to be useful in many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
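The concatenation step, joining alternative MSAs of the same sequences into one SuperMSA, can be sketched as follows. This is a minimal sketch assuming the alternative MSAs are already available as in-memory dictionaries; the `Alignment` type and the `super_msa` and `check_alignment` functions are hypothetical names, and the sketch does not reproduce how the alternative MSAs are generated (the abstract points to http://guidance.tau.ac.il for that) or the tree inference that follows.

```python
from typing import Dict, List

# Hypothetical representation: sequence name -> aligned (gapped) sequence.
Alignment = Dict[str, str]


def check_alignment(msa: Alignment) -> None:
    """Raise if the MSA is empty or its rows differ in length."""
    if len({len(row) for row in msa.values()}) != 1:
        raise ValueError("an MSA must have rows of one common length")


def super_msa(alternatives: List[Alignment]) -> Alignment:
    """Concatenate alternative MSAs of the same sequences into one SuperMSA."""
    if not alternatives:
        raise ValueError("need at least one MSA")
    reference = alternatives[0]
    for msa in alternatives:
        check_alignment(msa)
        if set(msa) != set(reference):
            raise ValueError("all MSAs must contain the same sequence names")
        for name, row in msa.items():
            # Every alternative must align the same raw (ungapped) sequences.
            if row.replace("-", "") != reference[name].replace("-", ""):
                raise ValueError(f"sequence {name!r} differs between MSAs")
    return {name: "".join(msa[name] for msa in alternatives) for name in reference}


if __name__ == "__main__":
    msa_a = {"s1": "AC-GT", "s2": "ACAGT"}
    msa_b = {"s1": "A-CGT", "s2": "ACAGT"}
    for name, row in super_msa([msa_a, msa_b]).items():
        print(name, row)
```

Because the SuperMSA keeps the same rows as each input MSA, an ordinary tree-inference program can be run on it unchanged; columns that recur across alternatives simply contribute several times.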
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Itamar Sela
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Giddy Landan
- Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel
| |
Collapse
|
34
|
Xue AY, Di Pizio A, Levit A, Yarnitzky T, Penn O, Pupko T, Niv MY. Corrigendum: Independent Evolution of Strychnine Recognition by Bitter Taste Receptor Subtypes. Front Mol Biosci 2018; 5:84. [PMID: 30255025 PMCID: PMC6142832 DOI: 10.3389/fmolb.2018.00084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2018] [Accepted: 08/23/2018] [Indexed: 11/13/2022]
Affiliation(s)
- Ava Yuan Xue
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Antonella Di Pizio
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Anat Levit
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States
- Tali Yarnitzky
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- Tali Yarnitzky Scientific Consulting, Maccabim-Reut, Israel
- Osnat Penn
- Modeling, Analysis and Theory Group, Allen Institute for Brain Science, Seattle, WA, United States
- Tal Pupko
- The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Masha Y Niv
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel

35
Levy Karin E, Ashkenazy H, Hein J, Pupko T. A Simulation-Based Approach to Statistical Alignment. Syst Biol 2018; 68:252-266. [DOI: 10.1093/sysbio/syy059] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/04/2018] [Accepted: 09/10/2018] [Indexed: 12/26/2022]
Affiliation(s)
- Eli Levy Karin
- School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Haim Ashkenazy
- School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Jotun Hein
- School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Department of Statistics, University of Oxford, Oxford, UK
- Tal Pupko
- School of Molecular Cell Biology & Biotechnology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel

36
Avram O, Vaisman-Mentesh A, Yehezkel D, Ashkenazy H, Pupko T, Wine Y. ASAP - A Webserver for Immunoglobulin-Sequencing Analysis Pipeline. Front Immunol 2018; 9:1686. [PMID: 30105017 PMCID: PMC6077260 DOI: 10.3389/fimmu.2018.01686] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/04/2018] [Accepted: 07/09/2018] [Indexed: 12/16/2022]
Abstract
Reproducible and robust data on antibody repertoires are invaluable for basic and applied immunology. Next-generation sequencing (NGS) of antibody variable regions has emerged as a powerful tool in systems immunology, providing quantitative molecular information on antibody polyclonal composition. However, major computational challenges exist when analyzing antibody sequences, from error handling to hypermutation profiles and clonal expansion analyses. In this work, we developed the ASAP (A webserver for Immunoglobulin-Seq Analysis Pipeline) webserver (https://asap.tau.ac.il). The input to ASAP is a paired-end sequence dataset from one or more replicates, with or without unique molecular identifiers. These datasets can be derived from NGS of human or murine antibody variable regions. ASAP first filters and annotates the sequence reads using public or user-provided germline sequence information. The ASAP webserver next performs various calculations, including somatic hypermutation level, CDR3 lengths, V(D)J family assignments, and V(D)J combination distribution. These analyses are repeated for each replicate. ASAP provides additional information by analyzing the commonalities and differences between the replicates (“joint” analysis). For example, ASAP examines the shared variable regions and their frequency in each replicate to determine which sequences are less likely to result from sample-preparation and/or sequencing errors. Moreover, ASAP clusters the data into clones and reports the identity and prevalence of top-ranking clones (clonal expansion analysis). ASAP further provides the distribution of synonymous and non-synonymous mutations among the somatic hypermutations of the V genes. Finally, ASAP provides means to process the data for proteomic analysis of serum/secreted antibodies by generating a variable region database for liquid chromatography high-resolution tandem mass spectrometry (LC-MS/MS) interpretation.
ASAP is user-friendly, free, and open to all users, with no login requirement. ASAP is applicable for researchers interested in basic questions related to B cell development and differentiation, as well as applied researchers who are interested in vaccine development and monoclonal antibody engineering. By virtue of its user-friendliness, ASAP opens the antibody analysis field to non-expert users who seek to boost their research with immune repertoire analysis.
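Two of the per-replicate calculations mentioned in this abstract, the somatic hypermutation level and the split of mutations into synonymous and non-synonymous, can be illustrated with the Python sketch below. It is not ASAP's code: it assumes a read has already been aligned in frame to its assigned germline V gene with no indels, and the function names are hypothetical. Mutated codons are classified with the standard genetic code.

```python
from typing import Tuple

# Standard genetic code, with codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))


def shm_level(germline: str, read: str) -> float:
    """Fraction of nucleotide positions where the read differs from its germline V gene."""
    if len(germline) != len(read) or not germline:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(g != r for g, r in zip(germline, read)) / len(germline)


def count_syn_nonsyn(germline: str, read: str) -> Tuple[int, int]:
    """Count mutated codons that are synonymous vs. non-synonymous.

    Simplifications: a codon with several substitutions is counted once,
    codons with ambiguous bases (e.g. N) are skipped, and a trailing
    partial codon is ignored.
    """
    if len(germline) != len(read):
        raise ValueError("sequences must be of equal length")
    syn = nonsyn = 0
    for i in range(0, len(germline) - 2, 3):
        g_codon, r_codon = germline[i:i + 3].upper(), read[i:i + 3].upper()
        if g_codon == r_codon:
            continue
        g_aa, r_aa = CODON_TABLE.get(g_codon), CODON_TABLE.get(r_codon)
        if g_aa is None or r_aa is None:
            continue
        if g_aa == r_aa:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


germline_v = "GAAGTGTGG"   # Glu-Val-Trp
mutated_read = "GAGATGTGG"  # Glu-Met-Trp: one silent and one replacement change
counts = count_syn_nonsyn(germline_v, mutated_read)
level = shm_level(germline_v, mutated_read)
```

Here `counts` is `(1, 1)` and `level` is 2/9; aggregating such per-read values over a replicate gives the kind of distributions the abstract refers to.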
Affiliation(s)
- Oren Avram
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel
- Anna Vaisman-Mentesh
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel
- Dror Yehezkel
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel
- Haim Ashkenazy
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel
- Tal Pupko
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel
- Yariv Wine
- George S. Wise Faculty of Life Sciences, School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Ramat Aviv, Israel

37
Ryvkin A, Ashkenazy H, Weiss-Ottolenghi Y, Piller C, Pupko T, Gershoni JM. Phage display peptide libraries: deviations from randomness and correctives. Nucleic Acids Res 2018; 46:e52. [PMID: 29420788 PMCID: PMC5961013 DOI: 10.1093/nar/gky077] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/31/2016] [Revised: 12/25/2017] [Accepted: 01/31/2018] [Indexed: 12/14/2022]
Abstract
Peptide-expressing phage display libraries are widely used for the interrogation of antibodies. Affinity-selected peptides are then analyzed to discover epitope mimetics or are subjected to computational algorithms for epitope prediction. A critical assumption for these applications is the random representation of amino acids in the initial naïve peptide library. In a previous study, we used next-generation sequencing to evaluate a naïve library and discovered severe deviations from randomness: over-representation of the UAG codon, as well as a high abundance of G phosphoramidite that caused biases in the amino acid distribution. In this study, we demonstrate that the UAG over-representation can be attributed to the burden imposed on the phage upon the assembly of the recombinant Protein 8 subunits. This was corrected by constructing the libraries using supE44-containing bacteria, which suppress the UAG-driven abortive termination. We also demonstrate that the overabundance of G stems from differences in synthesis efficiency and can be corrected using compensating oligonucleotide mixtures calibrated by mass spectrometry. Constructing libraries with these correctives results in markedly improved libraries that display a random distribution of amino acids, thus ensuring that enriched peptides obtained in biopanning represent a genuine selection event, a fundamental assumption for phage display applications.
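Checking a naïve library for deviations from randomness amounts to comparing observed amino acid counts with their expected proportions. The Python sketch below computes a chi-square goodness-of-fit statistic and per-residue Pearson residuals; it is an illustration under stated assumptions, not the analysis used in the study. The uniform expectation is only a placeholder, since real expected proportions depend on the codon scheme used to build the library, and `composition_deviation` is a hypothetical name.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_deviation(
    peptides: Iterable[str], expected: Optional[Dict[str, float]] = None
) -> Tuple[float, Dict[str, float]]:
    """Chi-square statistic and Pearson residuals of the pooled amino acid composition.

    `expected` maps each amino acid to its (positive) expected proportion in a
    random library; the uniform default is illustrative only.
    """
    # Pool residues across all peptides, ignoring stop symbols and ambiguous letters.
    counts = Counter(aa for pep in peptides for aa in pep.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    if expected is None:
        expected = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    chi2 = 0.0
    residuals: Dict[str, float] = {}
    for aa, proportion in expected.items():
        exp_count = total * proportion
        obs_count = counts.get(aa, 0)
        # Pearson residual: positive for over-represented residues.
        residuals[aa] = (obs_count - exp_count) / sqrt(exp_count)
        chi2 += residuals[aa] ** 2
    return chi2, residuals


chi2_uniform, _ = composition_deviation([AMINO_ACIDS] * 5)
chi2_biased, res_biased = composition_deviation(["G" * 20])
```

A large positive residual (for example, for G) flags an over-represented residue; the statistic can be compared with a chi-square distribution with `len(expected) - 1` degrees of freedom to obtain a p-value.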
Affiliation(s)
- Arie Ryvkin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Yael Weiss-Ottolenghi
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Chen Piller
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Jonathan M Gershoni
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel

38
Xue AY, Di Pizio A, Levit A, Yarnitzky T, Penn O, Pupko T, Niv MY. Independent Evolution of Strychnine Recognition by Bitter Taste Receptor Subtypes. Front Mol Biosci 2018; 5:9. [PMID: 29552563 PMCID: PMC5840161 DOI: 10.3389/fmolb.2018.00009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/24/2017] [Accepted: 01/19/2018] [Indexed: 01/02/2023]
Abstract
The 25 human bitter taste receptors (hT2Rs) recognize thousands of structurally and chemically diverse bitter substances. The binding modes of human bitter taste receptors hT2R10 and hT2R46, which are responsible for strychnine recognition, were previously established using site-directed mutagenesis, functional assays, and molecular modeling. Here we construct a phylogenetic tree and reconstruct ancestral sequences of the T2R10 and T2R46 clades. We next analyze the binding sites in view of experimental data to predict their ability to recognize strychnine. This analysis suggests that the common ancestor of hT2R10 and hT2R46 is unlikely to bind strychnine in the same mode as either of its two descendants. Estimation of relative divergence times shows that hT2R10 evolved earlier than hT2R46. Strychnine recognition was likely acquired first by the earliest common ancestor of the T2R10 clade before the separation of primates from other mammals, and was highly conserved within the clade. It was probably independently acquired by the common ancestor of T2R43-47 before the homo-ape speciation, lost in most T2Rs within this clade, but enhanced in the hT2R46 after humans diverged from the rest of primates. Our findings suggest hypothetical strychnine T2R receptors in several species, and serve as an experimental guide for further study. Improved understanding of how bitter taste receptors acquire the ability to be activated by particular ligands is valuable for the development of sensors for bitterness and for potential toxicity.
Affiliation(s)
- Ava Yuan Xue
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Antonella Di Pizio
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel
- Anat Levit
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, United States
- Tali Yarnitzky
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- Tali Yarnitzky Scientific Consulting, Maccabim-Reut, Israel
- Osnat Penn
- Modeling, Analysis and Theory Group, Allen Institute for Brain Science, Seattle, WA, United States
- Tal Pupko
- The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Masha Y. Niv
- Robert H. Smith Faculty of Agriculture, Food and Environment, Institute of Biochemistry, Food Science and Nutrition, The Hebrew University of Jerusalem, Rehovot, Israel
- The Fritz Haber Center for Molecular Dynamics, The Hebrew University of Jerusalem, Jerusalem, Israel

39
Lavi B, Levy Karin E, Pupko T, Hazkani-Covo E. The Prevalence and Evolutionary Conservation of Inverted Repeats in Proteobacteria. Genome Biol Evol 2018; 10:918-927. [PMID: 29608719 PMCID: PMC5941160 DOI: 10.1093/gbe/evy044] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Accepted: 02/21/2018] [Indexed: 12/11/2022]
Abstract
Perfect short inverted repeats (IRs) are known to be enriched in a variety of bacterial and eukaryotic genomes. Currently, it is unclear whether perfect IRs are conserved over evolutionary time scales. In this study, we aimed to characterize the prevalence and evolutionary conservation of IRs across 20 proteobacterial strains. We first identified IRs in Escherichia coli K-12 substr MG1655 and showed that they are overabundant. We next aimed to test whether this overabundance is reflected in the conservation of IRs over evolutionary time scales. To this end, for each perfect IR identified in E. coli MG1655, we collected orthologous sequences from related proteobacterial genomes. We next quantified the evolutionary conservation of these IRs, that is, the presence of the exact same IR across orthologous regions. We observed high conservation of perfect IRs: out of the 234 examined orthologous regions, 145 were more conserved than expected, which is statistically significant even after correcting for multiple testing. Our results together with previous experimental findings support a model in which imperfect IRs are corrected to perfect IRs in a preferential manner via a template switching mechanism.
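The first step, identifying perfect IRs in a genome, can be sketched as a scan for a stem followed by its exact reverse complement. The stem length and maximal spacer below are illustrative parameters, not the values used in the study, and `find_perfect_irs` is a hypothetical helper; assessing overabundance and conservation across orthologous regions would require further steps not shown here.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_perfect_irs(seq: str, stem_len: int, max_spacer: int) -> List[Tuple[int, int, int]]:
    """Return (left_start, right_start, spacer) for every perfect inverted repeat.

    A perfect IR here is a stem of `stem_len` bases followed, after a spacer of
    0..max_spacer bases, by the exact reverse complement of that stem.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len + 1):
        left = seq[i:i + stem_len]
        if set(left) - set("ACGT"):
            continue  # skip stems containing ambiguous bases
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            j = i + stem_len + spacer
            if j + stem_len > len(seq):
                break  # right arm would run past the end of the sequence
            if seq[j:j + stem_len] == target:
                hits.append((i, j, spacer))
    return hits


# Stem ACGGT, spacer TTT, then ACCGT (the reverse complement of ACGGT).
irs = find_perfect_irs("ACGGTTTTACCGT", stem_len=5, max_spacer=5)
```

Here `irs` is `[(0, 8, 3)]`: the stem at position 0 pairs with its reverse complement at position 8 across a 3-base spacer.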
Affiliation(s)
- Bar Lavi
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Department of Natural and Life Sciences, The Open University of Israel, Ra'anana, Israel
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Einat Hazkani-Covo
- Department of Natural and Life Sciences, The Open University of Israel, Ra'anana, Israel

40
Nissan G, Gershovits M, Morozov M, Chalupowicz L, Sessa G, Manulis-Sasson S, Barash I, Pupko T. Revealing the inventory of type III effectors in Pantoea agglomerans gall-forming pathovars using draft genome sequences and a machine-learning approach. Mol Plant Pathol 2018; 19:381-392. [PMID: 28019708 PMCID: PMC6638007 DOI: 10.1111/mpp.12528] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/19/2016] [Revised: 12/06/2016] [Accepted: 12/14/2016] [Indexed: 05/03/2023]
Abstract
Pantoea agglomerans, a widespread epiphytic bacterium, has evolved into a hypersensitive response and pathogenicity (hrp)-dependent and host-specific gall-forming pathogen by the acquisition of a pathogenicity plasmid containing a type III secretion system (T3SS) and its effectors (T3Es). Pantoea agglomerans pv. betae (Pab) elicits galls on beet (Beta vulgaris) and gypsophila (Gypsophila paniculata), whereas P. agglomerans pv. gypsophilae (Pag) incites galls on gypsophila and a hypersensitive response (HR) on beet. Draft genome sequences were generated and employed in combination with a machine-learning approach and a translocation assay into beet roots to identify the pools of T3Es in the two pathovars. The genomes of the sequenced Pab4188 and Pag824-1 strains have a similar size (∼5 Mb) and GC content (∼55%). Mutational analysis revealed that, in Pab4188, eight T3Es (HsvB, HsvG, PseB, DspA/E, HopAY1, HopX2, HopAF1 and HrpK) contribute to pathogenicity on beet and gypsophila. In Pag824-1, nine T3Es (HsvG, HsvB, PthG, DspA/E, HopAY1, HopD1, HopX2, HopAF1 and HrpK) contribute to pathogenicity on gypsophila, whereas the PthG effector triggers HR on beet. HsvB, HsvG, PthG and PseB appear to endow pathovar specificities to Pab and Pag, and no homologous T3Es were identified for these proteins in other phytopathogenic bacteria. Conversely, the remaining T3Es contribute to the virulence of both pathovars, and homologous T3Es were found in other phytopathogenic bacteria. Remarkably, HsvG and HsvB, which act as host-specific transcription factors, displayed the largest contribution to disease development.
Affiliation(s)
- Gal Nissan
- Department of Molecular Biology and Ecology of Plants, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Rishon LeZion 7528809, Israel
- Michael Gershovits
- Department of Cell Research and Immunology, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Michael Morozov
- Department of Molecular Biology and Ecology of Plants, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Rishon LeZion 7528809, Israel
- Laura Chalupowicz
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Rishon LeZion 7528809, Israel
- Guido Sessa
- Department of Molecular Biology and Ecology of Plants, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Shulamit Manulis-Sasson
- Department of Plant Pathology and Weed Research, Agricultural Research Organization, The Volcani Center, Rishon LeZion 7528809, Israel
- Isaac Barash
- Department of Molecular Biology and Ecology of Plants, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Tal Pupko
- Department of Cell Research and Immunology, Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel

41
Danziger O, Pupko T, Bacharach E, Ehrlich M. Interleukin-6 and Interferon-α Signaling via JAK1-STAT Differentially Regulate Oncolytic versus Cytoprotective Antiviral States. Front Immunol 2018; 9:94. [PMID: 29441069 PMCID: PMC5797546 DOI: 10.3389/fimmu.2018.00094] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/12/2017] [Accepted: 01/12/2018] [Indexed: 12/17/2022]
Abstract
Malignancy-induced alterations to cytokine signaling in tumor cells differentially regulate their interactions with the immune system and oncolytic viruses. The abundance of inflammatory cytokines in the tumor microenvironment suggests that such signaling plays key roles in tumor development and therapy efficacy. The JAK-STAT axis transduces signals of interleukin-6 (IL-6) and interferons (IFNs), mediates antiviral responses, and is frequently altered in prostate cancer (PCa) cells. However, how activation of JAK-STAT signaling with different cytokines regulates interactions between oncolytic viruses and PCa cells is not known. Here, we employ LNCaP PCa cells, expressing (or not) JAK1, activated (or not) with IFNs (α or γ) or IL-6, and infected with RNA viruses of different oncolytic potential (EHDV-TAU, hMPV-GFP, or HIV-GFP) to address this matter. We show that in JAK1-expressing cells, IL-6 sensitized PCa cells to viral cell death in the presence or absence of productive infection, depending on the virus employed. By contrast, IFNα induced a cytoprotective antiviral state. Biochemical and genetic (knockout) analyses revealed that the antiviral state and cytoprotection depend on STAT1 and STAT2 activation, respectively. In IL-6-treated cells, STAT3 expression was required for anti-proliferative signaling. Quantitative proteomics (SILAC) revealed a core repertoire of antiviral IFN-stimulated genes, induced by IL-6 or IFNs. Oncolysis in the absence of productive infection, induced by IL-6, correlated with a reduction in regulators of cell cycle and metabolism. These results call for matching the viral features of the oncolytic agent, the malignancy-induced genetic-epigenetic alterations to JAK/STAT signaling, and the cytokine composition of the tumor microenvironment for efficient oncolytic virotherapy.
Affiliation(s)
- Oded Danziger
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Eran Bacharach
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Marcelo Ehrlich
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel

42
Levy Karin E, Shkedy D, Ashkenazy H, Cartwright RA, Pupko T. Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation. Genome Biol Evol 2017; 9:1280-1294. [PMID: 28453624 PMCID: PMC5438127 DOI: 10.1093/gbe/evx084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 04/25/2017] [Indexed: 02/07/2023]
Abstract
The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty of writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need for an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and from each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
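The three steps listed in this abstract follow the classic rejection-ABC scheme, sketched below in Python. This is not SpartaABC, which is a C++ program that simulates full sequence sets and uses its own summary statistics: here a deliberately toy simulator (a Poisson number of indel events with geometric indel lengths) and two hypothetical summary statistics stand in, and the uniform priors are illustrative. The accepted parameter draws approximate the posterior, and their means serve as point estimates.

```python
import math
import random
from typing import List, Tuple

Stats = Tuple[float, float]  # (number of indel events, mean indel length)


def _poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplicative Poisson sampler (adequate for the modest rates used here)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def _geometric(rng: random.Random, mean: float) -> int:
    """Indel length >= 1 from a geometric distribution with the given mean (>= 1)."""
    p, k = 1.0 / mean, 1
    while rng.random() >= p:
        k += 1
    return k


def simulate_summary(rate: float, mean_len: float, root_len: int, rng: random.Random) -> Stats:
    """Toy simulator: summary statistics of indels for a sequence of root_len sites."""
    n_events = _poisson(rng, rate * root_len)
    if n_events == 0:
        return 0.0, 0.0
    lengths = [_geometric(rng, mean_len) for _ in range(n_events)]
    return float(n_events), sum(lengths) / n_events


def distance(sim: Stats, obs: Stats) -> float:
    """Euclidean distance between statistics, each scaled by the observed value."""
    return math.sqrt(sum(((s - o) / max(abs(o), 1.0)) ** 2 for s, o in zip(sim, obs)))


def abc_rejection(obs: Stats, root_len: int, n_sims: int = 5000, keep: int = 50,
                  seed: int = 0) -> Tuple[List[Tuple[float, float]], Tuple[float, float]]:
    """Rejection ABC: sample priors, simulate, keep the `keep` closest parameter draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        rate = rng.uniform(0.0, 0.05)      # illustrative prior on the indel rate
        mean_len = rng.uniform(1.0, 10.0)  # illustrative prior on the mean indel length
        sim = simulate_summary(rate, mean_len, root_len, rng)
        draws.append((distance(sim, obs), rate, mean_len))
    draws.sort()
    accepted = [(rate, mean_len) for _, rate, mean_len in draws[:keep]]
    # Point estimates: means of the approximate posterior sample.
    rate_hat = sum(r for r, _ in accepted) / len(accepted)
    len_hat = sum(m for _, m in accepted) / len(accepted)
    return accepted, (rate_hat, len_hat)


observed = simulate_summary(0.02, 4.0, 5000, random.Random(1))
posterior, (rate_hat, len_hat) = abc_rejection(observed, root_len=5000)
```

Keeping a fixed number of closest draws, rather than applying a fixed distance threshold, is one common variant of rejection ABC; scaling each statistic by its observed value keeps statistics on different scales comparable in `distance`.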
Affiliation(s)
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Department of Molecular Biology & Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Dafna Shkedy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ
- School of Life Sciences, Arizona State University, Tempe, AZ
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel

43
Mushegian A, Karin EL, Pupko T. Sequence analysis of malacoherpesvirus proteins: Pan-herpesvirus capsid module and replication enzymes with an ancient connection to "Megavirales". Virology 2018; 513:114-128. [PMID: 29065352 PMCID: PMC7172337 DOI: 10.1016/j.virol.2017.10.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/20/2017] [Revised: 10/08/2017] [Accepted: 10/09/2017] [Indexed: 12/30/2022]
Abstract
The order Herpesvirales includes animal viruses with large double-strand DNA genomes replicating in the nucleus. The main capsid protein in the best-studied family, Herpesviridae, contains a domain with an HK97-like fold related to bacteriophage head proteins, and several virion maturation factors are also homologous between phages and herpesviruses. The origin of herpesvirus DNA replication proteins is less well understood. While analyzing the genomes of herpesviruses in the family Malacoherpesviridae, we identified nearly 30 families of proteins conserved in other herpesviruses, including several phage-related domains in morphogenetic proteins. Herpesvirus DNA replication factors have a complex evolutionary history: some are related to cellular proteins, but others are closer to homologs from large nucleocytoplasmic DNA viruses. Phylogenetic analyses suggest that the core replication machinery of herpesviruses may have been recruited from the same pool as in the case of other large DNA viruses of eukaryotes.
Affiliation(s)
- Arcady Mushegian
- Division of Molecular and Cellular Biosciences, National Science Foundation, 2415 Eisenhower Avenue, Alexandria, VA 22314, USA
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel

44
Levy Karin E, Ashkenazy H, Wicke S, Pupko T, Mayrose I. TraitRateProp: a web server for the detection of trait-dependent evolutionary rate shifts in sequence sites. Nucleic Acids Res 2017; 45:W260-W264. [PMID: 28453644 PMCID: PMC5570260 DOI: 10.1093/nar/gkx288] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/28/2017] [Revised: 04/02/2017] [Accepted: 04/26/2017] [Indexed: 11/23/2022]
Abstract
Understanding species adaptation at the molecular level has been a central goal of evolutionary biology and genomics research. This task becomes increasingly relevant with the constant rise in the availability of both genotypic and phenotypic data. The TraitRateProp web server offers a unique perspective on this task by allowing the detection of associations between sequence evolution rate and whole-organism phenotypes. By analyzing sequences and phenotypes of extant species in the context of their phylogeny, it identifies sequence sites in a gene/protein whose evolutionary rate is associated with shifts in the phenotype. To this end, it considers alternative histories of whole-organism phenotypic changes that result in the extant phenotypic states. Its joint likelihood framework, which combines models of sequence and phenotype evolution, allows testing whether an association between these processes exists. In addition to predicting the sequence sites most likely to be associated with the phenotypic trait, the server can optionally integrate structural 3D information. This integration allows visual detection of trait-associated sequence sites that are juxtaposed in 3D space, thereby suggesting a common functional role. We used TraitRateProp to study the shifts in sequence evolution rate of the RPS8 protein upon transitions to heterotrophy in Orchidaceae. TraitRateProp is available at http://traitrate.tau.ac.il/prop.
Affiliation(s)
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Susann Wicke
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel

45
Levy Karin E, Wicke S, Pupko T, Mayrose I. An Integrated Model of Phenotypic Trait Changes and Site-Specific Sequence Evolution. Syst Biol 2017; 66:917-933. [DOI: 10.1093/sysbio/syx032] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2016] [Accepted: 01/24/2017] [Indexed: 02/05/2023] Open
46
Preisner H, Karin EL, Poschmann G, Stühler K, Pupko T, Gould SB. The Cytoskeleton of Parabasalian Parasites Comprises Proteins that Share Properties Common to Intermediate Filament Proteins. Protist 2016; 167:526-543. [PMID: 27744090 DOI: 10.1016/j.protis.2016.09.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/10/2016] [Revised: 08/25/2016] [Accepted: 09/02/2016] [Indexed: 01/15/2023]
Abstract
Certain protist lineages bear cytoskeletal structures that are specific to them and define their individual group. Trichomonadida are excavate parasites united by a unique cytoskeletal framework, which includes tubulin-based structures such as the pelta and axostyle, but also other filaments such as the striated costa, whose protein composition remains unknown. We determined the proteome of the detergent-resistant cytoskeleton of Tetratrichomonas gallinarum and identified 203 proteins with homology to Trichomonas vaginalis proteins; these contain significantly more long coiled-coil regions than control protein sets. Five candidates were shown to associate with previously described cytoskeletal structures, including the costa, and the expression of a single T. vaginalis protein in T. gallinarum induced the formation of accumulated, striated filaments. Our data suggest that filament-forming proteins of protists other than actin and tubulin share common structural properties with metazoan intermediate filament proteins, while not being homologous to them. These filament-forming proteins might have evolved many times independently in eukaryotes, or simultaneously in a common ancestor but with different evolutionary trajectories downstream in different phyla. The broad variety of filament-forming proteins uncovered, which have no homologs outside the Trichomonadida, once more highlights the diverse nature of eukaryotic proteins able to form unique cytoskeletal filaments.
Affiliation(s)
- Harald Preisner
- Institute for Molecular Evolution, Heinrich-Heine-University, Düsseldorf, Germany
- Eli Levy Karin
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Gereon Poschmann
- Molecular Proteomics Laboratory (MPL), BMFZ, Heinrich-Heine-University, Düsseldorf, Germany
- Kai Stühler
- Molecular Proteomics Laboratory (MPL), BMFZ, Heinrich-Heine-University, Düsseldorf, Germany
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Sven B Gould
- Institute for Molecular Evolution, Heinrich-Heine-University, Düsseldorf, Germany
47
McNally A, Oren Y, Kelly D, Pascoe B, Dunn S, Sreecharan T, Vehkala M, Välimäki N, Prentice MB, Ashour A, Avram O, Pupko T, Dobrindt U, Literak I, Guenther S, Schaufler K, Wieler LH, Zhiyong Z, Sheppard SK, McInerney JO, Corander J. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations. PLoS Genet 2016; 12:e1006280. [PMID: 27618184 PMCID: PMC5019451 DOI: 10.1371/journal.pgen.1006280] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Received: 04/15/2016] [Accepted: 08/04/2016] [Indexed: 02/05/2023]
Abstract
The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens because of the high-resolution view it provides. However, most such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements, could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug-resistant clade of ST131 and provides the first population-level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. We present an approach to evolutionary analysis of bacterial pathogens that combines core genome, accessory genome, and gene regulatory region analyses. This enables an unparalleled resolution of the evolution of a multidrug-resistant pandemic pathogen, revealing detail that would remain invisible to a core genome phylogenetic analysis alone. In particular, our combined analysis approach identifies population-level evidence for compensatory mutations offsetting the costs of resistance plasmid maintenance as a key event in the emergence of dominant multidrug-resistant (MDR) lineages of E. coli.
Affiliation(s)
- Alan McNally
- Pathogen Research Group, Nottingham Trent University, Nottingham, United Kingdom
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdom
- Yaara Oren
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Darren Kelly
- Department of Biology, National University of Ireland, Maynooth, Ireland
- Ben Pascoe
- College of Medicine, University of Swansea, Swansea, United Kingdom
- Steven Dunn
- Pathogen Research Group, Nottingham Trent University, Nottingham, United Kingdom
- Tristan Sreecharan
- Pathogen Research Group, Nottingham Trent University, Nottingham, United Kingdom
- Minna Vehkala
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Niko Välimäki
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Michael B. Prentice
- Departments of Pathology and Microbiology, University College Cork, Cork, Ireland
- Amgad Ashour
- Departments of Pathology and Microbiology, University College Cork, Cork, Ireland
- Oren Avram
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ulrich Dobrindt
- Institute of Hygiene, Universität Münster, Münster, Germany
- Ivan Literak
- Department of Biology and Wildlife Diseases, Faculty of Veterinary Hygiene and Ecology, and CEITEC VFU, University of Veterinary and Pharmaceutical Sciences, Brno, Czech Republic
- Sebastian Guenther
- Centre for Infection Medicine, Institute of Microbiology and Epizootics, Freie Universität, Berlin, Germany
- Katharina Schaufler
- Centre for Infection Medicine, Institute of Microbiology and Epizootics, Freie Universität, Berlin, Germany
- Lothar H. Wieler
- Centre for Infection Medicine, Institute of Microbiology and Epizootics, Freie Universität, Berlin, Germany
- Robert Koch Institute, Berlin, Germany
- Zong Zhiyong
- Centre for Infectious Diseases, West China Hospital of Sichuan University, Chengdu, China
- James O. McInerney
- Department of Biology, National University of Ireland, Maynooth, Ireland
- Faculty of Life Sciences, The University of Manchester, Manchester, United Kingdom
- Jukka Corander
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
48
Ashkenazy H, Abadi S, Martz E, Chay O, Mayrose I, Pupko T, Ben-Tal N. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res 2016; 44:W344-50. [PMID: 27166375 PMCID: PMC4987940 DOI: 10.1093/nar/gkw408] [Citation(s) in RCA: 1875] [Impact Index Per Article: 234.4] [Received: 02/20/2016] [Accepted: 05/03/2016] [Indexed: 12/12/2022]
Abstract
The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree.
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Shiran Abadi
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Eric Martz
- Department of Microbiology, University of Massachusetts, Amherst, MA 01003, USA
- Ofer Chay
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
49
Eckshtain-Levi N, Shkedy D, Gershovits M, Da Silva GM, Tamir-Ariel D, Walcott R, Pupko T, Burdman S. Insights from the Genome Sequence of Acidovorax citrulli M6, a Group I Strain of the Causal Agent of Bacterial Fruit Blotch of Cucurbits. Front Microbiol 2016; 7:430. [PMID: 27092114 PMCID: PMC4821854 DOI: 10.3389/fmicb.2016.00430] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/10/2015] [Accepted: 03/17/2016] [Indexed: 11/13/2022]
Abstract
Acidovorax citrulli is a seedborne bacterium that causes bacterial fruit blotch of cucurbit plants, including watermelon and melon. A. citrulli strains can be divided into two major groups based on DNA fingerprint analyses and biochemical properties. Group I strains have generally been isolated from non-watermelon cucurbits, while group II strains are closely associated with watermelon. In the present study, we report the genome sequence of M6, a group I model A. citrulli strain isolated from melon. We used comparative genome analysis to investigate differences between the genome of strain M6 and that of the group II model strain AAC00-1. The draft genome sequence of A. citrulli M6 comprises 139 contigs, with an approximate total size of 4.85 Mb. The genome of M6 is ∼500 kb shorter than that of strain AAC00-1. Comparative analysis revealed that this size difference is mainly explained by eight fragments, ranging from ∼35 to 120 kb and distributed throughout the AAC00-1 genome, which are absent from the M6 genome. In agreement with this finding, AAC00-1 was found to possess 532 open reading frames (ORFs) that are absent in strain M6, whereas only 123 ORFs in M6 were absent in AAC00-1. Most of these M6-specific ORFs encode hypothetical proteins, and most of them were also detected in two recently sequenced group I strains, tw6 and pslb65. Further PCR assays and coverage analyses with other A. citrulli strains support the notion that some of these fragments, or significant portions of them, discriminate between group I and group II strains of A. citrulli. Moreover, analyses of GC content, effective number of codons, and clusters of orthologs indicate that these fragments were introduced into group II strains by horizontal gene transfer events. Our study reports the genome sequence of a model group I strain of A. citrulli, one of the most important pathogens of cucurbits. It also provides the first comprehensive genome-level comparison between the two major groups of strains of this pathogen.
Affiliation(s)
- Noam Eckshtain-Levi
- Department of Plant Pathology and Microbiology and the Otto Warburg Center for Agricultural Biotechnology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Dafna Shkedy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Gershovits
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Dafna Tamir-Ariel
- Department of Plant Pathology and Microbiology and the Otto Warburg Center for Agricultural Biotechnology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Ron Walcott
- Department of Plant Pathology, The University of Georgia, Athens, GA, USA
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Saul Burdman
- Department of Plant Pathology and Microbiology and the Otto Warburg Center for Agricultural Biotechnology, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
50
Teper D, Burstein D, Salomon D, Gershovitz M, Pupko T, Sessa G. Identification of novel Xanthomonas euvesicatoria type III effector proteins by a machine-learning approach. Mol Plant Pathol 2016; 17:398-411. [PMID: 26104875 PMCID: PMC6638362 DOI: 10.1111/mpp.12288] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Indexed: 05/03/2023]
Abstract
The Gram-negative bacterium Xanthomonas euvesicatoria (Xcv) is the causal agent of bacterial spot disease in pepper and tomato. Xcv pathogenicity depends on a type III secretion (T3S) system that delivers effector proteins into host cells to suppress plant immunity and promote disease. The pool of known Xcv effectors includes approximately 30 proteins, most of which were identified in strain 85-10 by various experimental and computational techniques. To identify additional Xcv 85-10 effectors, we applied a genome-wide machine-learning approach in which all open reading frames (ORFs) were scored according to their propensity to encode effectors. Scoring was based on a large set of features, including genomic organization, taxonomic dispersion, hypersensitive response and pathogenicity (hrp)-dependent expression, 5' regulatory sequences, amino acid composition bias and GC content. Thirty-six predicted effectors were tested for translocation into plant cells using the hypersensitive response (HR)-inducing domain of AvrBs2 as a reporter. Seven proteins (XopAU, XopAV, XopAW, XopAP, XopAX, XopAK and XopAD) harboured a functional translocation signal, and their translocation relied on the HrpF translocon, indicating that they are bona fide T3S effectors. Remarkably, four of them belong to novel effector families. Inactivation of the xopAP gene reduced the severity of disease symptoms in infected plants. A decrease in cell death and chlorophyll content was observed in pepper leaves inoculated with the xopAP mutant compared with the wild-type strain. However, populations of the xopAP mutant in infected leaves were similar in size to those of wild-type bacteria, suggesting that the reduction in virulence was not caused by impaired bacterial growth.
Affiliation(s)
- Doron Teper
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, 69978, Israel
- David Burstein
- Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, 69978, Israel
- Dor Salomon
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, 69978, Israel
- Michael Gershovitz
- Department of Cell Research and Immunology, Tel Aviv University, Tel Aviv, 69978, Israel
- Tal Pupko
- Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, 94720, USA
- Guido Sessa
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, 69978, Israel