1
|
Musil M, Jezik A, Horackova J, Borko S, Kabourek P, Damborsky J, Bednar D. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Brief Bioinform 2023; 25:bbad425. [PMID: 38018911 PMCID: PMC10685400 DOI: 10.1093/bib/bbad425] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/30/2023]
Abstract
Thermostable proteins are used in numerous biomedical and biotechnological applications. However, the computational design of stable proteins often yields single-point mutations with a limited effect on protein stability, and constructing stable multiple-point mutants can prove difficult because individual mutations may have antagonistic effects. The FireProt protocol enables the automated computational design of highly stable multiple-point mutants. FireProt 2.0 builds on the previously published FireProt web, retaining the original functionality and expanding it with several new stabilization strategies. FireProt 2.0 integrates the AlphaFold database and homology modeling for structure prediction, enabling calculations that start from a sequence. Multiple-point designs are constructed using the Bron-Kerbosch algorithm, minimizing the antagonistic effect between the individual mutations. Users can now limit the FireProt calculation to a set of user-defined mutations, run a saturation mutagenesis of the whole protein, or select rigidifying mutations based on B-factors. The evolution-based back-to-consensus strategy is complemented by ancestral sequence reconstruction. FireProt 2.0 is significantly faster, and a reworked graphical user interface makes the tool usable even on older hardware. FireProt 2.0 is freely available at http://loschmidt.chemi.muni.cz/fireprotweb.
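The abstract names the Bron-Kerbosch algorithm for assembling multiple-point designs. The following is a minimal Python sketch of that idea, assuming mutations are nodes of a graph and an edge joins two mutations judged compatible (non-antagonistic). The mutation labels, the compatibility edges, the ΔΔG values, and the choice to rank cliques by summed ΔΔG are all hypothetical illustrations, not FireProt's documented procedure.

```python
from typing import Dict, List, Set


def bron_kerbosch(r: Set[str], p: Set[str], x: Set[str],
                  adj: Dict[str, Set[str]], out: List[Set[str]]) -> None:
    """Append every maximal clique extending r to `out` (pivoting variant)."""
    if not p and not x:
        out.append(r)  # r cannot be extended, so it is a maximal clique
        return
    # Pivot on the vertex with the most neighbours in p to prune branches.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in sorted(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques at this level


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    out: List[Set[str]] = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out


# Hypothetical compatibility graph: an edge means "no predicted antagonism".
compatible = {
    "m1": {"m2", "m3"},
    "m2": {"m1", "m3"},
    "m3": {"m1", "m2", "m4"},
    "m4": {"m3"},
}
# Hypothetical predicted stability changes (kcal/mol, negative = stabilizing).
ddg = {"m1": -1.0, "m2": -0.5, "m3": -0.8, "m4": -2.0}

cliques = maximal_cliques(compatible)
# Illustrative ranking: pick the clique with the lowest summed ddG.
best = min(cliques, key=lambda c: sum(ddg[m] for m in c))
```

Each maximal clique is a set of mutually compatible mutations that cannot be extended further, which is why clique enumeration is a natural fit for combining single-point candidates.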
Affiliation(s)
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
| | - Andrej Jezik
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
| | - Jana Horackova
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
| | - Simeon Borko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
| | - Petr Kabourek
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
| | - David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- International Clinical Research Centre, St. Anne’s University Hospital Brno, Brno, Czech Republic
|
2
|
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nat Commun 2023; 14:5478. [PMID: 37673981 PMCID: PMC10482954 DOI: 10.1038/s41467-023-41237-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/14/2023] [Accepted: 08/24/2023] [Indexed: 09/08/2023]
Abstract
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. Then, we used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Lauren L Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA
|
3
|
He R, Zhang J, Shao Y, Gu S, Song C, Qian L, Yin WB, Li Z. Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements. PLoS Comput Biol 2023; 19:e1011100. [PMID: 37186644 DOI: 10.1371/journal.pcbi.1011100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 05/25/2023] [Accepted: 04/12/2023] [Indexed: 05/17/2023]
Abstract
Non-ribosomal peptide synthetase (NRPS) is a diverse family of biosynthetic enzymes for the assembly of bioactive peptides. Despite advances in microbial sequencing, the lack of a consistent standard for annotating NRPS domains and modules has made data-driven discoveries challenging. To address this, we introduced a standardized architecture for NRPS, by using known conserved motifs to partition typical domains. This motif-and-intermotif standardization allowed for systematic evaluations of sequence properties from a large number of NRPS pathways, resulting in the most comprehensive cross-kingdom C domain subtype classifications to date, as well as the discovery and experimental validation of novel conserved motifs with functional significance. Furthermore, our coevolution analysis revealed important barriers associated with re-engineering NRPSs and uncovered the entanglement between phylogeny and substrate specificity in NRPS sequences. Our findings provide a comprehensive and statistically insightful analysis of NRPS sequences, opening avenues for future data-driven discoveries.
Affiliation(s)
- Ruolin He
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Jinyu Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, PR China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, PR China
| | - Yuanzhe Shao
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Shaohua Gu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Long Qian
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Wen-Bing Yin
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, PR China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, PR China
| | - Zhiyuan Li
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
|
4
|
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. bioRxiv 2023:2023.01.18.524637. [PMID: 36789442 PMCID: PMC9928049 DOI: 10.1101/2023.01.18.524637] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
Although most globular proteins fold into a single stable structure [1], an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli [2]. State-of-the-art algorithms [3-5] predict that these fold-switching proteins assume only one stable structure [6,7], missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that over-represented single-fold sequences may be masking these signatures, we developed an approach to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. This approach successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/58 fold-switching proteins from distinct families. Then, using a set of coevolved amino acid pairs predicted by our approach, we successfully biased AlphaFold2 [5] to predict two experimentally consistent conformations of a candidate protein with unsolved structure. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W. Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
| | - Lauren L. Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD 20892, USA
|
5
|
Improved inter-residue contact prediction via a hybrid generative model and dynamic loss function. Comput Struct Biotechnol J 2022; 20:6138-6148. [DOI: 10.1016/j.csbj.2022.11.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 11/07/2022] [Accepted: 11/07/2022] [Indexed: 11/13/2022]
|
6
|
Li Y, Zhang C, Yu DJ, Zhang Y. Deep learning geometrical potential for high-accuracy ab initio protein structure prediction. iScience 2022; 25:104425. [PMID: 35663033 PMCID: PMC9160776 DOI: 10.1016/j.isci.2022.104425] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/17/2022] [Revised: 05/02/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022]
Abstract
Ab initio protein structure prediction has been vastly boosted by the modeling of inter-residue contact/distance maps in recent years. We developed a new deep learning model, DeepPotential, which accurately predicts the distribution of a complementary set of geometric descriptors including a novel hydrogen-bonding potential defined by C-alpha atom coordinates. On 154 Free-Modeling/Hard targets from the CASP and CAMEO experiments, DeepPotential demonstrated a significant advantage in both geometrical feature prediction and full-length structure construction, with Top-L/5 contact accuracy and TM-score of full-length models 4.1% and 6.7% higher than the best of other deep-learning restraint prediction approaches. Detailed analyses showed that the major contributions to the TM-score/contact-map improvements come from the multi-tasking network architecture and metagenome-based MSA collection assisted by confidence-based MSA selection, where hydrogen-bonding and inter-residue orientation predictions help improve the hydrogen-bonding network and secondary structure packing. These results demonstrate new progress in deep-learning restraint-guided ab initio protein structure prediction. Highlights: multi-tasking network architecture for multiple inter-residue geometries; novel deep learning model for improved hydrogen-bonding modeling; rapid and high-accuracy ab initio protein structure prediction.
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
7
|
Bohnsack KS, Kaden M, Abel J, Saralajew S, Villmann T. The Resolved Mutual Information Function as a Structural Fingerprint of Biomolecular Sequences for Interpretable Machine Learning Classifiers. Entropy (Basel) 2021; 23:1357. [PMID: 34682081 PMCID: PMC8534762 DOI: 10.3390/e23101357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Revised: 10/11/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022]
Abstract
In the present article we propose the application of variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider the resolved mutual information functions based on Shannon-, Rényi-, and Tsallis-entropy. In combination with interpretable machine learning classifier models based on generalized learning vector quantization, a powerful methodology for sequence classification is achieved which allows substantial knowledge extraction in addition to the high classification ability due to the model-inherent robustness. Any potential (slightly) inferior performance of the used classifier is compensated by the additional knowledge provided by interpretable models. This knowledge may assist the user in the analysis and understanding of the used data and considered task. After theoretical justification of the concepts, we demonstrate the approach for various example data sets covering different areas in biomolecular sequence analysis.
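To make the fingerprint idea above concrete, here is a minimal Python sketch of a plain Shannon mutual information function: for each lag k it measures how much knowing the symbol at position i tells us about the symbol at position i + k. This is an assumption-laden illustration only. It does not implement the resolved variants or the Rényi and Tsallis versions the article considers, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def mutual_information_function(seq: str, lag: int) -> float:
    """Shannon mutual information (in bits) between symbols `lag` apart."""
    pairs = [(seq[i], seq[i + lag]) for i in range(len(seq) - lag)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal of the first symbol
    right = Counter(b for _, b in pairs)   # marginal of the second symbol
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) / (p(a) * p(b)) written with raw counts
        ratio = count * n / (left[a] * right[b])
        mi += (count / n) * math.log2(ratio)
    return mi


def mi_profile(seq: str, max_lag: int) -> List[float]:
    """Vector of MI values for lags 1..max_lag, usable as a feature vector."""
    return [mutual_information_function(seq, k) for k in range(1, max_lag + 1)]


profile = mi_profile("ACACACAC", 2)
```

The resulting fixed-length profile could then be passed to any vector-based classifier; the article pairs its fingerprints with generalized learning vector quantization.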
Affiliation(s)
- Katrin Sophie Bohnsack
- Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
| | - Marika Kaden
- Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
| | - Julia Abel
- Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
| | - Sascha Saralajew
- Bosch Center for Artificial Intelligence, 71272 Renningen, Germany
| | - Thomas Villmann
- Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
|
8
|
Use of Average Mutual Information and Derived Measures to Find Coding Regions. Entropy 2021; 23:e23101324. [PMID: 34682048 PMCID: PMC8534840 DOI: 10.3390/e23101324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Revised: 08/09/2021] [Accepted: 09/16/2021] [Indexed: 11/17/2022]
Abstract
One of the important steps in genome annotation is the identification of regions that code for proteins. Most annotation approaches rely on signals extracted from a genomic region to decide whether it is a protein-coding region. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures derived from the average mutual information for use in this task. We show that these signals can be used to identify coding and noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can therefore be used in species-agnostic genome annotation algorithms for identifying protein-coding regions. These in turn could be used for gene identification.
|
9
|
Li Y, Zhang C, Zheng W, Zhou X, Bell EW, Yu DJ, Zhang Y. Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins 2021; 89:1911-1921. [PMID: 34382712 DOI: 10.1002/prot.26211] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/21/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 01/12/2023]
Abstract
This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed, based on an ensemble of two complementary coevolution features coupled with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale meta-genome sequence databases. On 22 CASP14 free modeling (FM) targets, the proposed model achieved a top-L/5 long-range precision of 63.8% and a mean distance bin error of 1.494. Based on the predicted distance potentials, 11 out of 22 FM targets and all of the 14 FM/template-based modeling (TBM) targets have correctly predicted folds (TM-score >0.5), suggesting that our approach can provide reliable distance potentials for ab initio protein folding.
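The top-L/5 long-range precision quoted above can be illustrated with a short Python sketch: rank the predicted residue pairs by score, keep the L/5 highest-scoring long-range pairs (L being the sequence length), and report the fraction that are true contacts. The function name, the input layout, and the default separation threshold of 24 residues (the usual CASP convention for "long-range") are assumptions for illustration, not details taken from the article.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_over_k_precision(scores: Dict[Pair, float], contacts: Set[Pair],
                           length: int, k: int = 5, min_sep: int = 24) -> float:
    """Precision of the top length//k predicted pairs with |i - j| >= min_sep."""
    # Keep only pairs far enough apart in sequence to count as long-range.
    long_range = [(pair, s) for pair, s in scores.items()
                  if abs(pair[1] - pair[0]) >= min_sep]
    # Highest predicted contact score first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    top = long_range[:max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for pair, _ in top if pair in contacts)
    return hits / len(top)


# Toy example: a 10-residue chain, with a reduced separation threshold of 3.
toy_scores = {(0, 5): 0.9, (1, 8): 0.8, (2, 3): 0.99, (0, 9): 0.1}
toy_contacts = {(0, 5), (0, 9)}
precision = top_l_over_k_precision(toy_scores, toy_contacts,
                                   length=10, k=5, min_sep=3)
```

In the toy case the pair (2, 3) is discarded as too close in sequence, the two best remaining pairs are (0, 5) and (1, 8), and only the first is a true contact.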
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Eric W Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
|
10
|
Mukherjee I, Chakrabarti S. Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes. Comput Struct Biotechnol J 2021; 19:3779-3795. [PMID: 34285778 PMCID: PMC8271121 DOI: 10.1016/j.csbj.2021.06.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2021] [Revised: 06/22/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022]
Abstract
Proteins involved in interactions throughout the course of evolution tend to co-evolve and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove to be detrimental for functional interactions. Hence, determining co-evolutionary pairings that could be structurally or functionally relevant for maintaining the conservation of an inter-protein interaction is important. Inter-protein co-evolution analysis in several complexes utilizing multiple existing methodologies suggested that co-evolutionary pairings can occur in spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method based on mutual information and Bhattacharyya coefficient was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, while applying the Co-Var measure and EV-complex program on a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in interface and non-interface regions in protein complexes. The Co-Var approach involves determining high degree co-evolutionary pairings that include multiple co-evolutionary connections between particular co-evolved residue positions in one protein with multiple residue positions in the binding partner. Detailed analyses of high degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such co-evolutionary connections mainly occurred within functional domains of constituent proteins and substitution mutations were also common among these positions. The physiological relevance of these predictions suggested that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions. 
Finally, the Co-Var web server (http://www.hpppi.iicb.res.in/ishi/covar/index.html), which implements this methodology, identifies co-evolutionary pairings in intra- and inter-protein interactions.
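The abstract states that Co-Var is based on mutual information and the Bhattacharyya coefficient but does not say how the two are combined. As a hedged illustration of the second ingredient only, the Python sketch below computes the Bhattacharyya coefficient between the residue frequency distributions of two alignment columns. The function names and the toy columns are hypothetical, and this is not the Co-Var scoring scheme itself.

```python
import math
from collections import Counter
from typing import Dict


def column_frequencies(column: str) -> Dict[str, float]:
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {residue: c / total for residue, c in counts.items()}


def bhattacharyya_coefficient(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Sum over shared residues of sqrt(p * q): 1 = identical, 0 = disjoint."""
    return sum(math.sqrt(p[r] * q[r]) for r in p.keys() & q.keys())


# Toy alignment columns, one character per sequence (hypothetical data).
same = bhattacharyya_coefficient(column_frequencies("AAGG"),
                                 column_frequencies("GAGA"))
disjoint = bhattacharyya_coefficient(column_frequencies("AAAA"),
                                     column_frequencies("GGGG"))
```

Note that the coefficient compares overall composition, not per-sequence pairing, which is why a method would need a dependence measure such as mutual information alongside it.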
Affiliation(s)
- Ishita Mukherjee
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
| | - Saikat Chakrabarti
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR) - Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
|
11
|
Jernigan R, Jia K, Ren Z, Zhou W. Large-scale multiple inference of collective dependence with applications to protein function. Ann Appl Stat 2021; 15:902-924. [DOI: 10.1214/20-aoas1431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Robert Jernigan
- Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
| | - Kejue Jia
- Department of Biochemistry, Biophysics, and Molecular Biology, Program of Bioinformatics and Computational Biology, Iowa State University
| | - Zhao Ren
- Department of Statistics, University of Pittsburgh
| | - Wen Zhou
- Department of Statistics, Colorado State University
|
12
|
Huang JS, Huang JM, Zhang W. Semicovariance Coefficient Analysis of Spike Proteins from SARS-CoV-2 and Other Coronaviruses for Viral Evolution and Characteristics Associated with Fatality. Entropy 2021; 23:e23050512. [PMID: 33922613 PMCID: PMC8146220 DOI: 10.3390/e23050512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2021] [Revised: 04/14/2021] [Accepted: 04/20/2021] [Indexed: 11/16/2022]
Abstract
Complex modeling has received significant attention in recent years and is increasingly used to explain statistical phenomena with increasing and decreasing fluctuations, such as the similarity or difference of spike protein charge patterns of coronaviruses. Unlike the existing covariance or correlation coefficient methods in traditional integer dimension construction, this study proposes a simplified novel fractional dimension derivation with the exact Excel tool algorithm. It involves the fractional center moment extension to covariance, which results in a complex covariance coefficient that is better than the Pearson correlation coefficient, in the sense that the nonlinearity relationship can be further depicted. The spike protein sequences of coronaviruses were obtained from the GenBank and GISAID databases, including the coronaviruses from pangolin, bat, canine, swine (three variants), feline, tiger, SARS-CoV-1, MERS, and SARS-CoV-2 (including the strains from Wuhan, Beijing, New York, Germany, and the UK variant B.1.1.7), which were used as the representative examples in this study. By examining the values above and below the average/mean based on the positive and negative charge patterns of the amino acid residues of the spike proteins from coronaviruses, the proposed algorithm provides deep insights into the nonlinear evolving trends of spike proteins for understanding the viral evolution and identifying the protein characteristics associated with viral fatality. The calculation results demonstrate that the complex covariance coefficient analyzed by this algorithm is capable of distinguishing the subtle nonlinear differences in the spike protein charge patterns with reference to Wuhan strain SARS-CoV-2, which the Pearson correlation coefficient may overlook. Our analysis reveals the unique convergent (positive correlative) to divergent (negative correlative) domain center positions of each virus. 
The convergent or conserved region may be critical to viral stability or viability, while the divergent region is highly variable between coronaviruses, suggesting a high frequency of mutations in this region. The analyses show that the conserved center region of the SARS-CoV-1 spike protein is located at amino acid residue 900, but shifted to amino acid residue 700 in the MERS spike protein, and then to amino acid residue 600 in the SARS-CoV-2 spike protein, indicating the evolution of the coronaviruses. Interestingly, the conserved center region of the spike protein in SARS-CoV-2 variant B.1.1.7 shifted back to amino acid residue 700, suggesting this variant is more virulent than the original SARS-CoV-2 strain. Another important characteristic our study reveals is that the distance between the divergent mean and the maximal divergent point in each of the viruses (MERS > SARS-CoV-1 > SARS-CoV-2) is proportional to the viral fatality rate. This algorithm may help to understand and analyze the evolving trends and critical characteristics of SARS-CoV-2 variants, other coronaviral proteins and viruses.
Affiliation(s)
- Jun Steed Huang
- School of Information Technology, Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Jiamin Moran Huang
- Department of Computer Science, Jiangsu University, Suqian 223800, China
| | - Wandong Zhang
- Human Health Therapeutics Research Centre, National Research Council of Canada, 1200 Montreal Road, Building M54, Ottawa, ON K1A 0R6, Canada
- Faculty of Medicine, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Correspondence: Tel.: +1-613-993-5988
|
13
|
Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021; 17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022]
Abstract
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library. Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress on ab initio structure prediction, which was mainly powered by the boosting of contact-map prediction as the latter can be used as constraints to guide ab initio folding simulations. 
In this work, we proposed a new open-source deep-learning architecture, TripletRes, built on the residual convolutional neural networks for high-accuracy contact prediction. The large-scale benchmark and blind test results demonstrate competitive performance of the proposed methods to other top approaches in predicting medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in the unique protocol to fuse multiple evolutionary feature matrices which are directly extracted from whole-genome and metagenome databases and therefore minimize the information loss during the contact model training.
|
14
|
Niu Y, Moghimyfiroozabad S, Moghimyfiroozabad A, Tierney TS, Alavian KN. The factors for the early and late development of midbrain dopaminergic neurons segregate into two distinct evolutionary clusters. Brain Disorders 2021. [DOI: 10.1016/j.dscb.2021.100002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022] Open
|
15
|
González Méndez AS, Cerón Téllez F, Tórtora Pérez JL, Martínez Rodríguez HA, García Flores MM, Ramírez Álvarez H. Signature patterns in region V4 of small ruminant lentivirus surface protein in sheep and goats. Virus Res 2020; 280:197900. [PMID: 32070688 DOI: 10.1016/j.virusres.2020.197900] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/24/2019] [Revised: 01/25/2020] [Accepted: 02/14/2020] [Indexed: 02/05/2023]
Abstract
The env gene in Small Ruminant Lentiviruses (SRLV) encodes the surface glycoprotein (SU) that divides into conserved (C1-C4) and variable regions (V1-V5). SRLV region V4 has been found to be homologous to the V3 region of human lentivirus (HIV). HIV V3 is responsible for tropism and the development of nervous clinical patterns when there is a tendency to conserve amino acids in specific "signature pattern" positions. The goal of this study was to identify signature patterns in the V4 region of the SU, which is encoded by the SRLV env gene. A secondary goal was to understand how these signature patterns are associated with different clinical status in naturally infected sheep and goats. Starting with 244 samples from seropositive animals from nine Mexican states, we amplified the V4 region using nested PCR and obtained 49 SRLV sequences from peripheral blood leukocytes. Based on phylogenetic analysis results, we identified three groups: asymptomatic genotypes A (Ssx GA) and B (Ssx GB), as well as animals with arthritic presentation, genotype B (A GB). Similarity levels between group sequences ranged from 67.9% to 86.7%, with a genetic diversity ranging from 12.7% to 29.5% and a dN/dS ratio that indicated negative selection. Analyses using Vespa and Entropy programs identified four residues at positions 54, 78, 79 and 82 in SU region V4 as possible signature patterns, although with variable statistical significance. However, position 54 residues "N" (p = 0.017), "T" (p = 0.001) and "G" (p = 0.024) in groups A GB, Ssx GA and Ssx GB respectively, best characterized the signature patterns. The results identified a signature pattern related to different genotypes and clinical status in SRLV-infected sheep and goats.
Affiliation(s)
- Ana Silvia González Méndez
- Virology, Genetics and Molecular Biology Laboratory, Faculty of Higher Education, Cuautitlan, Veterinary Medicine, Campus 4, National Autonomous University of Mexico, Km. 2.5 Carretera Cuautitlán-Teoloyucan San Sebastián Xhala, Cuautitlan Izcalli, Estado de México, C.P. 54714, Mexico.
- Fernando Cerón Téllez
- Virology, Genetics and Molecular Biology Laboratory, Faculty of Higher Education, Cuautitlan, Veterinary Medicine, Campus 4, National Autonomous University of Mexico, Km. 2.5 Carretera Cuautitlán-Teoloyucan San Sebastián Xhala, Cuautitlan Izcalli, Estado de México, C.P. 54714, Mexico.
- Jorge Luis Tórtora Pérez
- Virology, Genetics and Molecular Biology Laboratory, Faculty of Higher Education, Cuautitlan, Veterinary Medicine, Campus 4, National Autonomous University of Mexico, Km. 2.5 Carretera Cuautitlán-Teoloyucan San Sebastián Xhala, Cuautitlan Izcalli, Estado de México, C.P. 54714, Mexico.
- Humberto Alejandro Martínez Rodríguez
- Virology, Genetics and Molecular Biology Laboratory, Faculty of Higher Education, Cuautitlan, Veterinary Medicine, Campus 4, National Autonomous University of Mexico, Km. 2.5 Carretera Cuautitlán-Teoloyucan San Sebastián Xhala, Cuautitlan Izcalli, Estado de México, C.P. 54714, Mexico.
- María Martha García Flores
- Laboratory of Immunovirology, Medical Research in Immunology Unit, Pediatric Hospital, National Medical Center XXI Century, Mexican Institute of Social Security, Mexico.
- Hugo Ramírez Álvarez
- Virology, Genetics and Molecular Biology Laboratory, Faculty of Higher Education, Cuautitlan, Veterinary Medicine, Campus 4, National Autonomous University of Mexico, Km. 2.5 Carretera Cuautitlán-Teoloyucan San Sebastián Xhala, Cuautitlan Izcalli, Estado de México, C.P. 54714, Mexico.
|
16
|
Abstract
Viral population numbers are extremely large compared with those of their host species. Population bottlenecks are frequent during the life cycle of viruses and can reduce viral populations transiently to very few individuals. Viruses have to confront several types of constraints that can be divided into basal, cell-dependent, and organism-dependent constraints. Viruses overcome them by exploiting a number of molecular mechanisms, with an important contribution of population numbers and genome variation. The adaptive potential of viruses is reflected in modifications of cell tropism and host range, escape from components of the host immune response, and capacity to alternate among different host species, among other phenotypic changes. Despite a fitness cost of most mutations required to overcome a selective constraint, viruses can find evolutionary pathways that ensure their survival in equilibrium with their hosts.
|
17
|
Abid N, Chillemi G, Salemi M. Coding-Gene Coevolution Analysis of Rotavirus Proteins: A Bioinformatics and Statistical Approach. Genes (Basel) 2019; 11:genes11010028. [PMID: 31878331 PMCID: PMC7016848 DOI: 10.3390/genes11010028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/10/2019] [Revised: 12/10/2019] [Accepted: 12/19/2019] [Indexed: 01/12/2023] Open
Abstract
Rotavirus remains a major cause of diarrhea in infants and young children worldwide. The permanent emergence of new genotypes puts the potential effectiveness of vaccines under serious question. The distribution of unusual genotypes subject to viral fitness is influenced by interactions among viral proteins. The present work aimed at analyzing the genetic constellation and the coevolution of rotavirus coding genes for the available rotavirus genotypes. Seventy-two full genome sequences of different genetic constellations were analyzed using a genetic algorithm. The results revealed an extensive genome-wide covariance network among the 12 viral proteins. Altogether, the emergence of new genotypes represents a challenge to the outcome and success of vaccination and the coevolutionary analysis of rotavirus proteins may boost efforts to better understand the interaction networks of proteins during viral replication/transcription.
Affiliation(s)
- Nabil Abid
- Laboratory of Transmissible Diseases and Biological Active Substances LR99ES27, Faculty of Pharmacy, University of Monastir, Rue Ibn Sina, Monastir 5000, Tunisia
- High Institute of Biotechnology of Sidi Thabet, Department of Biotechnology, University Manouba, BP-66, Ariana-Tunis 2020, Tunisia
- Correspondence: or ; Tel.: +216-92–974-000
- Giovanni Chillemi
- Department for Innovation in Biological, Agro-food and Forest systems, DIBAF, University of Tuscia, via S. Camillo de Lellis s.n.c., 01100 Viterbo, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, IBIOM, CNR, Via Giovanni Amendola, 122/O, 70126 Bari, Italy
- Marco Salemi
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida College of Medicine, Emerging Pathogens Institute, P.O. Box 100009, Gainesville, FL 32610-3633, USA
|
18
|
Musil M, Stourac J, Bendl J, Brezovsky J, Prokop Z, Zendulka J, Martinek T, Bednar D, Damborsky J. FireProt: web server for automated design of thermostable proteins. Nucleic Acids Res 2017; 45:W393-W399. [PMID: 28449074 PMCID: PMC5570187 DOI: 10.1093/nar/gkx285] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 02/11/2017] [Accepted: 04/11/2017] [Indexed: 01/07/2023] Open
Abstract
There is a continuous interest in increasing protein stability to enhance the usability of proteins in numerous biomedical and biotechnological applications. A number of in silico tools for the prediction of the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools and have to be followed by laborious protein expression, purification, and characterization. Here, we present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calculation core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with an interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.
Affiliation(s)
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jaroslav Bendl
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jaroslav Zendulka
- Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; Centre of Excellence IT4Innovations, Technical University Ostrava, Ostrava
- Tomas Martinek
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic; Centre of Excellence IT4Innovations, Technical University Ostrava, Ostrava
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology, Masaryk University, Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Brno, Czech Republic
|
19
|
Wu Q, Peng Z, Anishchenko I, Cong Q, Baker D, Yang J. Protein contact prediction using metagenome sequence data and residual neural networks. Bioinformatics 2019; 36:41-48. [PMID: 31173061 PMCID: PMC8792440 DOI: 10.1093/bioinformatics/btz477] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 11/20/2018] [Revised: 05/30/2019] [Accepted: 06/04/2019] [Indexed: 01/31/2023] Open
Abstract
Motivation: Almost all protein residue contact prediction methods rely on the availability of deep multiple sequence alignments (MSAs). However, many proteins from poorly populated families do not have a sufficient number of homologs in the conventional UniProt database. Here we aim to solve this issue by exploring the rich sequence data from metagenome sequencing projects. Results: Based on the improved MSA constructed from the metagenome sequence data, we developed MapPred, a new deep learning-based contact prediction method. MapPred consists of two component methods, DeepMSA and DeepMeta, both trained with residual neural networks. DeepMSA was inspired by the recent method DeepCov, which was trained on 441 matrices of covariance features. By considering the symmetry of the contact map, we reduced the number of matrices to 231, which makes the training more efficient in DeepMSA. Experiments show that DeepMSA outperforms DeepCov by 10-13% in precision. DeepMeta works by combining predicted contacts and other sequence profile features. Experiments on three benchmark datasets suggest that the contribution from the metagenome sequence data is significant, with P-values less than 4.04E-17. MapPred is shown to be complementary and comparable to the state-of-the-art methods. The success of MapPred is attributed to three factors: the deeper MSA from the metagenome sequence data, improved feature design in DeepMSA and optimized training by the residual neural networks. Availability and implementation: http://yanglab.nankai.edu.cn/mappred/. Supplementary information: Supplementary data are available at Bioinformatics online.
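The reduction from 441 to 231 feature matrices is consistent with simple counting, sketched below. The sketch assumes a 21-letter alphabet (20 amino acids plus a gap symbol), which is an assumption chosen because it reproduces both numbers; the abstract itself does not state the alphabet.

```python
# Assumption: 20 amino acids plus one gap symbol, giving 21 residue types.
ALPHABET_SIZE = 21

# One covariance matrix per ordered pair of residue types (a, b).
ordered_pairs = [(a, b) for a in range(ALPHABET_SIZE) for b in range(ALPHABET_SIZE)]

# Treating (a, b) and (b, a) as equivalent keeps only the pairs with a <= b,
# that is, the upper triangle including the diagonal.
unordered_pairs = [(a, b) for a, b in ordered_pairs if a <= b]

print(len(ordered_pairs), len(unordered_pairs))
```

Under this reading, 21 × 21 = 441 ordered channels collapse to 21 × 22 / 2 = 231 channels.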
Affiliation(s)
- Qi Wu
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
- Zhenling Peng
- To whom correspondence should be addressed. E-mail: or
- Ivan Anishchenko
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Qian Cong
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- David Baker
- Department of Biochemistry, Seattle, WA 98105, USA; Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Jianyi Yang
- To whom correspondence should be addressed. E-mail: or
|
20
|
Fisher KJ, Kryazhimskiy S, Lang GI. Detecting genetic interactions using parallel evolution in experimental populations. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180237. [PMID: 31154981 DOI: 10.1098/rstb.2018.0237] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/01/2023] Open
Abstract
Eukaryotic genomes contain thousands of genes organized into complex and interconnected genetic interaction networks. Most of our understanding of how genetic variation affects these networks comes from quantitative-trait loci mapping and from the systematic analysis of double-deletion (or knockdown) mutants, primarily in the yeast Saccharomyces cerevisiae. Evolve and re-sequence experiments are an alternative approach for identifying novel functional variants and genetic interactions, particularly between non-loss-of-function mutations. These experiments leverage natural selection to obtain genotypes with functionally important variants and positive genetic interactions. However, no systematic methods for detecting genetic interactions in these data are yet available. Here, we introduce a computational method based on the idea that variants in genes that interact will co-occur in evolved genotypes more often than expected by chance. We apply this method to a previously published yeast experimental evolution dataset. We find that genetic targets of selection are distributed non-uniformly among evolved genotypes, indicating that genetic interactions had a significant effect on evolutionary trajectories. We identify individual gene pairs with a statistically significant genetic interaction score. The strongest interaction is between genes TRK1 and PHO84, genes that have not been reported to interact in previous systematic studies. Our work demonstrates that leveraging parallelism in experimental evolution is useful for identifying genetic interactions that have escaped detection by other methods. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
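The idea that interacting genes "co-occur in evolved genotypes more often than expected by chance" can be made concrete with a generic enrichment test. The sketch below uses a one-sided hypergeometric tail probability; this is a stand-in chosen for illustration, since the abstract does not describe the authors' actual interaction score, and the function name and counts are hypothetical.

```python
from math import comb


def cooccurrence_p_value(n_genotypes, n_with_a, n_with_b, n_with_both):
    """P(co-occurrences >= observed) if mutations in genes A and B were independent.

    Uses the hypergeometric distribution: among n_genotypes evolved genotypes,
    n_with_a carry a mutation in gene A and n_with_b carry one in gene B.
    """
    total = comb(n_genotypes, n_with_b)
    tail = 0
    for x in range(n_with_both, min(n_with_a, n_with_b) + 1):
        # x genotypes carry both mutations; the remaining B carriers lack A.
        tail += comb(n_with_a, x) * comb(n_genotypes - n_with_a, n_with_b - x)
    return tail / total


# Hypothetical counts: 10 genotypes, 4 mutated in gene A, 4 in gene B,
# and all 4 of those overlap.
p_value = cooccurrence_p_value(10, 4, 4, 4)
print(p_value)
```

A small tail probability suggests that the two genes are hit together more often than independent targets of selection would be; a real analysis would also need to correct for testing many gene pairs.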
Affiliation(s)
- Kaitlin J Fisher
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
- Sergey Kryazhimskiy
- Division of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Gregory I Lang
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
|
21
|
Pathogenicity of the H1N1 influenza virus enhanced by functional synergy between the NPV100I and NAD248N pair. PLoS One 2019; 14:e0217691. [PMID: 31150476 PMCID: PMC6544299 DOI: 10.1371/journal.pone.0217691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2018] [Accepted: 05/16/2019] [Indexed: 11/20/2022] Open
Abstract
By comparing and measuring covariations of viral protein sequences from isolates of the 2009 pH1N1 influenza A virus (IAV), specific substitutions that co-occur in the NP-NA pair were identified. To investigate the effect of these co-occurring substitution pairs, the V100I substitution in NP and the D248N substitution in NA were introduced into laboratory-adapted WSN IAVs. The recombinant WSN with the covarying NPV100I-NAD248N pair exhibited enhanced pathogenicity, as characterized by increased viral production, increased death and inflammation of host cells, and high mortality in infected mice. Although direct interactions between the NPV100I and NAD248N proteins were not detected, the RNA-binding ability of NPV100I was increased, which was further strengthened by NAD248N, in expression-plasmid-transfected cells. Additionally, the NAD248N protein was frequently recruited within lipid rafts, indirectly affecting the RNA-binding ability of NP as well as viral release. Altogether, our data indicate that the covarying NPV100I-NAD248N pair obtained from 2009 pH1N1 IAV sequence information function together to synergistically augment viral assembly and release, which may explain the observed enhanced viral pathogenicity.
|
22
|
Savel D, Koyutürk M. Characterizing human genomic coevolution in locus-gene regulatory interactions. BioData Min 2019; 12:8. [PMID: 30923571 PMCID: PMC6419833 DOI: 10.1186/s13040-019-0195-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2018] [Accepted: 02/19/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Coevolution has been used to identify and predict interactions and functional relationships between proteins of many different organisms including humans. Current efforts in annotating the human genome increasingly show that non-coding DNA sequence has important functional and regulatory interactions. Furthermore, regulatory elements do not necessarily reside in close proximity to the coding region of their target genes. Results: We characterize coevolution as it appears in locus-gene interactions in the human genome, focusing on expression quantitative trait locus (eQTL) interactions. Our results show that in these interactions the conservation status of the loci is predictive of the conservation status of their target genes. Furthermore, comparing the phylogenetic histories of intra-chromosomal pairs of loci and transcription start sites, we find that pairs that appear coevolved are enriched for cis-eQTL interactions. Exploring this property, we found that coevolution might be useful in prioritizing association tests in cis-eQTL detection. Conclusions: The relationship between the conservation status of pairs of loci and protein-coding transcription start sites reveals correlations with regulatory interactions. Pairs that appear coevolved are enriched for intra-chromosomal regulatory interactions; thus our results suggest that measures of coevolution can be useful for prediction and detection of new interactions. Measures of coevolution are genome-wide and could potentially be used to prioritize the detection of distant or inter-chromosomal interactions such as trans-eQTL interactions in the human genome.
Affiliation(s)
- Daniel Savel
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
- Mehmet Koyutürk
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
|
23
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performance of these methods in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of both the machine-learning and correlated mutations methods. Employing a more effective fusion strategy could help to further improve the performance of fusion methods.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
|
24
|
Hosseini M, Pratas D, Pinho AJ. AC: A Compression Tool for Amino Acid Sequences. Interdiscip Sci 2019; 11:68-76. [PMID: 30721401 DOI: 10.1007/s12539-019-00322-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/24/2018] [Revised: 01/23/2019] [Accepted: 01/28/2019] [Indexed: 10/27/2022]
|
25
|
Abstract
The comparative study of homologous proteins can provide abundant information about the functional and structural constraints on protein evolution. For example, an amino acid substitution that is deleterious may become permissive in the presence of another substitution at a second site of the protein. A popular approach for detecting coevolving residues is by looking for correlated substitution events on branches of the molecular phylogeny relating the protein-coding sequences. Here we describe a machine learning method (Bayesian graphical models) implemented in the open-source phylogenetic software package HyPhy, http://hyphy.org, for extracting a network of coevolving residues from a sequence alignment.
|
26
|
Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018; 34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] Open
Abstract
Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
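As a rough illustration of the quantity that SAMMI's hypothesis refers to, the sketch below computes the mutual information between two columns of a toy alignment. It shows only the basic pairwise measure; how SAMMI aggregates or normalizes it across columns and candidate MSAs is not stated in the abstract, and the function name and sequences are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 is invariant.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
mi_covarying = column_mutual_information(toy_msa, 0, 1)
mi_invariant = column_mutual_information(toy_msa, 0, 2)
print(mi_covarying, mi_invariant)
```

Perfectly covarying columns carry the most information about each other, while a fully conserved column carries none, which is why an alignment needs both diversity and homogeneity for the measure to be informative.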
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
|
27
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochimica et Biophysica Acta. Biomembranes 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
|
28
|
Suplatov D, Sharapova Y, Timonina D, Kopylov K, Švedas V. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions by the mutual information-based CMAT approach; to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which is capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments that produce novel enzymes with improved properties, and to study the mechanisms of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
Affiliation(s)
- Dmitry Suplatov
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Yana Sharapova
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Daria Timonina
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Kirill Kopylov
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| | - Vytas Švedas
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
| |
|
29
|
Niu Y, Moghimyfiroozabad S, Safaie S, Yang Y, Jonas EA, Alavian KN. Phylogenetic Profiling of Mitochondrial Proteins and Integration Analysis of Bacterial Transcription Units Suggest Evolution of F1Fo ATP Synthase from Multiple Modules. J Mol Evol 2017; 85:219-233. [PMID: 29177973 PMCID: PMC5709465 DOI: 10.1007/s00239-017-9819-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/11/2017] [Accepted: 11/11/2017] [Indexed: 11/26/2022]
Abstract
ATP synthase is a complex universal enzyme responsible for ATP synthesis across all kingdoms of life. The F-type ATP synthase has been suggested to have evolved from two functionally independent, catalytic (F1) and membrane bound (Fo), ancestral modules. While the modular evolution of the synthase is supported by studies indicating independent assembly of the two subunits, the presence of intermediate assembly products suggests a more complex evolutionary process. We analyzed the phylogenetic profiles of the human mitochondrial proteins and bacterial transcription units to gain additional insight into the evolution of the F-type ATP synthase complex. In this study, we report the presence of intermediary modules based on the phylogenetic profiles of the human mitochondrial proteins. The two main intermediary modules comprise the α3β3 hexamer in the F1 and the c-subunit ring in the Fo. A comprehensive analysis of bacterial transcription units of F1Fo ATP synthase revealed that while a long and constant order of F1Fo ATP synthase genes exists in a majority of bacterial genomes, highly conserved combinations of separate transcription units are present among certain bacterial classes and phyla. Based on our findings, we propose a model that includes the involvement of multiple modules in the evolution of F1Fo ATP synthase. The central and peripheral stalk subunits provide a link for the integration of the F1/Fo modules.
Affiliation(s)
- Yulong Niu
- Division of Brain Sciences, Department of Medicine, Imperial College London, E508, Burlington Danes Hammersmith Hospital, DuCane Road, London, W12 0NN, UK
- Key Lab of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
- Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA
| | | | - Sepehr Safaie
- Department of Mathematics and Computer Science, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran
| | - Yi Yang
- Key Lab of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
| | - Elizabeth A Jonas
- Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA
| | - Kambiz N Alavian
- Division of Brain Sciences, Department of Medicine, Imperial College London, E508, Burlington Danes Hammersmith Hospital, DuCane Road, London, W12 0NN, UK.
- Department of Biology, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran.
- Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA.
| |
|
30
|
Tully DC, Fares MA. Unravelling Selection Shifts among Foot-and-Mouth Disease virus (FMDV) Serotypes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430600200009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022] Open
Abstract
Foot-and-mouth disease virus (FMDV) has been increasingly recognised as the most economically severe animal virus, with a remarkable degree of antigenic diversity. Using an integrative evolutionary and computational approach, we present compelling evidence for heterogeneity in the selection forces shaping the evolution of the seven different FMDV serotypes. Our results show that positive Darwinian selection has governed the evolution of the major antigenic regions of serotypes A, Asia1, O, SAT1 and SAT2, but not C or SAT3. Co-evolution between sites from antigenic regions under positive selection pinpoints their functional communication to generate immune-escape mutants while maintaining their ability to recognise the host-cell receptors. Neural network and functional divergence analyses strongly point to selection shifts between the different serotypes. Our results suggest that, unlike African FMDV serotypes, serotypes with wide geographical distribution have accumulated compensatory mutations as a strategy to ameliorate the effect of slightly deleterious mutations fixed by genetic drift. This strategy may have provided the virus with the flexibility to generate immune-escape mutants and yet recognise host-cell receptors. African serotypes presented no evidence for compensatory mutations. Our results support heterogeneous selective constraints affecting the different serotypes. This points to possibly accelerated rates of evolution in diverging serotypes that share geographical locations, so as to ameliorate the competition for the host.
Affiliation(s)
- Damien C. Tully
- Molecular Evolution and Bioinformatics Laboratory, Biology Department, National University of Ireland, Maynooth, Co. Kildare, Ireland
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Co. Dublin, Ireland
| | - Mario A. Fares
- Molecular Evolution and Bioinformatics Laboratory, Biology Department, National University of Ireland, Maynooth, Co. Kildare, Ireland
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Co. Dublin, Ireland
| |
|
31
|
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Predicted residue-residue contacts have been found to effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, various methods have been developed to predict residue-residue contacts, and fusion methods in particular have achieved significant performance in recent years. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutation methods and ensemble machine-learning classifiers, and then the proposed method uses a learning-to-rank algorithm to predict the contact probability of each residue pair. Results First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets, respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of the ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for the three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three methods in most assessment metrics. 
Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment. Electronic supplementary material The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| | - Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| |
|
32
|
Mandloi S, Chakrabarti S. Protein sites with more coevolutionary connections tend to evolve slower, while more variable protein families acquire higher coevolutionary connections. F1000Res 2017; 6:453. [PMID: 28751967 PMCID: PMC5506539 DOI: 10.12688/f1000research.11251.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 07/05/2017] [Indexed: 11/20/2022] Open
Abstract
Background: Amino acid exchanges within proteins sometimes compensate for one another and could therefore be co-evolved. It is essential to investigate the intricate relationship between the extent of coevolution and the evolutionary variability exerted at individual protein sites, as well as the whole protein. Methods: In this study, we have used a reliable set of coevolutionary connections (sites within 10 Å spatial distance) and investigated their correlation with the evolutionary diversity within the respective protein sites. Results: Based on our observations, we propose the hypothesis that higher numbers of coevolutionary connections are associated with less evolutionarily variable protein sites, while higher numbers of coevolutionary connections can be observed for a protein family that has higher evolutionary variability. Our findings also indicate that highly coevolved sites located in a solvent accessible state tend to be less evolutionarily variable. This relationship reverses at the whole protein level, where cytoplasmic and extracellular proteins show moderately higher anti-correlation between the number of coevolutionary connections and the average evolutionary conservation of the whole protein. Conclusions: The observations and hypothesis presented in this study provide intriguing insights towards understanding the critical relationship between coevolutionary and evolutionary changes observed within proteins. Our observations encourage further investigation to find out the reasons behind subtle variations in the relationship between coevolutionary connectivity and evolutionary diversity for proteins located at various cellular localizations and/or involved in different molecular-biological functions.
Affiliation(s)
- Sapan Mandloi
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India
| | - Saikat Chakrabarti
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India
| |
|
33
|
Nshogozabahizi JC, Dench J, Aris-Brosou S. Widespread Historical Contingency in Influenza Viruses. Genetics 2017; 205:409-420. [PMID: 28049709 PMCID: PMC5223518 DOI: 10.1534/genetics.116.193979] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/18/2016] [Accepted: 11/04/2016] [Indexed: 11/18/2022] Open
Abstract
In systems biology and genomics, epistasis characterizes the impact that a substitution at a particular location in a genome can have on a substitution at another location. This phenomenon is often implicated in the evolution of drug resistance or invoked to explain why particular "disease-causing" mutations do not have the same outcome in all individuals. Hence, uncovering these mutations and their locations in a genome is a central question in biology. However, epistasis is notoriously difficult to uncover, especially in fast-evolving organisms. Here, we present a novel statistical approach that relies on a model developed in ecology and that we adapt to analyze genetic data in fast-evolving systems such as the influenza A virus. We validate the approach using a two-pronged strategy: extensive simulations demonstrate a low-to-moderate sensitivity with excellent specificity and precision, while analyses of experimentally validated data recover known interactions, including in a eukaryotic system. We further evaluate the ability of our approach to detect correlated evolution during antigenic shifts or at the emergence of drug resistance. We show that in all cases, correlated evolution is prevalent in influenza A viruses, involving many pairs of sites linked together in chains, a hallmark of historical contingency. Strikingly, interacting sites are separated by large physical distances, which entails either long-range conformational changes or functional tradeoffs, for which we find support with the emergence of drug resistance. Our work paves a new way for the unbiased detection of epistasis in a wide range of organisms by performing whole-genome scans.
Affiliation(s)
| | - Jonathan Dench
- Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
| | - Stéphane Aris-Brosou
- Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
- Department of Mathematics and Statistics, University of Ottawa, Ontario K1N 6N5, Canada
| |
|
34
|
Affiliation(s)
- Jeffrey B. Joy
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- University of British Columbia, Department of Medicine, Vancouver, British Columbia, Canada
| | - Richard H. Liang
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
| | | | - T. Nguyen
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
| | - Art F. Y. Poon
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- University of British Columbia, Department of Medicine, Vancouver, British Columbia, Canada
| |
|
35
|
Bendl J, Stourac J, Sebestova E, Vavra O, Musil M, Brezovsky J, Damborsky J. HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering. Nucleic Acids Res 2016; 44:W479-87. [PMID: 27174934 PMCID: PMC4987947 DOI: 10.1093/nar/gkw416] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 02/21/2016] [Accepted: 05/03/2016] [Indexed: 01/13/2023] Open
Abstract
HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins' stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calculations are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries. It is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.
Affiliation(s)
- Jaroslav Bendl
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Eva Sebestova
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Ondrej Vavra
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic
| | - Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic
| | - Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
| |
|
36
|
Ondondo B, Murakoshi H, Clutton G, Abdul-Jawad S, Wee EGT, Gatanaga H, Oka S, McMichael AJ, Takiguchi M, Korber B, Hanke T. Novel Conserved-region T-cell Mosaic Vaccine With High Global HIV-1 Coverage Is Recognized by Protective Responses in Untreated Infection. Mol Ther 2016; 24:832-42. [PMID: 26743582 PMCID: PMC4886941 DOI: 10.1038/mt.2016.3] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 07/06/2015] [Accepted: 12/31/2015] [Indexed: 12/12/2022] Open
Abstract
An effective human immunodeficiency virus type 1 (HIV-1) vaccine is the best solution for halting the acquired immune deficiency syndrome epidemic. Here, we describe the design and preclinical immunogenicity of T-cell vaccine expressing novel immunogens tHIVconsvX, vectored by DNA, simian (chimpanzee) adenovirus, and poxvirus modified vaccinia virus Ankara (MVA), a combination highly immunogenic in humans. The tHIVconsvX immunogens combine the three leading strategies for elicitation of effective CD8(+) T cells: use of regions of HIV-1 proteins functionally conserved across all M group viruses (to make HIV-1 escape costly on viral fitness), inclusion of bivalent complementary mosaic immunogens (to maximize global epitope matching and breadth of responses, and block common escape paths), and inclusion of epitopes known to be associated with low viral load in infected untreated people (to induce field-proven protective responses). tHIVconsvX was highly immunogenic in two strains of mice. Furthermore, the magnitude and breadth of CD8(+) T-cell responses to tHIVconsvX-derived peptides in treatment-naive HIV-1(+) patients significantly correlated with high CD4(+) T-cell count and low viral load. Overall, the tHIVconsvX design, combining the mosaic and conserved-region approaches, provides an indisputably better coverage of global HIV-1 variants than previous T-cell vaccines. These immunogens delivered in a highly immunogenic framework of adenovirus prime and MVA boost are ready for clinical development.
Affiliation(s)
- Beatrice Ondondo
- The Jenner Institute, University of Oxford, Roosevelt Drive, Oxford, UK
| | | | - Genevieve Clutton
- The Jenner Institute, University of Oxford, Roosevelt Drive, Oxford, UK
- Current address: Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | | | - Edmund G-T Wee
- The Jenner Institute, University of Oxford, Roosevelt Drive, Oxford, UK
| | - Hiroyuki Gatanaga
- Center for AIDS Research, Kumamoto University, Kumamoto, Japan
- AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
| | - Shinichi Oka
- Center for AIDS Research, Kumamoto University, Kumamoto, Japan
- AIDS Clinical Center, National Center for Global Health and Medicine, Tokyo, Japan
| | | | - Masafumi Takiguchi
- Center for AIDS Research, Kumamoto University, Kumamoto, Japan
- International Research Center for Medical Sciences, Kumamoto University, Kumamoto, Japan
| | - Bette Korber
- Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico, USA
- The New Mexico Consortium, Los Alamos, New Mexico, USA
| | - Tomáš Hanke
- The Jenner Institute, University of Oxford, Roosevelt Drive, Oxford, UK
- International Research Center for Medical Sciences, Kumamoto University, Kumamoto, Japan
| |
|
37
|
Baker FN, Porollo A. CoeViz: a web-based tool for coevolution analysis of protein residues. BMC Bioinformatics 2016; 17:119. [PMID: 26956673 PMCID: PMC4782369 DOI: 10.1186/s12859-016-0975-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 10/23/2015] [Accepted: 03/01/2016] [Indexed: 11/30/2022] Open
Abstract
Background Proteins generally perform their function in a folded state. Residues forming an active site, whether it is a catalytic center or interaction interface, are frequently distant in a protein sequence. Hence, traditional sequence-based prediction methods focusing on a single residue (or a short window of residues) at a time may have difficulties in identifying and clustering the residues constituting a functional site, especially when a protein has multiple functions. Evolutionary information encoded in multiple sequence alignments is known to greatly improve sequence-based predictions. Identification of coevolving residues further advances the protein structure and function annotation by revealing cooperative pairs and higher order groupings of residues. Results We present a new web-based tool (CoeViz) that provides a versatile analysis and visualization of pairwise coevolution of amino acid residues. The tool computes three covariance metrics (mutual information, chi-square statistic, and Pearson correlation) and one conservation metric (joint Shannon entropy). Implemented adjustments of covariance scores include phylogeny correction, corrections for sequence dissimilarity and alignment gaps, and the average product correction. Visualization of residue relationships is enhanced by hierarchical cluster trees, heat maps, circular diagrams, and residue highlighting in protein sequence and 3D structure. Unlike other existing tools, CoeViz is not limited to analyzing conserved domains or protein families and can process long, unstructured and multi-domain proteins thousands of residues long. Three examples are provided to illustrate the use of the tool for identification of residues (1) involved in enzymatic function, (2) forming short linear functional motifs, and (3) constituting a structural domain. 
Conclusions CoeViz represents a practical resource for a quick sequence-based protein annotation for molecular biologists, e.g., for identifying putative functional clusters of residues and structural domains. CoeViz also can serve computational biologists as a resource of coevolution matrices, e.g., for developing machine learning-based prediction models. The presented tool is integrated in the POLYVIEW-2D server (http://polyview.cchmc.org/) and available from resulting pages of POLYVIEW-2D.
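As an illustration of two of the metrics named in this abstract, the following is a minimal sketch of mutual information and joint Shannon entropy between two alignment columns. It is not CoeViz's implementation: the function names and the toy alignment are hypothetical, and the sketch applies none of the adjustments listed in the abstract (no phylogeny, sequence-dissimilarity, gap, or average product correction).

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of a list of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def column(alignment, i):
    """Residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def joint_entropy(alignment, i, j):
    """Joint Shannon entropy of the residue pairs at columns i and j."""
    pairs = list(zip(column(alignment, i), column(alignment, j)))
    return shannon_entropy(pairs)


def mutual_information(alignment, i, j):
    """Uncorrected mutual information: H(i) + H(j) - H(i, j)."""
    return (shannon_entropy(column(alignment, i))
            + shannon_entropy(column(alignment, j))
            - joint_entropy(alignment, i, j))


# Hypothetical toy alignment: columns 0 and 1 vary together,
# while column 2 varies independently of column 0.
toy_alignment = ["AKL", "AKV", "GRL", "GRV"]
mi_01 = mutual_information(toy_alignment, 0, 1)
mi_02 = mutual_information(toy_alignment, 0, 2)
print(mi_01, mi_02)
```

In this toy case the perfectly covarying pair of columns scores higher than the independent pair; real alignments would need the corrections mentioned above before raw scores could be compared across positions.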
Affiliation(s)
- Frazier N Baker
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA; Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
| | - Aleksey Porollo
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
| |
|
38
|
Abdul-Jawad S, Ondondo B, van Hateren A, Gardner A, Elliott T, Korber B, Hanke T. Increased Valency of Conserved-mosaic Vaccines Enhances the Breadth and Depth of Epitope Recognition. Mol Ther 2016; 24:375-384. [PMID: 26581160 PMCID: PMC4817818 DOI: 10.1038/mt.2015.210] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 07/24/2015] [Accepted: 11/09/2015] [Indexed: 12/19/2022] Open
Abstract
The biggest roadblock in the development of effective vaccines against human immunodeficiency virus type 1 (HIV-1) is the genetic diversity of the virus. For T-cell vaccines, this can be tackled by focusing the vaccine-elicited T-cells on the highly functionally conserved regions of HIV-1 proteins, mutations in which typically cause a replicative fitness loss, and by computing multivalent mosaic proteins, which maximize the coverage of potential 9-mer T-cell epitopes of the input viral sequences. Our first conserved region vaccines, HIVconsv, employed clade-alternating consensus sequences and showed promise in the initial clinical trials in terms of magnitude and breadth of elicited CD8(+) T-cells. Here, monitoring T-cells restricted by HLA-A*02:01 in transgenic mice, we assessed whether or not the tHIVconsv design (HIVconsv with a tissue plasminogen activator leader sequence) benefits from combining with a complementing conserved mosaic immunogen tHIVcmo, and compared the bivalent immunization to that with trivalent conserved mosaic vaccines. A hierarchy of tHIVconsv ≤ tHIVconsv+tHIVcmo < tCmo1+tCmo2+tCmo3 vaccinations for induction of CD8(+) T-cell responses was observed in terms of recognition of tested peptide variants. Thus, our HLA-A*02:01-restricted epitope data concur with previously published mouse and macaque observations and suggest that even conserved region vaccines benefit from oligovalent mosaic design.
Affiliation(s)
| | | | - Andy van Hateren
- Faculty of Medicine and Institute for Life Science, University of Southampton, Southampton, UK
| | | | - Tim Elliott
- Faculty of Medicine and Institute for Life Science, University of Southampton, Southampton, UK
| | - Bette Korber
- Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico, USA; The New Mexico Consortium, Los Alamos, New Mexico, USA
| | - Tomáš Hanke
- The Jenner Institute, University of Oxford, Oxford, UK; International Research Center for Medical Sciences, Kumamoto University, Kumamoto, Japan.
| |
|
39
|
Domingo E. Interaction of Virus Populations with Their Hosts. VIRUS AS POPULATIONS 2016. [PMCID: PMC7150142 DOI: 10.1016/b978-0-12-800837-9.00004-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Viral population numbers are extremely large compared with those of their host species. Population bottlenecks are frequent during the life cycle of viruses and can reduce viral populations transiently to very few individuals. Viruses have to confront several types of constraints that can be divided into basal, cell-dependent, and organism-dependent constraints. Viruses overcome them by exploiting a number of molecular mechanisms, with an important contribution of population numbers and genome variation. The adaptive potential of viruses is reflected in modifications of cell tropism and host range, escape from components of the host immune response, and capacity to alternate among different host species, among other phenotypic changes. Despite a fitness cost of most mutations required to overcome a selective constraint, viruses can find evolutionary pathways that ensure their survival in equilibrium with their hosts.
|
40
|
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022] Open
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
|
41
|
Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex. PLoS One 2015; 10:e0143245. [PMID: 26579711 PMCID: PMC4651434 DOI: 10.1371/journal.pone.0143245] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/25/2015] [Accepted: 11/02/2015] [Indexed: 11/19/2022] Open
Abstract
The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors along with many conformational changes within the spike. Owing to its high mutation rates and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may not only be important in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.
|
42
|
From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A 2015; 112:13567-72. [PMID: 26487681 DOI: 10.1073/pnas.1508584112] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Indexed: 12/21/2022] Open
Abstract
The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers.
|
43
|
Identification of Protein–Protein Interactions by Detecting Correlated Mutation at the Interface. J Chem Inf Model 2015; 55:2042-9. [DOI: 10.1021/acs.jcim.5b00320] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]
|
44
|
The Patterns of Coevolution in Clade B HIV Envelope's N-Glycosylation Sites. PLoS One 2015; 10:e0128664. [PMID: 26110648 PMCID: PMC4482261 DOI: 10.1371/journal.pone.0128664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/02/2014] [Accepted: 04/29/2015] [Indexed: 11/19/2022] Open
Abstract
The co-evolution of the potential N-glycosylation sites of HIV Clade B gp120 was mapped onto the coevolution network of the protein structure using mean field direct coupling analysis (mfDCA). This was possible for 327 positions with suitable entropy and gap content. Indications of pressure to preserve the evolving glycan shield are seen as well as strong dependencies between the majority of the potential N-glycosylation sites and the rest of the structure. These findings indicate that although mainly an adaptation against antibody neutralization, the evolving glycan shield is structurally related to the core polypeptide, which, thus, is also under pressure to reflect the changes in the N-glycosylation. The map we propose fills the gap in previous attempts to tease out sequon evolution by providing a more general molecular context. Thus, it will help design strategies guiding HIV gp120 evolution in a rational way.
|
45
|
Tse A, Verkhivker GM. Molecular Determinants Underlying Binding Specificities of the ABL Kinase Inhibitors: Combining Alanine Scanning of Binding Hot Spots with Network Analysis of Residue Interactions and Coevolution. PLoS One 2015; 10:e0130203. [PMID: 26075886 PMCID: PMC4468085 DOI: 10.1371/journal.pone.0130203] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 03/24/2015] [Accepted: 05/17/2015] [Indexed: 12/20/2022] Open
Abstract
Quantifying binding specificity and drug resistance of protein kinase inhibitors is of fundamental importance and remains highly challenging due to complex interplay of structural and thermodynamic factors. In this work, molecular simulations and computational alanine scanning are combined with the network-based approaches to characterize molecular determinants underlying binding specificities of the ABL kinase inhibitors. The proposed theoretical framework unveiled a relationship between ligand binding and inhibitor-mediated changes in the residue interaction networks. By using topological parameters, we have described the organization of the residue interaction networks and networks of coevolving residues in the ABL kinase structures. This analysis has shown that functionally critical regulatory residues can simultaneously embody strong coevolutionary signal and high network centrality with a propensity to be energetic hot spots for drug binding. We have found that selective (Nilotinib) and promiscuous (Bosutinib, Dasatinib) kinase inhibitors can use their energetic hot spots to differentially modulate stability of the residue interaction networks, thus inhibiting or promoting conformational equilibrium between inactive and active states. According to our results, Nilotinib binding may induce a significant network-bridging effect and enhance centrality of the hot spot residues that stabilize structural environment favored by the specific kinase form. In contrast, Bosutinib and Dasatinib can incur modest changes in the residue interaction network in which ligand binding is primarily coupled only with the identity of the gate-keeper residue. These factors may promote structural adaptability of the active kinase states in binding with these promiscuous inhibitors. 
Our results have related ligand-induced changes in the residue interaction networks with drug resistance effects, showing that network robustness may be compromised by targeted mutations of key mediating residues. This study has outlined mechanisms by which inhibitor binding could modulate resilience and efficiency of allosteric interactions in the kinase structures, while preserving structural topology required for catalytic activity and regulation.
Affiliation(s)
- Amanda Tse
  - Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Gennady M. Verkhivker
  - Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
  - Chapman University School of Pharmacy, Irvine, California, United States of America
|
46
|
Aiamkitsumrit B, Sullivan NT, Nonnemacher MR, Pirrone V, Wigdahl B. Human Immunodeficiency Virus Type 1 Cellular Entry and Exit in the T Lymphocytic and Monocytic Compartments: Mechanisms and Target Opportunities During Viral Disease. Adv Virus Res 2015; 93:257-311. [PMID: 26111588 DOI: 10.1016/bs.aivir.2015.04.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
During the course of human immunodeficiency virus type 1 infection, a number of cell types throughout the body are infected, with the majority of cells representing CD4+ T cells and cells of the monocyte-macrophage lineage. Both types of cells express, to varying levels, the primary receptor molecule, CD4, as well as one or both of the coreceptors, CXCR4 and CCR5. Viral tropism is determined by both the coreceptor utilized for entry and the cell type infected. Although a single virus may have the capacity to infect both a CD4+ T cell and a cell of the monocyte-macrophage lineage, the mechanisms involved in both the entry of the virus into the cell and the viral egress from the cell during budding and viral release differ depending on the cell type. These host-virus interactions and processes can result in the differential targeting of different cell types by selected viral quasispecies and the overall amount of infectious virus released into the extracellular environment or by direct cell-to-cell spread of viral infectivity. This review covers the major steps of virus entry and egress with emphasis on the parts of the replication process that lead to differences in how the virus enters, replicates, and buds from different cellular compartments, such as CD4+ T cells and cells of the monocyte-macrophage lineage.
Affiliation(s)
- Benjamas Aiamkitsumrit
  - Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Neil T Sullivan
  - Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Michael R Nonnemacher
  - Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Vanessa Pirrone
  - Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA
- Brian Wigdahl
  - Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, Pennsylvania, USA.
|
47
|
Abstract
Recent developments in the analysis of amino acid covariation are leading to breakthroughs in protein structure prediction, protein design, and prediction of the interactome. It is assumed that observed patterns of covariation are caused by molecular coevolution, where substitutions at one site affect the evolutionary forces acting at neighboring sites. Our theoretical and empirical results cast doubt on this assumption. We demonstrate that the strongest coevolutionary signal is a decrease in evolutionary rate and that unfeasibly long times are required to produce coordinated substitutions. We find that covarying substitutions are mostly found on different branches of the phylogenetic tree, indicating that they are independent events that may or may not be attributable to coevolution. These observations undermine the hypothesis that molecular coevolution is the primary cause of the covariation signal. In contrast, we find that the pairs of residues with the strongest covariation signal tend to have low evolutionary rates, and that it is this low rate that gives rise to the covariation signal. Slowly evolving residue pairs are disproportionately located in the protein’s core, which explains covariation methods’ ability to detect pairs of residues that are close in three dimensions. These observations lead us to propose the “coevolution paradox”: The strength of coevolution required to cause coordinated changes means the evolutionary rate is so low that such changes are highly unlikely to occur. As modern covariation methods may lead to breakthroughs in structural genomics, it is critical to recognize their biases and limitations.
Affiliation(s)
- David Talavera
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Simon C Lovell
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Simon Whelan
  - Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
  - Evolutionary Biology Centre, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
|
48
|
Flynn WF, Chang MW, Tan Z, Oliveira G, Yuan J, Okulicz JF, Torbett BE, Levy RM. Deep sequencing of protease inhibitor resistant HIV patient isolates reveals patterns of correlated mutations in Gag and protease. PLoS Comput Biol 2015; 11:e1004249. [PMID: 25894830 PMCID: PMC4404092 DOI: 10.1371/journal.pcbi.1004249] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/13/2014] [Accepted: 03/19/2015] [Indexed: 11/18/2022] Open
Abstract
While the role of drug resistance mutations in HIV protease has been studied comprehensively, mutations in its substrate, Gag, have not been extensively cataloged. Using deep sequencing, we analyzed a unique collection of longitudinal viral samples from 93 patients who have been treated with therapies containing protease inhibitors (PIs). Due to the high sequence coverage within each sample, the frequencies of mutations at individual positions were calculated with high precision. We used this information to characterize the variability in the Gag polyprotein and its effects on PI-therapy outcomes. To examine covariation of mutations between two different sites using deep sequencing data, we developed an approach to estimate the tight bounds on the two-site bivariate probabilities in each viral sample, and the mutual information between pairs of positions based on all the bounds. Utilizing the new methodology we found that mutations in the matrix and p6 proteins contribute to continued therapy failure and have a major role in the network of strongly correlated mutations in the Gag polyprotein, as well as between Gag and protease. Although covariation is not direct evidence of structural propensities, we found the strongest correlations between residues on capsid and matrix of the same Gag protein were often due to structural proximity. This suggests that some of the strongest inter-protein Gag correlations are the result of structural proximity. Moreover, the strong covariation between residues in matrix and capsid at the N-terminus with p1 and p6 at the C-terminus is consistent with residue-residue contacts between these proteins at some point in the viral life cycle. Understanding the structure of HIV proteins and the function of drug-resistant mutations of these proteins is critical for the development of effective HIV treatments. 
Selected gag mutations have been shown to provide compensatory functions for protease resistance mutations and may directly contribute to the development of drug resistance. To determine associations between protease inhibitor mutations and gag, we utilized deep sequencing of HIV gag and protease from a collection of viral isolates from patients treated with highly active retroviral protease inhibitors. Deep sequencing allows for accurate measurement of mutation frequencies at each position, allowing estimation, using a novel method we developed, of the covariation between any two residues on gag. Using this information, we characterize the variation within gag and protease and identify the most strongly correlated pairs of inter- and intra-protein residues. Our results suggest that matrix and p1/p6 mutations form the core of a network of strongly correlated gag mutations and contribute to recurrent treatment failure. Extracting gag residue covariation information from the deep sequencing of patient viral samples may provide insight into structural aspects of the Gag polyprotein as well as new areas for small molecule targeting to disrupt Gag function.
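The abstract above scores covariation with the mutual information between pairs of positions. As a rough illustration of that quantity only, the sketch below computes the plain plug-in mutual information between two alignment columns. It is not the authors' method, which estimates bounds on the two-site probabilities from deep sequencing data; the function name `mutual_information` and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per aligned sequence.
    This is a generic estimator, assumed here for illustration only.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)            # single-site counts at position a
    count_b = Counter(col_b)            # single-site counts at position b
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), expressed with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy columns: one residue per sequence at three positions.
site_i = "AAAAVVVV"
site_j = "LLLLIIII"  # changes together with site_i
site_k = "LILILILI"  # varies independently of site_i

coupled = mutual_information(site_i, site_j)
independent = mutual_information(site_i, site_k)
```

In this toy case the coupled pair carries one bit of mutual information and the independent pair carries none. Real alignments would additionally need handling of gaps, sequence weighting, and finite-sample corrections, none of which is attempted here.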
Affiliation(s)
- William F. Flynn
  - Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Max W. Chang
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Zhiqiang Tan
  - Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
- Glenn Oliveira
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Jinyun Yuan
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Jason F. Okulicz
  - Infectious Disease Service, San Antonio Military Medical Center, San Antonio, Texas, United States of America
- Bruce E. Torbett
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, United States of America
|
49
|
Petitjean M, Badel A, Veitia RA, Vanet A. Synthetic lethals in HIV: ways to avoid drug resistance : Running title: Preventing HIV resistance. Biol Direct 2015; 10:17. [PMID: 25888435 PMCID: PMC4399722 DOI: 10.1186/s13062-015-0044-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/19/2014] [Accepted: 02/23/2015] [Indexed: 12/19/2022] Open
Abstract
Background: RNA viruses rapidly accumulate genetic variation, which can give rise to synthetic lethal (SL) and deleterious (SD) mutations. Synthetic lethal mutations (non-lethal when alone but lethal when combined in one genome) have been studied to develop cancer therapies. This principle can also be used against fast-evolving RNA viruses. Indeed, targeting protein sites involved in SD + SL interactions with a drug would render any mutation of such sites lethal. Results: Here, we set up a strategy to detect intragenic pairs of SL and SD at the surface of the protein to predict less escapable drug target sites. For this, we detected SD + SL by studying HIV protease (PR) and reverse transcriptase (RT) sequence alignments from two groups of HIV+ individuals: treated with drugs (T) or not (NT). Using a series of statistical approaches, we were able to propose bona fide SD + SL couples. When focusing on spatially close co-variant SD + SL couples at the surface of the protein, we found 5 SD + SL groups (2 in the protease and 3 in the reverse transcriptase), which could be good candidates to form pockets to accommodate potential drugs. Conclusions: Thus, designing drugs targeting these specific SD + SL groups would not allow the virus to mutate any residue involved in such groups without losing an essential function. Moreover, we also show that the selection pressure induced by the treatment leads to the appearance of new mutations, which change the mutational landscape of the protein. This drives the existence of differential SD + SL couples between the drug-treated and non-treated groups. Thus, new anti-viral drugs should be designed differently to target such groups. Reviewers: This article was reviewed by Neil Greenspan, Csaba Pal and István Simon.
Affiliation(s)
- Michel Petitjean
  - Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France.
  - MTI, INSERM UMR-S 973, F-75013, Paris, France.
- Anne Badel
  - Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France.
  - MTI, INSERM UMR-S 973, F-75013, Paris, France.
- Reiner A Veitia
  - Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France.
  - CNRS, UMR7592, Institut Jacques Monod, F-75013, Paris, France.
- Anne Vanet
  - Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France.
  - CNRS, UMR7592, Institut Jacques Monod, F-75013, Paris, France.
  - Atelier de Bio Informatique, F-75005, Paris, France.
|
50
|
Minguez P, Letunic I, Parca L, Garcia-Alonso L, Dopazo J, Huerta-Cepas J, Bork P. PTMcode v2: a resource for functional associations of post-translational modifications within and between proteins. Nucleic Acids Res 2014; 43:D494-502. [PMID: 25361965 PMCID: PMC4383916 DOI: 10.1093/nar/gku1081] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 12/27/2022] Open
Abstract
The post-translational regulation of proteins is mainly driven by two molecular events: their modification by several types of moieties and their interaction with other proteins. These two processes are interdependent and together are responsible for the function of the protein in a particular cell state. Several databases focus on the prediction and compilation of protein–protein interactions (PPIs) and no less on the collection and analysis of protein post-translational modifications (PTMs); however, there are no resources that concentrate on describing the regulatory role of PTMs in PPIs. We developed several methods based on residue co-evolution and proximity to predict the functional associations of pairs of PTMs, which we apply to modifications in the same protein and between two interacting proteins. In order to make data available for understudied organisms, PTMcode v2 (http://ptmcode.embl.de) includes a new strategy to propagate PTMs from validated modified sites through orthologous proteins. The second release of PTMcode covers 19 eukaryotic species from which we collected more than 300 000 experimentally verified PTMs (>1 300 000 propagated) of 69 types extracting the post-translational regulation of >100 000 proteins and >100 000 interactions. In total, we report 8 million associations of PTMs regulating single proteins and over 9.4 million interplays tuning PPIs.
Affiliation(s)
- Pablo Minguez
  - European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Ivica Letunic
  - Biobyte solutions GmbH, Bothestr 142, 69117 Heidelberg, Germany
- Luca Parca
  - European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Luz Garcia-Alonso
  - Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Joaquin Dopazo
  - Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Jaime Huerta-Cepas
  - European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Peer Bork
  - European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
  - Max-Delbruck-Centre for Molecular Medicine, Berlin-Buch, Germany
|