1
Almalki B, Liao L. Transmembrane Homodimers Interface Identification: Predicting Interface Residues in Alpha-Helical Transmembrane Protein Homodimers Using Sequential and Structural Features. Int J Mol Sci 2025; 26:4270. [PMID: 40362505 PMCID: PMC12073085 DOI: 10.3390/ijms26094270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2025] [Revised: 04/11/2025] [Accepted: 04/17/2025] [Indexed: 05/15/2025] Open
Abstract
Most bitopic transmembrane proteins associate with one another through interface residues to form dimers, which facilitate or activate specific cellular functions. Therefore, accurately identifying interface residues in a given dimer is crucial for understanding its function and has been a challenging pursuit for many computational methods. These methods can be broadly categorized into two approaches: general-purpose ones for dimerization and specialized ones for interface residues. In this study, we develop a machine learning method that integrates both approaches by combining sequential and structural features extracted from predicted structures and various domains. The results from cross-validation on a benchmark dataset show that our method, despite utilizing significantly fewer features, outperforms the state-of-the-art methods by more than three percentage points in F1 score. Furthermore, we evaluated the performance of the proposed model on a benchmark dataset as compared to the state-of-the-art multimeric structure predictors, including RoseTTAFold2, AlphaFold2Multimer, and PREDDIMER. The results show that the proposed model outperforms all the other models, highlighting the effectiveness of integrating both structural and sequential features within the proposed framework.
Affiliation(s)
- Li Liao
  - Department of Computer and Information Sciences, University of Delaware, Smith Hall, 18 Amstel Avenue, Newark, DE 19716, USA
2
Guan X, Tang QY, Ren W, Chen M, Wang W, Wolynes PG, Li W. Predicting protein conformational motions using energetic frustration analysis and AlphaFold2. Proc Natl Acad Sci U S A 2024; 121:e2410662121. [PMID: 39163334 PMCID: PMC11363347 DOI: 10.1073/pnas.2410662121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 07/16/2024] [Indexed: 08/22/2024] Open
Abstract
Proteins perform their biological functions through motion. Although high throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to two other proteins, KaiB and ribose-binding protein, which involve large-amplitude conformational changes, can also successfully generate the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.
Affiliation(s)
- Xingyue Guan
  - Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
- Qian-Yuan Tang
  - Department of Physics, Hong Kong Baptist University, Kowloon Tong, Hong Kong Special Administrative Region 999077, China
- Weitong Ren
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
- Wei Wang
  - Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
- Peter G. Wolynes
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Wenfei Li
  - Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
3
Kinshuk S, Li L, Meckes B, Chan CTY. Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci 2024; 25:8320. [PMID: 39125888 PMCID: PMC11312098 DOI: 10.3390/ijms25158320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.
Affiliation(s)
- Sahaj Kinshuk
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Lin Li
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Brian Meckes
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
- Clement T. Y. Chan
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
4
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] Open
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still considerable room for improving the prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we also show that the prediction of PLMGraph-Inter can complement the result of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Chengfei Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
5
Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophys J 2024; 123:681-692. [PMID: 38291753 PMCID: PMC10995415 DOI: 10.1016/j.bpj.2024.01.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/31/2023] [Accepted: 01/24/2024] [Indexed: 02/01/2024] Open
Abstract
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
Affiliation(s)
- Mayu Shibata
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Xingcheng Lin
  - Department of Physics, North Carolina State University, Raleigh, North Carolina; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, Texas
- Kei Yura
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Interdisciplinary AI and Data Science, Ochanomizu University, Bunkyo, Tokyo, Japan; Graduate School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
- Ryan R Cheng
  - Department of Chemistry, University of Kentucky, Lexington, Kentucky
6
Kotev M, Diaz Gonzalez C. Molecular Dynamics and Other HPC Simulations for Drug Discovery. Methods Mol Biol 2024; 2716:265-291. [PMID: 37702944 DOI: 10.1007/978-1-0716-3449-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
High performance computing (HPC) is taking an increasingly important place in drug discovery. It makes possible the simulation of complex biochemical systems with high precision in a short time, thanks to the use of sophisticated algorithms. It promotes the advancement of knowledge in fields that are inaccessible or difficult to access through experimentation and it contributes to accelerating the discovery of drugs for unmet medical needs while reducing costs. Herein, we report how computational performance has evolved over the past years, and then we detail three domains where HPC is essential. Molecular dynamics (MD) is commonly used to explore the flexibility of proteins, thus generating a better understanding of different possible approaches to modulate their activity. Modeling and simulation of biopolymer complexes enables the study of protein-protein interactions (PPI) in healthy and disease states, thus helping the identification of targets of pharmacological interest. Virtual screening (VS) also benefits from HPC to predict in a short time, among millions or billions of virtual chemical compounds, the best potential ligands that will be tested in relevant assays to start a rational drug design process.
Affiliation(s)
- Martin Kotev
  - Evotec SE, Integrated Drug Discovery, Molecular Architects, Campus Curie, Toulouse, France
7
Wayment-Steele HK, Ojoawo A, Otten R, Apitz JM, Pitsawong W, Hömberger M, Ovchinnikov S, Colwell L, Kern D. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024; 625:832-839. [PMID: 37956700 PMCID: PMC10808063 DOI: 10.1038/s41586-023-06832-9] [Citation(s) in RCA: 163] [Impact Index Per Article: 163.0] [Received: 07/07/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]
Abstract
AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations often cause population changes within these substates (refs. 3, 4). We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named AF-Cluster, we investigated the evolutionary distribution of predicted structures for the metamorphic protein KaiB (ref. 5) and found that predictions of both conformations were distributed in clusters across the KaiB family. We used nuclear magnetic resonance spectroscopy to confirm an AF-Cluster prediction: a cyanobacteria KaiB variant is stabilized in the opposite state compared with the more widely studied variant. To test AF-Cluster's sensitivity to point mutations, we designed and experimentally verified a set of three mutations predicted to flip KaiB from Rhodobacter sphaeroides from the ground to the fold-switched state. Finally, screening for alternative states in protein families without known fold switching identified a putative alternative state for the oxidoreductase Mpt53 in Mycobacterium tuberculosis. Further development of such bioinformatic methods in tandem with experiments will probably have a considerable impact on predicting protein energy landscapes, essential for illuminating biological function.
Affiliation(s)
- Hannah K Wayment-Steele
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Adedolapo Ojoawo
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Renee Otten
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
  - Treeline Biosciences, Watertown, MA, USA
- Julia M Apitz
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Warintra Pitsawong
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
  - Biomolecular Discovery, Relay Therapeutics, Cambridge, MA, USA
- Marc Hömberger
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
  - Treeline Biosciences, Watertown, MA, USA
- Lucy Colwell
  - Google Research, Cambridge, MA, USA
  - Cambridge University, Cambridge, UK
- Dorothee Kern
  - Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
8
Dichio V, Zeng HL, Aurell E. Statistical genetics in and out of quasi-linkage equilibrium. Rep Prog Phys 2023; 86:052601. [PMID: 36944245 DOI: 10.1088/1361-6633/acc5fa] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Accepted: 03/21/2023] [Indexed: 06/18/2023]
Abstract
This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman theory describes a population evolving due to the mutations, recombination, natural selection and possibly genetic drift. A QLE phase exists at sufficiently high recombination rate (r) and/or mutation rates µ with respect to selection strength. We show how in QLE it is possible to infer the epistatic parameters of the fitness function from the knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime for high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase which we call the non-random coexistence where variability persists in the population without either fixating or disappearing.
Affiliation(s)
- Vito Dichio
  - Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013 Paris, France
- Hong-Li Zeng
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, People's Republic of China
- Erik Aurell
  - Department of Computational Science and Technology, KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
9
Malbranke C, Bikard D, Cocco S, Monasson R, Tubiana J. Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies. Curr Opin Struct Biol 2023; 80:102571. [PMID: 36947951 DOI: 10.1016/j.sbi.2023.102571] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/01/2022] [Revised: 01/29/2023] [Accepted: 02/07/2023] [Indexed: 03/24/2023]
Abstract
Computational protein design facilitates the discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionary-related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter approaches estimate key biochemical properties, such as structure free energy, conformational entropy, or binding affinities using machine learning surrogates, and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France; Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France
- David Bikard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, 75015 Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
10
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting ones from hundreds of thousands of inter-protein residue pairs is extremely challenging, and performances of the state-of-the-art inter-protein contact prediction methods are still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features to residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, although the latter two methods leveraged the native structures of interacting proteins in the prediction, and DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
  - School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
  - School of Physics, Huazhong University of Science and Technology, China
11
Karamanos TK. Chasing long-range evolutionary couplings in the AlphaFold era. Biopolymers 2023; 114:e23530. [PMID: 36752285 PMCID: PMC10909459 DOI: 10.1002/bip.23530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/24/2022] [Revised: 01/26/2023] [Accepted: 01/27/2023] [Indexed: 02/09/2023]
Abstract
Coevolution between protein residues is normally interpreted as direct contact. However, the evolutionary record of a protein sequence contains rich information that may include long-range functional couplings, couplings that report on homo-oligomeric states or even conformational changes. Due to the complexity of the sequence space and the lack of structural information on various members of a protein family, it has been difficult to effectively mine the additional information encoded in a multiple sequence alignment (MSA). Here, taking advantage of the recent release of the AlphaFold (AF) database, we attempt to identify coevolutionary couplings that cannot be explained simply by spatial proximity. We propose a simple computational method that performs direct coupling analysis on an MSA and searches for couplings that are not satisfied in any of the AF models of members of the identified protein family. Application of this method on 2012 protein families suggests that ~12% of the total identified coevolving residue pairs are spatially distant and more likely to be disordered than their contacting counterparts. We expect that this analysis will help improve the quality of coevolutionary distance restraints used for structure determination and will be useful in identifying potentially functional/allosteric cross-talk between distant residues.
12
Lin P, Yan Y, Huang SY. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief Bioinform 2023; 24:6849483. [PMID: 36440949 DOI: 10.1093/bib/bbac499] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Yumeng Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
13
Improved Protein Real-Valued Distance Prediction Using Deep Residual Dense Network (DRDN). Protein J 2022; 41:468-476. [PMID: 36008645 DOI: 10.1007/s10930-022-10067-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/15/2022] [Indexed: 10/15/2022]
Abstract
Three-dimensional protein structure prediction is one of the major challenges in bioinformatics. According to recent research findings, real-valued distance prediction plays a vital role in determining the unique three-dimensional protein structure. This paper proposes a novel methodology involving a deep residual dense network (DRDN) for predicting protein real-valued distance. The features extracted from the given query protein sequence and its corresponding homologous sequences are used for training the model. Multi-aligned homologous sequences for each query protein sequence are retrieved from five different databases using DeepMSA, HHblits, and HITS_PR_HHblits methods. The proposed method yielded outcomes of 3.89, 0.23, 0.45, and 0.63, respectively, corresponding to the evaluation metrics such as Absolute Error, Relative Error, High-accuracy Pairwise Distance Test (PDA), and Pairwise Distance Test (PDT). Further, the contact map is computed based on CASP criteria by converting the predicted real-valued distance, and it is evaluated using the precision metric. It is observed that precision of long-range top L/5 contact prediction on the CASP13 dataset by the proposed method, RaptorX, Zhang, trRosetta, JinboXu & JinLu, and Deepdist are 0.834, 0.657, 0.70, 0.785, 0.786, and 0.812, respectively. Also, Top-L/5 contact prediction on the CASP14 dataset evaluated using average precision resulted in 0.847, 0.707, 0.752, 0.783, 0.792, 0.817, and 0.825 respectively, corresponding to the proposed method, Zhang, RaptorX, trRosetta, Deepdist, JinboXu & JinLu, and Alphafold2.
|
14
|
Zerihun MB, Pucci F, Schug A. CoCoNet-boosting RNA contact prediction by convolutional neural networks. Nucleic Acids Res 2021; 49:12661-12672. [PMID: 34871451 PMCID: PMC8682773 DOI: 10.1093/nar/gkab1144] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/23/2020] [Revised: 10/27/2021] [Accepted: 11/05/2021] [Indexed: 11/24/2022] Open
Abstract
Co-evolutionary models such as direct coupling analysis (DCA) in combination with machine learning (ML) techniques based on deep neural networks are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA as they rely on large structural datasets only available for proteins. Here, we demonstrate how the available smaller data for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA as tested by cross-validation of about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase of the 3D RNA structure prediction accuracy. Therefore, we suggest that the field develops, in addition to contact PPV, metrics which estimate the expected impact for 3D structure modeling tools better. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
Affiliation(s)
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
| | - Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Computational Biology and Bioinformatics, Université Libre de Bruxelles 1050, Brussels, Belgium
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Faculty of Biology, University of Duisburg-Essen, 45117 Essen, Germany
| |
|
15
|
Xie Z, Xu J. Deep graph learning of inter-protein contacts. Bioinformatics 2021; 38:947-953. [PMID: 34755837 PMCID: PMC8796373 DOI: 10.1093/bioinformatics/btab761] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 08/13/2021] [Revised: 10/06/2021] [Accepted: 11/04/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as intra-protein contact prediction. RESULTS We propose a new deep learning method GLINTER (Graph Learning of INTER-protein contacts) for interfacial contact prediction of dimers, leveraging a rotational invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments. Tested on the 13th and 14th CASP-CAPRI datasets, the average top L/10 precision achieved by GLINTER is 54% on the homodimers and 52% on all the dimers, much higher than 30% obtained by the latest deep learning method DeepHomo on the homodimers and 15% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve selection of docking decoys. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/zw2x/glinter. The datasets are available at https://github.com/zw2x/glinter/data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziwei Xie
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jinbo Xu
- To whom correspondence should be addressed.
| |
|
16
|
Gao M, Lund-Andersen P, Morehead A, Mahmud S, Chen C, Chen X, Giri N, Roy RS, Quadir F, Effler TC, Prout R, Abraham S, Elwasif W, Haas NQ, Skolnick J, Cheng J, Sedova A. High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS 2021; 2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Affiliation(s)
- Mu Gao
- Georgia Institute of Technology, Atlanta, GA
| | | | | | | | - Chen Chen
- University of Missouri, Columbia, MO
| | - Xiao Chen
- University of Missouri, Columbia, MO
| | | | | | | | | | - Ryan Prout
- Oak Ridge National Laboratory, Oak Ridge, TN
| | | | | | | | | | | | - Ada Sedova
- Oak Ridge National Laboratory, Oak Ridge, TN
| |
|
17
|
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469 DOI: 10.1021/acs.jpcb.1c07145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
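The abstract does not spell out how the Z-score is defined, so the following is only a generic sketch of the idea: standardize each interprotein DCA score against the distribution of all interprotein scores and keep the pairs that stand out. The background distribution, the cutoff of 3.0, and the function name are assumptions made for illustration and may differ from the published measure.

```python
from statistics import mean, pstdev


def zscore_filter(scores, cutoff=3.0):
    """Keep residue pairs whose DCA score is a strong positive outlier.

    scores: dict mapping (i, j) residue pairs to interprotein DCA scores.
    cutoff: assumed Z-score threshold (hypothetical default).
    Returns a list of ((i, j), z) tuples sorted by descending Z-score.
    """
    values = list(scores.values())
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: nothing stands out from the background.
        return []
    kept = [(pair, (s - mu) / sigma) for pair, s in scores.items()]
    kept = [(pair, z) for pair, z in kept if z >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

A filter of this kind returns no predictions when no score stands out, which makes it possible to tell a noisy alignment from an informative one.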
Affiliation(s)
- Kareem M Mehrabiani
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
| | - Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States.,Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States.,Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Department of Biosciences, Rice University, Houston, Texas 77005, United States
| |
|
18
|
Sameer H, Victor G, Katalin S, Henrik A. Elucidation of ligand binding and dimerization of NADPH:protochlorophyllide (Pchlide) oxidoreductase from pea (Pisum sativum L.) by structural analysis and simulations. Proteins 2021; 89:1300-1314. [PMID: 34021929 DOI: 10.1002/prot.26151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/09/2020] [Revised: 02/18/2021] [Accepted: 05/11/2021] [Indexed: 11/07/2022]
Abstract
NADPH:protochlorophyllide (Pchlide) oxidoreductase (POR) is a key enzyme of chlorophyll biosynthesis in angiosperms. It is one of few known photoenzymes, which catalyzes the light-activated trans-reduction of the C17-C18 double bond of Pchlide's porphyrin ring. Due to the light requirement, dark-grown angiosperms cannot synthesize chlorophyll. No crystal structure of POR is available, so to improve understanding of the protein's three-dimensional structure, its dimerization, and binding of ligands (both the cofactor NADPH and substrate Pchlide), we computationally investigated the sequence and structural relationships among homologous proteins identified through database searches. The results indicate that α4 and α7 helices of monomers form the interface of POR dimers. On the basis of conserved residues, we predicted 11 functionally important amino acids that play important roles in POR binding to NADPH. Structural comparison of available crystal structures revealed that they participate in formation of binding pockets that accommodate the Pchlide ligand, and that five atoms of the closed tetrapyrrole are involved in non-bonding interactions. However, we detected no clear pattern in the physico-chemical characteristics of the amino acids they interact with. Thus, we hypothesize that interactions of these atoms in the Pchlide porphyrin ring are important to hold the ligand within the POR binding site. Analysis of Pchlide binding in POR by molecular docking and PELE simulations revealed that the orientation of the nicotinamide group is important for Pchlide binding. These findings highlight the complexity of interactions of porphyrin-containing ligands with proteins, and we suggest that fit-inducing processes play important roles in POR-Pchlide interactions.
Affiliation(s)
- Hassan Sameer
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
| | - Guallar Victor
- ICREA, Passeig Lluís Companys 23, Barcelona, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Solymosi Katalin
- Department of Plant Anatomy, Institute of Biology, Eötvös Loránd University, Budapest, Hungary
| | - Aronsson Henrik
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
| |
|
19
|
Yan Y, Huang SY. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Brief Bioinform 2021; 22:bbab038. [PMID: 33693482 PMCID: PMC8425427 DOI: 10.1093/bib/bbab038] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 11/14/2020] [Revised: 01/09/2021] [Indexed: 12/14/2022] Open
Abstract
Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial for understanding their molecular mechanisms and for developing drugs that target protein-protein interactions. Recently, deep learning has led to a breakthrough in intra-protein contact prediction, achieving unusually high accuracy in recent Critical Assessment of protein Structure Prediction (CASP) challenges. However, because of the limited number of known homologous protein-protein interactions and the difficulty of generating joint multiple sequence alignments of two interacting proteins, advances in inter-protein contact prediction remain limited. Here, we propose DeepHomo, a deep learning model that predicts inter-protein residue-residue contacts across homo-oligomeric protein interfaces. Unlike previous deep learning approaches, we integrate the intra-protein distance map and inter-protein docking pattern, in addition to evolutionary coupling, sequence conservation, and physico-chemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-Critical Assessment of Predicted Interaction (CAPRI) targets. DeepHomo achieved a high precision of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis and machine learning-based approaches. Integrating predicted inter-chain contacts into protein-protein docking significantly improved the docking accuracy on the benchmark dataset of realistic homo-dimeric targets from CASP-CAPRI experiments. DeepHomo is available at http://huanglab.phys.hust.edu.cn/DeepHomo/.
Affiliation(s)
- Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
| |
|
20
|
Dapkūnas J, Olechnovič K, Venclovas Č. Modeling of protein complexes in CASP14 with emphasis on the interaction interface prediction. Proteins 2021; 89:1834-1843. [PMID: 34176161 PMCID: PMC9292421 DOI: 10.1002/prot.26167] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/01/2021] [Revised: 06/21/2021] [Accepted: 06/23/2021] [Indexed: 01/08/2023]
Abstract
The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template‐based modeling and free docking. Structural templates were identified using sensitive sequence‐based searches. If sequence‐based searches failed, we performed structure‐based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of our protein structure quality assessment method, VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated with the exception of an antibody‐antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best‐predicted intersubunit interfaces according to several accuracy measures. We attribute our relative success primarily to the emphasis on the interaction interface when modeling and scoring.
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
|
21
|
Mishra SK, Cooper CJ, Parks JM, Mitchell JC. Hotspot Coevolution Is a Key Identifier of Near-Native Protein Complexes. J Phys Chem B 2021; 125:6058-6067. [PMID: 34077660 DOI: 10.1021/acs.jpcb.0c11525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Protein-protein interactions play a key role in mediating numerous biological functions, with more than half the proteins in living organisms existing as either homo- or hetero-oligomeric assemblies. Protein subunits that form oligomers minimize the free energy of the complex, but exhaustive computational search-based docking methods have not comprehensively addressed the challenge of distinguishing a natively bound complex from non-native forms. Current protein docking approaches address this problem by sampling multiple binding modes in proteins and scoring each mode, with the lowest-energy (or highest scoring) binding mode being regarded as a near-native complex. However, high-scoring modes often match poorly with the true bound form, suggesting a need for improvement of the scoring function. In this study, we propose a scoring function, KFC-E, that accounts for both conservation and coevolution of putative binding hotspot residues at protein-protein interfaces. We tested KFC-E on four benchmark sets of unbound examples and two benchmark sets of bound examples, with the results demonstrating a clear improvement over scores that examine conservation and coevolution across the entire interface.
Affiliation(s)
- Sambit K Mishra
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831-6038, United States
| | - Connor J Cooper
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831-6038, United States
| | - Jerry M Parks
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831-6038, United States
| | - Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, Tennessee 37831-6038, United States
| |
|
22
|
DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci Rep 2021; 11:12295. [PMID: 34112907 PMCID: PMC8192766 DOI: 10.1038/s41598-021-91827-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/09/2021] [Accepted: 05/28/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning methods that achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because there are not many known protein complexes available to generate MSAs of sufficient depth for a pair of proteins. In recognizing that multiple sequence alignments of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based protein intrachain residue-residue contact predictor) to predict both intrachain and interchain contacts for homomultimers using multiple sequence alignment (MSA) and other co-evolutionary features of a single monomer followed by discrimination of interchain and intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. Allowing true-positive predictions within two residue shifts, the best average precision was obtained for the Top-L/10 predictions of 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric multiple sequence alignments can be used with deep learning to predict interchain contacts of homomeric proteins.
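Two steps mentioned in this abstract lend themselves to a short sketch: separating predicted contacts into intrachain and interchain using the monomer's tertiary structure, and scoring predictions while tolerating a shift of up to two residues. The code below is an illustrative reading of those steps, not the DNCON2_Inter implementation. The 8 Å contact cutoff, the interpretation of "two residue shifts" as a window of ±2 on each residue index, and all function names are assumptions.

```python
def split_contacts(predicted_pairs, monomer_dist, threshold=8.0):
    """Split predicted contacts using the monomer distance map.

    A predicted pair that is already in contact within the monomer is
    treated as intrachain; otherwise it is an interchain candidate.
    threshold is an assumed contact cutoff in angstroms.
    """
    intra, inter = [], []
    for i, j in predicted_pairs:
        if monomer_dist[i][j] < threshold:
            intra.append((i, j))
        else:
            inter.append((i, j))
    return intra, inter


def relaxed_precision(predicted_pairs, true_contacts, shift=2):
    """Precision where a prediction counts as correct if a true contact
    lies within +/- shift positions of both residue indices."""
    true_set = set(true_contacts)
    offsets = range(-shift, shift + 1)

    def is_hit(i, j):
        return any((i + di, j + dj) in true_set
                   for di in offsets for dj in offsets)

    if not predicted_pairs:
        return 0.0
    return sum(is_hit(i, j) for i, j in predicted_pairs) / len(predicted_pairs)
```

In a homomultimer the same residue pair can be in contact both within and between chains, so a split by monomer distance alone will miss such pairs; the sketch ignores that case.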
|
23
|
On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Comput Biol 2021; 17:e1008957. [PMID: 34029316 PMCID: PMC8177639 DOI: 10.1371/journal.pcbi.1008957] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/15/2020] [Revised: 06/04/2021] [Accepted: 04/09/2021] [Indexed: 12/04/2022] Open
Abstract
Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings. Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into close-to-identical three-dimensional structures and fulfill almost identical biological tasks. 
Global coevolutionary models, like those inferred by the Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches also have to assume that the distinct sequences in a protein family are close to independent, while in reality they are characterized by involved hierarchical phylogenetic relationships. Here we propose null models for sequence alignments, which maintain the patterns of amino-acid conservation and phylogeny contained in the data, but destroy the coevolutionary couplings frequently used in protein structure prediction. We find that phylogeny actually induces spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences, and therefore have little influence on the first predicted contacts. However, in the range of intermediate couplings, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.
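To make the idea of a null alignment concrete, the sketch below shuffles each alignment column independently. This is a deliberately simpler null than the ones described above: it preserves per-column amino-acid frequencies (conservation) and removes correlations between columns, but it also destroys phylogenetic structure, which the paper's null models are designed to keep. The function name and the fixed default seed are illustrative choices.

```python
import random


def shuffle_columns(msa, seed=0):
    """Return a resampled alignment with each column permuted independently.

    msa: list of equal-length aligned sequences (strings).
    Column composition is preserved exactly; inter-column correlations,
    including any phylogenetic signal, are destroyed.
    """
    rng = random.Random(seed)
    # Transpose to columns, permute each column, then transpose back.
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]
```

Running DCA on such a shuffled alignment gives a baseline for couplings that arise from finite sampling and conservation alone; separating out the phylogenetic contribution requires resampling schemes that respect the tree, as the abstract describes.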
|
24
|
Schmidt M, Hamacher K. Identification of biophysical interaction patterns in direct coupling analysis. Phys Rev E 2021; 103:042418. [PMID: 34005861 DOI: 10.1103/physreve.103.042418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2020] [Accepted: 03/27/2021] [Indexed: 11/07/2022]
Abstract
Direct-coupling analysis is a statistical learning method for protein contact prediction based on sequence information alone. The maximum entropy principle leads to an effective inverse Potts model. Predictions on contacts are based on fitted local fields and couplings from an empirical multiple sequence alignment. Typically, the ℓ2 norm of the resulting two-body couplings is used for contact prediction. However, this procedure discards important information. In this paper we show that the usage of the full fields and coupling information improves prediction accuracy.
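The baseline scoring that this abstract refers to, collapsing each pairwise coupling block to its ℓ2 (Frobenius) norm and ranking residue pairs by it, can be sketched in a few lines. The dictionary input format and the function names are assumptions made for illustration. The sketch shows only the conventional norm-based score, not the improved method proposed in the paper.

```python
import math


def coupling_norms(couplings):
    """Collapse each q x q coupling block to a single l2 (Frobenius) norm.

    couplings: dict mapping (i, j) residue pairs to a q x q list of lists
    of fitted Potts couplings J_ij(a, b).
    """
    return {
        pair: math.sqrt(sum(v * v for row in block for v in row))
        for pair, block in couplings.items()
    }


def rank_pairs(couplings):
    """Residue pairs ordered from strongest to weakest coupling norm."""
    norms = coupling_norms(couplings)
    return sorted(norms, key=norms.get, reverse=True)
```

Because the norm reduces q × q numbers to one, it cannot distinguish which amino-acid combinations drive a coupling, which is the information loss the abstract points to.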
Affiliation(s)
- Michael Schmidt
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| | - Kay Hamacher
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany.,Department of Biology, TU Darmstadt, Schnittspahnstr. 10, 64287 Darmstadt, Germany.,Department of Computer Science, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
| |
|
25
|
Zeng HL, Aurell E. Inferring genetic fitness from genomic data. Phys Rev E 2021; 101:052409. [PMID: 32575265 DOI: 10.1103/physreve.101.052409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/10/2020] [Accepted: 05/04/2020] [Indexed: 11/07/2022]
Abstract
The genetic composition of a naturally developing population is considered as due to mutation, selection, genetic drift, and recombination. Selection is modeled as single-locus terms (additive fitness) and two-loci terms (pairwise epistatic fitness). The problem is posed to infer epistatic fitness from population-wide whole-genome data from a time series of a developing population. We generate such data in silico and show that in the quasilinkage equilibrium phase of Kimura, Neher, and Shraiman, which pertains at high enough recombination rates and low enough mutation rates, epistatic fitness can be quantitatively correctly inferred using inverse Ising-Potts methods.
Affiliation(s)
- Hong-Li Zeng
- School of Science, and New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.,Nordita, Royal Institute of Technology, and Stockholm University, SE-10691 Stockholm, Sweden
| | - Erik Aurell
- KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden.,Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, 30-348 Kraków, Poland
| |
|
26
|
AGR2-AGR3 hetero-oligomeric complexes: Identification and characterization. Bioelectrochemistry 2021; 140:107808. [PMID: 33848875 DOI: 10.1016/j.bioelechem.2021.107808] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/27/2021] [Revised: 03/14/2021] [Accepted: 03/22/2021] [Indexed: 01/13/2023]
Abstract
In this paper we compare the electrochemical behavior of two homologous proteins, anterior gradient 2 (AGR2) and anterior gradient 3 (AGR3), which play an important role in cancer cell biology. The slight variation in their protein structures affects protein adsorption and orientation at charged surfaces and also enables AGR2 and AGR3 to form heterocomplexes. We confirm the interaction between AGR2 and AGR3 (i) in vitro by immunochemical and constant current chronopotentiometric stripping (CPS) analysis and (ii) in vivo by bioluminescence resonance energy transfer (BRET) assay. Mutation of AGR2 in the dimerization domain (E60A) prevents the formation of wild-type AGR2 dimers and also negatively affects the interaction with wild-type AGR3, as shown by CPS analysis. Besides new information about the AGR2 and AGR3 proteins, including their joint interaction, our work introduces possible applications of CPS in the bioanalysis of protein complexes, including those that are relatively unstable but important in cancer research.
|
27
|
Wang Y, Correa Marrero M, Medema MH, van Dijk ADJ. Coevolution-based prediction of protein-protein interactions in polyketide biosynthetic assembly lines. Bioinformatics 2021; 36:4846-4853. [PMID: 32592463 DOI: 10.1093/bioinformatics/btaa595] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/17/2019] [Revised: 05/20/2020] [Accepted: 06/19/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Polyketide synthases (PKSs) are enzymes that generate diverse molecules of great pharmaceutical importance, including a range of clinically used antimicrobials and antitumor agents. Many polyketides are synthesized by cis-AT modular PKSs, which are organized in assembly lines, in which multiple enzymes line up in a specific order. This order is defined by specific protein-protein interactions (PPIs). The unique modular structure and catalytic mechanism of these assembly lines make their products predictable and have also spurred combinatorial biosynthesis studies to produce novel polyketides using synthetic biology. However, predicting the interactions of PKSs, and thereby inferring the order of their assembly line, is still challenging, especially for cases in which this order is not reflected by the ordering of the PKS-encoding genes in the genome. RESULTS Here, we introduce PKSpop, which uses a coevolution-based PPI algorithm to infer protein order in PKS assembly lines. Our method accurately predicts protein orders (93% accuracy). Additionally, we identify new residue pairs that are key in determining interaction specificity, and show that coevolution of N- and C-terminal docking domains of PKSs is significantly more predictive for PPIs than coevolution between ketosynthase and acyl carrier protein domains. AVAILABILITY AND IMPLEMENTATION The code is available on http://www.bif.wur.nl/ (under 'Software'). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | | | - Aalt D J van Dijk
- Bioinformatics Group.,Department of Plant Sciences Biometris, Wageningen University & Research, 6708 PB Wageningen, The Netherlands
| |
|
28
|
Thadani NN, Zhou Q, Reyes Gamas K, Butler S, Bueno C, Schafer NP, Morcos F, Wolynes PG, Suh J. Frustration and Direct-Coupling Analyses to Predict Formation and Function of Adeno-Associated Virus. Biophys J 2020; 120:489-503. [PMID: 33359833 DOI: 10.1016/j.bpj.2020.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/06/2020] [Revised: 11/08/2020] [Accepted: 12/08/2020] [Indexed: 01/03/2023] Open
Abstract
Adeno-associated virus (AAV) is a promising gene therapy vector because of its efficient gene delivery and relatively mild immunogenicity. To improve delivery target specificity, researchers use combinatorial and rational library design strategies to generate novel AAV capsid variants. These approaches frequently propose high proportions of nonforming or noninfective capsid protein sequences that reduce the effective depth of synthesized vector DNA libraries, thereby raising the discovery cost of novel vectors. We evaluated two computational techniques for their ability to estimate the impact of residue mutations on AAV capsid protein-protein interactions and thus predict changes in vector fitness, reasoning that these approaches might inform the design of functionally enriched AAV libraries and accelerate therapeutic candidate identification. The Frustratometer computes an energy function derived from the energy landscape theory of protein folding. Direct-coupling analysis (DCA) is a statistical framework that captures residue coevolution within proteins. We applied the Frustratometer to select candidate protein residues predicted to favor assembled or disassembled capsid states, then predicted mutation effects at these sites using the Frustratometer and DCA. Capsid mutants were experimentally assessed for changes in virus formation, stability, and transduction ability. The Frustratometer-based metric showed a counterintuitive correlation with viral stability, whereas a DCA-derived metric was highly correlated with virus transduction ability in the small population of residues studied. Our results suggest that coevolutionary models may be able to elucidate complex capsid residue-residue interaction networks essential for viral function, but further study is needed to understand the relationship between protein energy simulations and viral capsid metastability.
Affiliation(s)
| | - Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas
| | | | - Susan Butler
- Department of Bioengineering, Rice University, Houston, Texas
| | - Carlos Bueno
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas
| | - Nicholas P Schafer
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, University of Texas at Dallas, Richardson, Texas
| | - Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Physics, Rice University, Houston, Texas
| | - Junghae Suh
- Department of Bioengineering, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas; Systems, Synthetic, and Physical Biology Program, Rice University, Houston, Texas.
|
29
|
Voronin A, Weiel M, Schug A. Including residual contact information into replica-exchange MD simulations significantly enriches native-like conformations. PLoS One 2020; 15:e0242072. [PMID: 33196676 PMCID: PMC7668583 DOI: 10.1371/journal.pone.0242072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/08/2020] [Accepted: 10/27/2020] [Indexed: 11/19/2022] Open
Abstract
Proteins are complex biomolecules which perform critical tasks in living organisms. Knowledge of a protein's structure is essential for understanding its physiological function in detail. Despite the incredible progress in experimental techniques, protein structure determination is still expensive, time-consuming, and arduous. That is why computer simulations are often used to complement or interpret experimental data. Here, we explore how in silico protein structure determination based on replica-exchange molecular dynamics (REMD) can benefit from including contact information derived from theoretical and experimental sources, such as direct coupling analysis or NMR spectroscopy. To reflect the influence from erroneous and noisy data we probe how false-positive contacts influence the simulated ensemble. Specifically, we integrate varying numbers of randomly selected native and non-native contacts and explore how such a bias can guide simulations towards the native state. We investigate the number of contacts needed for a significant enrichment of native-like conformations and show the capabilities and limitations of this method. Adhering to a threshold of approximately 75% true-positive contacts within a simulation, we obtain an ensemble with native-like conformations of high quality. We find that contact-guided REMD is capable of delivering physically reasonable models of a protein's structure.
Affiliation(s)
- Arthur Voronin
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Marie Weiel
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Alexander Schug
- Institute for Advanced Simulation, Jülich Supercomputing Center, Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, Duisburg, Germany
|
30
|
Muscat M, Croce G, Sarti E, Weigt M. FilterDCA: Interpretable supervised contact prediction using inter-domain coevolution. PLoS Comput Biol 2020; 16:e1007621. [PMID: 33035205 PMCID: PMC7577475 DOI: 10.1371/journal.pcbi.1007621] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/20/2019] [Revised: 10/21/2020] [Accepted: 08/20/2020] [Indexed: 12/03/2022] Open
Abstract
Predicting three-dimensional protein structure and assembling protein complexes using sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made in the case of single proteins using a combination of unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While reaching impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits the applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, improving performance over standard coevolutionary analysis while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.
The de novo prediction of tertiary and quaternary protein structures has recently seen important advances, by combining unsupervised, purely sequence-based coevolutionary analyses with structure-based supervision using deep learning for contact-map prediction. While showing impressive performance, deep-learning methods require large training sets and pose severe obstacles for their interpretability. Here we construct a simple, transparent and therefore fully interpretable inter-domain contact predictor, which uses the results of coevolutionary Direct Coupling Analysis in combination with explicitly constructed filters reflecting typical contact patterns in a training set of known protein structures, and which improves the accuracy of predicted contacts significantly. Our approach thereby sheds light on the question of how contact information is encoded in coevolutionary signals.
Affiliation(s)
- Maureen Muscat
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative – LCQB, 75005 Paris, France
| | - Giancarlo Croce
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative – LCQB, 75005 Paris, France
| | - Edoardo Sarti
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative – LCQB, 75005 Paris, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative – LCQB, 75005 Paris, France
|
31
|
Zhang TH, Dai L, Barton JP, Du Y, Tan Y, Pang W, Chakraborty AK, Lloyd-Smith JO, Sun R. Predominance of positive epistasis among drug resistance-associated mutations in HIV-1 protease. PLoS Genet 2020; 16:e1009009. [PMID: 33085662 PMCID: PMC7605711 DOI: 10.1371/journal.pgen.1009009] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/13/2019] [Revised: 11/02/2020] [Accepted: 07/24/2020] [Indexed: 12/12/2022] Open
Abstract
Drug-resistant mutations often have deleterious impacts on replication fitness, posing a fitness cost that can only be overcome by compensatory mutations. However, the role of fitness cost in the evolution of drug resistance has often been overlooked in clinical studies or in vitro selection experiments, as these observations only capture the outcome of drug selection. In this study, we systematically profile the fitness landscape of resistance-associated sites in HIV-1 protease using deep mutational scanning. We construct a mutant library covering combinations of mutations at 11 sites in HIV-1 protease, all of which are associated with resistance to protease inhibitors in the clinic. Using deep sequencing, we quantify the fitness of thousands of HIV-1 protease mutants after multiple cycles of replication in human T cells. Although the majority of resistance-associated mutations have deleterious effects on viral replication, we find that epistasis among resistance-associated mutations is predominantly positive. Furthermore, our fitness data are consistent with genetic interactions inferred directly from HIV sequence data of patients. Fitness valleys formed by strong positive epistasis reduce the likelihood of reversal of drug resistance mutations. Overall, our results support the view that strong compensatory effects are involved in the emergence of clinically observed resistance mutations and provide insights into fitness barriers in the evolution and reversion of drug resistance.
Affiliation(s)
- Tian-hao Zhang
- Molecular Biology Institute, University of California, Los Angeles, CA 90095, USA
| | - Lei Dai
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - John P. Barton
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
| | - Yushen Du
- School of Medicine, ZheJiang University, Hangzhou, 210000, China
- Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
| | - Yuxiang Tan
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Wenwen Pang
- Department of Public Health Laboratory Science, West China School of Public Health, Sichuan University, Chengdu 610041, China
| | - Arup K. Chakraborty
- Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics, & Chemistry, Massachusetts Institute of Technology, MA 21309, USA
- Ragon Institute of MGH, MIT, & Harvard, Cambridge, MA 21309, USA
| | - James O. Lloyd-Smith
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
| | - Ren Sun
- Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
|
32
|
Yu L, Zhang W, Luo W, Dupont RL, Xu Y, Wang Y, Tu B, Xu H, Wang X, Fang Q, Yang Y, Wang C, Wang C. Molecular recognition of human islet amyloid polypeptide assembly by selective oligomerization of thioflavin T. Science Advances 2020; 6:eabc1449. [PMID: 32821844 PMCID: PMC7406363 DOI: 10.1126/sciadv.abc1449] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/08/2020] [Accepted: 06/22/2020] [Indexed: 06/11/2023]
Abstract
Selective oligomerization is a common phenomenon in the formation of intricate biological structures in nature. The precise design of drug molecules with an oligomerization state that specifically recognizes its receptor, however, remains substantially challenging. Here, we used scanning tunneling microscopy (STM) to show that the oligomerization states of the amyloid probe thioflavin T (ThT) on hIAPP8-37 assemblies are exclusively even-numbered. We demonstrate that both adhesive interactions between ThT and the protein substrate and cohesive interactions among ThT molecules govern the oligomerization state of the bound ThT. Specifically, the work of the cohesive interaction between two head/tail ThTs is determined to be 6.4 kBT, around 50% larger than that of the cohesive interaction between two side-by-side ThTs (4.2 kBT). Overall, our STM imaging and theoretical understanding at the single-molecule level provide valuable insights into the design of drug compounds using the selective oligomerization of molecular probes to recognize protein self-assembly.
Affiliation(s)
- Lanlan Yu
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, P. R. China
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, P. R. China
| | - Wenbo Zhang
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, P. R. China
| | - Wendi Luo
- CAS Key Laboratory of Biological Effects of Nanomaterials and Nanosafety, CAS Key Laboratory of Standardization and Measurement for Nanotechnology, Laboratory of Theoretical and Computational Nanoscience, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, P. R. China
- Sino-Danish Center for Education and Research, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
| | - Robert L. Dupont
- William G. Lowrie Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH 43210, USA
| | - Yang Xu
- William G. Lowrie Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH 43210, USA
| | - Yibing Wang
- State Key Laboratory of Bioreactor Engineering, Biomedical Nanotechnology Center, Shanghai Collaborative Innovation Center for Biomanufacturing Technology, School of Biotechnology, East China University of Science and Technology, Shanghai 200237, P. R. China
| | - Bin Tu
- CAS Key Laboratory of Biological Effects of Nanomaterials and Nanosafety, CAS Key Laboratory of Standardization and Measurement for Nanotechnology, Laboratory of Theoretical and Computational Nanoscience, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, P. R. China
| | - Haiyan Xu
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100005, P. R. China
| | - Xiaoguang Wang
- William G. Lowrie Department of Chemical and Biomolecular Engineering, The Ohio State University, Columbus, OH 43210, USA
| | - Qiaojun Fang
- CAS Key Laboratory of Biological Effects of Nanomaterials and Nanosafety, CAS Key Laboratory of Standardization and Measurement for Nanotechnology, Laboratory of Theoretical and Computational Nanoscience, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, P. R. China
| | - Yanlian Yang
- CAS Key Laboratory of Biological Effects of Nanomaterials and Nanosafety, CAS Key Laboratory of Standardization and Measurement for Nanotechnology, Laboratory of Theoretical and Computational Nanoscience, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, P. R. China
| | - Chen Wang
- CAS Key Laboratory of Biological Effects of Nanomaterials and Nanosafety, CAS Key Laboratory of Standardization and Measurement for Nanotechnology, Laboratory of Theoretical and Computational Nanoscience, CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing 100190, P. R. China
| | - Chenxuan Wang
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, P. R. China
|
33
|
Correa Marrero M, Immink RGH, de Ridder D, van Dijk ADJ. Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis. Bioinformatics 2020; 35:2036-2042. [PMID: 30398547 DOI: 10.1093/bioinformatics/bty924] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 11/05/2018] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) into the analysis that will decrease predictive performance. RESULTS: We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction and non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction made by previous approaches, allowing for the study of many-to-many interactions. AVAILABILITY AND IMPLEMENTATION: Source code and test data are available at www.bif.wur.nl/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Richard G H Immink
- Laboratory of Molecular Biology, Department of Plant Sciences; Bioscience, Wageningen Plant Research
| | | | - Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences; Bioscience, Wageningen Plant Research; Biometris, Department of Plant Sciences, Wageningen University & Research, Wageningen PB, The Netherlands
|
34
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Chloé Quignot
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Raphael Guerois
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
|
35
|
Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contact prediction opens up new avenues for de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine-learning-based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages two features of residue-residue contacts: (1) a single conformation outperforms the others in structural prediction when all the top-ranking residue-residue contacts are used as structural constraints, and (2) conformation-specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand-binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
|
36
|
Fantini M, Lisi S, De Los Rios P, Cattaneo A, Pastore A. Protein Structural Information and Evolutionary Landscape by In Vitro Evolution. Mol Biol Evol 2020; 37:1179-1192. [PMID: 31670785 PMCID: PMC7086169 DOI: 10.1093/molbev/msz256] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022] Open
Abstract
Protein structure is tightly intertwined with function according to the laws of evolution. Understanding how structure determines function has been the aim of structural biology for decades. Here, we asked instead whether it is possible to exploit the function for which a protein was evolutionarily selected to gain information on protein structure and on the landscape explored during the early stages of molecular and natural evolution. To answer this question, we developed a new methodology, which we named CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), that is able to obtain the in vitro evolution of a protein from an artificial selection based on function. With CAMELS, we were able to observe many features of the TEM-1 beta-lactamase local fold exclusively by generating and sequencing large libraries of mutational variants. We demonstrated that, whenever a functional phenotypic selection of a protein is available, we can sketch the structural and evolutionary landscape of a protein without utilizing purified proteins, collecting physical measurements, or relying on the pool of natural protein variants.
Affiliation(s)
- Marco Fantini
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
| | - Simonetta Lisi
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
| | - Paolo De Los Rios
- Institute of Physics, School of Basic Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Antonino Cattaneo
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
- European Brain Research Institute, Rome, Italy
| | - Annalisa Pastore
- Department of Clinical and Basic Neuroscience, Maurice Wohl Institute, King's College London, London, United Kingdom
- Dementia Research Institute, King’s College London, London, United Kingdom
|
37
|
Koukos P, Bonvin A. Integrative Modelling of Biomolecular Complexes. J Mol Biol 2020; 432:2861-2881. [DOI: 10.1016/j.jmb.2019.11.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/14/2019] [Revised: 11/12/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]
|
38
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design.
We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
39
|
Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel, Switzerland) 2020; 21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]
Abstract
Extracting structural information from sequence co-variation has become a common computational biology practice in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to sub-classes and not shared by all members of the family using sequence-based approaches has remained an elusive problem. We here present a coevolution-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have been previously shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting allows assessing whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
Affiliation(s)
- Duccio Malinverni
- Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
| | - Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
|
40
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
|
41
|
Tomar JS, Hosur RV. Polyamine acetylation and substrate-induced oligomeric states in histone acetyltransferase of multiple drug resistant Acinetobacter baumannii. Biochimie 2019; 168:268-276. [PMID: 31786230 DOI: 10.1016/j.biochi.2019.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2019] [Accepted: 11/25/2019] [Indexed: 11/16/2022]
Abstract
Histone acetyltransferase (Hpa2) is an unusual acetyltransferase with a wide range of substrates, including histones, polyamines and aminoglycoside antibiotics. Hpa2 belongs to the GNAT superfamily, and GNATs are well known for forming homo-oligomers. However, the reason behind their oligomerization has remained unexplored. Here, the oligomeric states of Hpa2 were explored to understand the functional significance of oligomerization. Biochemical analysis suggests that Hpa2 exists as a dimer in solution and self-assembles into a tetramer in the spermine-, spermidine- and kanamycin-bound forms. Stability analysis with denaturants indicates that homo-oligomerization of Hpa2 relies on bound substrate and not on experimental conditions. Homo-oligomerization of Hpa2 correlates directly with its polyamine-acetylating capacity. This correlation and in silico model structures suggest that oligomerization of Hpa2 is associated with acceleration of the acetylation process. Interestingly, polyamine acetylation downregulates biofilm formation in E. coli BL21/Hpa2-transformant cells. Therefore, we propose that Hpa2 manipulates survival strategies of the bacterium via acetylation of polyamines and antibiotics.
Affiliation(s)
- Jyoti Singh Tomar
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India.
| | - Ramakrishna Vijayacharya Hosur
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, India; UM-DAE Centre for Excellence in Basic Sciences, University Campus Mumbai, India.
| |
|
42
|
Large multiple sequence alignments with a root-to-leaf regressive method. Nat Biotechnol 2019; 37:1466-1470. [PMID: 31792410 PMCID: PMC6894943 DOI: 10.1038/s41587-019-0333-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/26/2019] [Accepted: 10/29/2019] [Indexed: 01/04/2023]
Abstract
Multiple sequence alignments (MSAs) are used for structural and evolutionary predictions, but the complexity of aligning large datasets requires the use of approximate solutions, including the progressive algorithm. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf to root, based on a guide tree. Their accuracy declines substantially as the number of sequences is scaled up. We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes.
|
43
|
Peng M, Maier M, Esch J, Schug A, Rabe KS. Direct coupling analysis improves the identification of beneficial amino acid mutations for the functional thermostabilization of a delicate decarboxylase. Biol Chem 2019; 400:1519-1527. [PMID: 31472057 DOI: 10.1515/hsz-2019-0156] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/12/2019] [Accepted: 08/09/2019] [Indexed: 12/26/2022]
Abstract
The optimization of enzyme properties for specific reaction conditions enables their tailored use in biotechnology. Predictions using established computer-based methods, however, remain challenging, especially regarding physical parameters such as thermostability without concurrent loss of activity. Employing established computational methods such as energy calculations using FoldX can lead to the identification of beneficial single amino acid substitutions for the thermostabilization of enzymes. However, these methods require a three-dimensional (3D)-structure of the enzyme. In contrast, coevolutionary analysis is a computational method, which is solely based on sequence data. To enable a comparison, we employed coevolutionary analysis together with structure-based approaches to identify mutations, which stabilize an enzyme while retaining its activity. As an example, we used the delicate dimeric, thiamine pyrophosphate dependent enzyme ketoisovalerate decarboxylase (Kivd) and experimentally determined its stability represented by a T50 value indicating the temperature where 50% of enzymatic activity remained after incubation for 10 min. Coevolutionary analysis suggested 12 beneficial mutations, which were not identified by previously established methods, out of which four mutations led to a functional Kivd with an increased T50 value of up to 3.9°C.
Affiliation(s)
- Martin Peng
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Manfred Maier
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Jan Esch
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Alexander Schug
- Institute for Advanced Simulation, Jülich Supercomputing Center, D-52428 Jülich, Germany
| | - Kersten S Rabe
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| |
|
44
|
Abstract
Homologous sequence alignments contain important information about the constraints that shape protein family evolution. Correlated changes between different residues, for instance, can be highly predictive of physical contacts within three-dimensional structures. Detecting such co-evolutionary signals via direct coupling analysis is particularly challenging given the shared phylogenetic history and uneven sampling of the different lineages from which protein sequences are derived. Current best practices for mitigating such effects include sequence-identity-based weighting of input sequences and post-hoc re-scaling of evolutionary coupling scores. However, numerous weighting schemes have previously been developed for other applications, and it is unknown whether any of these schemes may better account for phylogenetic artifacts in evolutionary coupling analyses. Here, we show across a dataset of 150 diverse protein families that the current best practices outperform several alternative sequence- and tree-based weighting methods. Nevertheless, we find that sequence weighting in general provides only a minor benefit relative to post-hoc transformations that re-scale the derived evolutionary couplings. While our findings do not rule out the possibility that an as-yet-untested weighting method may show improved results, the similar predictive accuracies that we observe across distinct weighting methods suggest that there may be little room for further improvement on top of existing strategies.
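As an illustration only, the two mitigation steps named in this abstract can be sketched as follows. The abstract does not specify the exact formulas, so this sketch assumes the commonly used neighbour-count weighting (each sequence weighted by the inverse of the number of sequences at or above an identity threshold) and the average product correction (APC) as the post-hoc re-scaling. The function names and the 0.8 threshold are hypothetical choices for this example, not taken from the paper.

```python
import numpy as np

def sequence_weights(msa, identity_threshold=0.8):
    """Identity-based weights for an alignment given as an (N, L) integer array.

    Assumed scheme: weight_i = 1 / (number of sequences, including i itself,
    whose fractional identity to sequence i is >= identity_threshold).
    """
    msa = np.asarray(msa)
    # Pairwise fractional identity between all sequences, shape (N, N).
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    neighbours = (identity >= identity_threshold).sum(axis=1)
    return 1.0 / neighbours

def average_product_correction(scores):
    """Apply APC to a symmetric (L, L) coupling-score matrix with zero diagonal.

    corrected_ij = score_ij - mean_i * mean_j / overall_mean,
    with all means taken over off-diagonal entries.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    row_mean = scores.sum(axis=1) / (n - 1)
    overall_mean = scores.sum() / (n * (n - 1))
    corrected = scores - np.outer(row_mean, row_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected

# Toy alignment: two identical sequences and one unrelated sequence.
toy_msa = [[0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
weights = sequence_weights(toy_msa)
effective_sequences = weights.sum()
```

In the toy alignment the two identical sequences share their weight, so the effective number of sequences is smaller than the raw count of three. APC removes the part of each score that is explained by position-wise averages, which is why a matrix with constant off-diagonal scores is corrected to zero.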
|
45
|
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022] Open
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
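For orientation, the simpler correlation-based baseline that this abstract contrasts against (phylogenetic profiling) can be sketched in a few lines. This is a minimal sketch, not the authors' method: it assumes binary presence/absence profiles and uses the Pearson correlation as the pair score. The function name, domain labels and toy data are hypothetical. The direct phyletic couplings introduced in the paper rely on a global statistical model and are not reproduced here.

```python
import numpy as np

def rank_profile_pairs(profiles, names):
    """Rank domain pairs by Pearson correlation of their phylogenetic profiles.

    profiles: (n_domains, n_species) array of 0/1 presence indicators.
    Returns (name_a, name_b, correlation) tuples, highest correlation first.
    Note: a profile that is constant across species yields NaN correlations.
    """
    profiles = np.asarray(profiles, dtype=float)
    corr = np.corrcoef(profiles)  # each row is treated as one variable
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pairs.append((names[i], names[j], float(corr[i, j])))
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs

# Toy data: domains A and B co-occur in the same species, C does not.
toy_profiles = [
    [1, 1, 0, 0, 1],  # A
    [1, 1, 0, 0, 1],  # B
    [0, 1, 1, 0, 0],  # C
]
ranked = rank_profile_pairs(toy_profiles, ["A", "B", "C"])
```

Because pairwise correlations cannot separate direct from indirect co-occurrence, a ranking of this kind is the sort of baseline that global models are designed to improve on.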
Affiliation(s)
- Giancarlo Croce
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | | | - Maria Virginia Ruiz Cuevas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | - Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
| | - Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | - Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| |
|
46
|
Shimagaki K, Weigt M. Selection of sequence motifs and generative Hopfield-Potts models for protein families. Phys Rev E 2019; 100:032128. [PMID: 31639992 DOI: 10.1103/physreve.100.032128] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/28/2019] [Indexed: 06/10/2023]
Abstract
Statistical models for families of evolutionarily related proteins have recently gained interest. In particular, pairwise Potts models such as those inferred by direct-coupling analysis have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of amino acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. Although successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from finite amounts of data and whose biophysical interpretation remains unclear. Here, we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino acid sequences into functional subfamilies. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models that go beyond the Hopfield-Potts models discussed here.
Affiliation(s)
- Kai Shimagaki
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France
| |
|
47
|
Marchant A, Cisneros AF, Dubé AK, Gagnon-Arsenault I, Ascencio D, Jain H, Aubé S, Eberlein C, Evans-Yamamoto D, Yachie N, Landry CR. The role of structural pleiotropy and regulatory evolution in the retention of heteromers of paralogs. eLife 2019; 8:46754. [PMID: 31454312 PMCID: PMC6711710 DOI: 10.7554/elife.46754] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 03/11/2019] [Accepted: 08/11/2019] [Indexed: 01/07/2023] Open
Abstract
Gene duplication is a driver of the evolution of new functions. The duplication of genes encoding homomeric proteins leads to the formation of homomers and heteromers of paralogs, creating new complexes after a single duplication event. The loss of these heteromers may be required for the two paralogs to evolve independent functions. Using yeast as a model, we find that heteromerization is frequent among duplicated homomers and correlates with functional similarity between paralogs. Using in silico evolution, we show that for homomers and heteromers sharing binding interfaces, mutations in one paralog can have structural pleiotropic effects on both interactions, resulting in highly correlated responses of the complexes to selection. Therefore, heteromerization could be preserved indirectly due to selection for the maintenance of homomers, thus slowing down functional divergence between paralogs. We suggest that paralogs can overcome the obstacle of structural pleiotropy by regulatory evolution at the transcriptional and post-translational levels.
Affiliation(s)
- Axelle Marchant
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Angel F Cisneros
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
| | - Alexandre K Dubé
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Isabelle Gagnon-Arsenault
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Diana Ascencio
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Honey Jain
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Department of Biological Sciences, Birla Institute of Technology and Sciences, Pilani, India
| | - Simon Aubé
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
| | - Chris Eberlein
- PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| | - Daniel Evans-Yamamoto
- Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan
| | - Nozomu Yachie
- Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan; Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
| | - Christian R Landry
- Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada; PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada; Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada; Département de biologie, Université Laval, Québec, Canada
| |
|
48
|
McBride Z, Chen D, Lee Y, Aryal UK, Xie J, Szymanski DB. A Label-free Mass Spectrometry Method to Predict Endogenous Protein Complex Composition. Mol Cell Proteomics 2019; 18:1588-1606. [PMID: 31186290 PMCID: PMC6683005 DOI: 10.1074/mcp.ra119.001400] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 02/21/2019] [Revised: 06/05/2019] [Indexed: 12/15/2022] Open
Abstract
Information on the composition of protein complexes can accelerate mechanistic analyses of cellular systems. Protein complex composition identifies genes that function together and provides clues about regulation within and between cellular pathways. Cytosolic protein complexes control metabolic flux, signal transduction, protein abundance, and the activities of cytoskeletal and endomembrane systems. It has been estimated that one third of all cytosolic proteins in leaves exist in an oligomeric state, yet the composition of nearly all of them remains unknown. Subunits of stable protein complexes copurify, and combinations of mass-spectrometry-based protein correlation profiling and bioinformatic analyses have been used to predict protein complex subunits. Because of uncertainty regarding the power or availability of bioinformatic data to inform protein complex predictions across diverse species, it would be highly advantageous to predict composition based on elution profile data alone. Here we describe a mass-spectrometry-based protein correlation profiling approach to predict the composition of hundreds of protein complexes based on biochemical data. Extracts were obtained from an intact organ and separated in parallel by size and charge under nondenaturing conditions. More than 1000 proteins with reproducible elution profiles across all replicates were subjected to clustering analyses. The resulting dendrograms were used to predict the composition of known and novel protein complexes, including many that are likely to assemble through self-interaction. An array of validation experiments demonstrated that this new method can drive protein complex discovery, guide hypothesis testing, and enable systems-level analyses of protein complex dynamics in any organism with a sequenced genome.
Affiliation(s)
- Zachary McBride
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana
| | - Donglai Chen
- Department of Statistics, Purdue University, West Lafayette, Indiana
| | - Youngwoo Lee
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana
| | - Uma K Aryal
- Purdue Proteomics Facility, Bindley Biosciences Center, Discovery Park, Purdue University, West Lafayette, Indiana
| | - Jun Xie
- Department of Statistics, Purdue University, West Lafayette, Indiana
| | - Daniel B Szymanski
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana; Department of Biological Sciences, Purdue University, West Lafayette, Indiana.
| |
|
49
|
Astl L, Verkhivker GM. Data-driven computational analysis of allosteric proteins by exploring protein dynamics, residue coevolution and residue interaction networks. Biochim Biophys Acta Gen Subj 2019:S0304-4165(19)30179-5. [PMID: 31330173 DOI: 10.1016/j.bbagen.2019.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Revised: 07/15/2019] [Accepted: 07/17/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Computational studies of allosteric interactions have witnessed a recent renaissance fueled by the growing interest in modeling complex molecular assemblies and biological networks. Allosteric interactions in protein structures allow for molecular communication in signal transduction networks. METHODS In this work, we performed a large-scale, comprehensive and multi-faceted analysis of >300 diverse allosteric proteins and complexes with allosteric modulators. By modeling and exploring coarse-grained dynamics, residue coevolution, and residue interaction networks for allosteric proteins, we have determined unifying molecular signatures shared by allosteric systems. RESULTS The results of this study suggest that allosteric inhibitors and allosteric activators may differentially affect the global dynamics and network organization of protein systems, leading to diverse allosteric mechanisms. Using structural and functional data on protein kinases, we present a detailed case study that included atomic-level analysis of coevolutionary networks in kinases bound to allosteric inhibitors and activators. CONCLUSIONS We have found that coevolutionary networks can form direct communication pathways connecting functional regions and can recapitulate key regulatory sites and interactions responsible for allosteric signaling in the studied protein systems. The results of this computational investigation are compared with experimental studies and reveal molecular signatures of known regulatory hotspots in protein kinases. GENERAL SIGNIFICANCE This study has shown that allosteric inhibitors and allosteric activators can have different effects on residue interaction networks and can exploit distinct regulatory mechanisms, which could open up opportunities for probing allostery and new drug combinations with a broad range of activities.
Affiliation(s)
- Lindy Astl
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America
| | - Gennady M Verkhivker
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States of America.
| |
|
50
|
Szurmant H. Evolutionary couplings of amino acid residues reveal structure and function of bacterial signaling proteins. Mol Microbiol 2019; 112:432-437. [PMID: 31102561 DOI: 10.1111/mmi.14282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 05/15/2019] [Indexed: 12/12/2022]
Abstract
The genomic era along with major advances in high-throughput sequencing technology has led to a rapid expansion of the genomic and consequently the protein sequence space. Bacterial extracytoplasmic function sigma factors have emerged as an important group of signaling proteins in bacteria involved in many regulatory decisions, most notably the adaptation to cell envelope stress. Their wide prevalence and amplification among bacterial genomes has led to sub-group classification and the realization of diverse signaling mechanisms. Mathematical frameworks have been developed to utilize extensive protein sequence alignments to extract co-evolutionary signals of interaction. This has proven useful in a number of different biological fields, including de novo structure prediction, protein-protein partner identification and the elucidation of alternative protein conformations for signal proteins, to name a few. The mathematical tools, commonly referred to under the name 'Direct Coupling Analysis' have now been applied to deduce molecular mechanisms of activation for sub-groups of extracytoplasmic sigma factors adding to previous successes on bacterial two-component signaling proteins. The amplification of signal transduction protein genes in bacterial genomes made them the first to be amenable to this approach but the sequences are available now to aid the molecular microbiologist, no matter their protein pathway of interest.
Affiliation(s)
- Hendrik Szurmant
- Basic Medical Science, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, USA
| |
|