1
Rennie ML, Oliver MR. Emerging frontiers in protein structure prediction following the AlphaFold revolution. J R Soc Interface 2025; 22:20240886. [PMID: 40233800 PMCID: PMC11999738 DOI: 10.1098/rsif.2024.0886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 02/04/2025] [Accepted: 03/10/2025] [Indexed: 04/17/2025] [Open Access]
Abstract
Models of protein structures enable molecular understanding of biological processes. Current protein structure prediction tools lie at the interface of biology, chemistry and computer science. Millions of protein structure models have been generated in a very short space of time through a revolution in protein structure prediction driven by deep learning, led by AlphaFold. This has provided a wealth of new structural information. Interpreting these predictions is critical to determining where and when this information is useful. But proteins are not static nor do they act alone, and structures of proteins interacting with other proteins and other biomolecules are critical to a complete understanding of their biological function at the molecular level. This review focuses on the application of state-of-the-art protein structure prediction to these advanced applications. We also suggest a set of guidelines for reporting AlphaFold predictions.

2
Sun J, Ru J, Cribbs AP, Xiong D. PyPropel: a Python-based tool for efficiently processing and characterising protein data. BMC Bioinformatics 2025; 26:70. [PMID: 40025421 PMCID: PMC11871610 DOI: 10.1186/s12859-025-06079-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 02/10/2025] [Indexed: 03/04/2025] [Open Access]
Abstract
BACKGROUND: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies.
RESULTS: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets.
CONCLUSION: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Affiliation(s)
- Jianfeng Sun
  - Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Jinlong Ru
  - Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Adam P Cribbs
  - Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Dapeng Xiong
  - Department of Computational Biology, Cornell University, Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA

3
Vural O, Jololian L. Machine learning approaches for predicting protein-ligand binding sites from sequence data. Front Bioinform 2025; 5:1520382. [PMID: 39963299 PMCID: PMC11830693 DOI: 10.3389/fbinf.2025.1520382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 01/10/2025] [Indexed: 02/20/2025] [Open Access]
Abstract
Proteins, composed of amino acids, are crucial for a wide range of biological functions. Proteins have various interaction sites, one of which is the protein-ligand binding site, essential for molecular interactions and biochemical reactions. These sites enable proteins to bind with other molecules, facilitating key biological functions. Accurate prediction of these binding sites is pivotal in computational drug discovery, helping to identify therapeutic targets and facilitate treatment development. Machine learning has made significant contributions to this field by improving the prediction of protein-ligand interactions. This paper reviews studies that use machine learning to predict protein-ligand binding sites from sequence data, focusing on recent advancements. The review examines various embedding methods and machine learning architectures, addressing current challenges and the ongoing debates in the field. Additionally, research gaps in the existing literature are highlighted, and potential future directions for advancing the field are discussed. This study provides a thorough overview of sequence-based approaches for predicting protein-ligand binding sites, offering insights into the current state of research and future possibilities.
Affiliation(s)
- Orhun Vural
  - Department of Electrical and Computer Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States

4
Erckert K, Rost B. Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep 2024; 14:20692. [PMID: 39237735 PMCID: PMC11377704 DOI: 10.1038/s41598-024-71783-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] [Open Access]
Abstract
Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.
Affiliation(s)
- Kyra Erckert
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany

5
McGuffin LJ, Alharbi SMA. ModFOLD9: A Web Server for Independent Estimates of 3D Protein Model Quality. J Mol Biol 2024; 436:168531. [PMID: 39237204 DOI: 10.1016/j.jmb.2024.168531] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/21/2024] [Revised: 02/19/2024] [Accepted: 03/06/2024] [Indexed: 09/07/2024]
Abstract
Accurate models of protein tertiary structures are now available from numerous advanced prediction methods, although the accuracy of each method often varies depending on the specific protein target. Additionally, many models may still contain significant local errors. Therefore, reliable, independent model quality estimates are essential both for identifying errors and selecting the very best models for further biological investigations. ModFOLD9 is a leading independent server for detecting the local errors in models produced by any method, and it can accurately discriminate between high-quality models from multiple alternative approaches. ModFOLD9 incorporates several new scores from deep learning-based approaches, leading to greatly improved prediction accuracy compared with earlier versions of the server. ModFOLD9 is continuously independently benchmarked, and it is shown to be highly competitive with other public servers. ModFOLD9 is freely available at https://www.reading.ac.uk/bioinf/ModFOLD/.

6
Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, O'Donnell TJ, Berenberg D, Fisk I, Zanichelli N, Zhang B, Nowaczynski A, Wang B, Stepniewska-Dziubinska MM, Zhang S, Ojewole A, Guney ME, Biderman S, Watkins AM, Ra S, Lorenzo PR, Nivon L, Weitzner B, Ban YEA, Chen S, Zhang M, Li C, Song SL, He Y, Sorger PK, Mostaque E, Zhang Z, Bonneau R, AlQuraishi M. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nat Methods 2024; 21:1514-1524. [PMID: 38744917 PMCID: PMC11645889 DOI: 10.1038/s41592-024-02272-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/03/2024] [Indexed: 05/16/2024]
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
Affiliation(s)
- Gustaf Ahdritz
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Harvard University, Cambridge, MA, USA
- Nazim Bouatta
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Sachin Kadyan
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Qinghui Xia
  - Department of Systems Biology, Columbia University, New York, NY, USA
- William Gerecke
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Daniel Berenberg
  - Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Ian Fisk
  - Flatiron Institute, New York, NY, USA
- Bo Zhang
  - Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
- Stella Biderman
  - EleutherAI, New York, NY, USA
  - Booz Allen Hamilton, McLean, VA, USA
- Stephen Ra
  - Prescient Design, Genentech, New York, NY, USA
- Minjia Zhang
  - University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Peter K Sorger
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
- Zhao Zhang
  - Rutgers University, New Brunswick, NJ, USA

7
Si Y, Zou J, Gao Y, Chuai G, Liu Q, Chen L. Foundation models in molecular biology. Biophys Rep 2024; 10:135-151. [PMID: 39027316 PMCID: PMC11252241 DOI: 10.52601/bpr.2024.240006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 03/04/2024] [Indexed: 07/20/2024] [Open Access]
Abstract
Determining correlations between molecules at various levels is an important topic in molecular biology. Large language models have demonstrated a remarkable ability to capture correlations from large amounts of data in the field of natural language processing as well as image generation, and correlations captured from data using large language models can also be applicable to solving a wide range of specific tasks, hence large language models are also referred to as foundation models. The massive amount of data that exists in the field of molecular biology provides an excellent basis for the development of foundation models, and the recent emergence of foundation models in the field of molecular biology has really pushed the entire field forward. We summarize the foundation models developed based on RNA sequence data, DNA sequence data, protein sequence data, single-cell transcriptome data, and spatial transcriptome data respectively, and further discuss the research directions for the development of foundation models in molecular biology.
Affiliation(s)
- Yunda Si
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Jiawei Zou
  - Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China
- Yicheng Gao
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Guohui Chuai
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Qi Liu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
  - Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Luonan Chen
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
  - Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, China

8
Zheng W. Predicting hotspots for disease-causing single nucleotide variants using sequences-based coevolution, network analysis, and machine learning. PLoS One 2024; 19:e0302504. [PMID: 38743747 PMCID: PMC11093321 DOI: 10.1371/journal.pone.0302504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/05/2024] [Indexed: 05/16/2024] [Open Access]
Abstract
To enable personalized medicine, it is important yet highly challenging to accurately predict disease-causing mutations in target proteins at high throughput. Previous computational methods have been developed using evolutionary information in combination with various biochemical and structural features of protein residues to discriminate neutral vs. deleterious mutations. However, the power of these methods is often limited because they either assume known protein structures or treat residues independently without fully considering their interactions. To address the above limitations, we build upon recent progress in machine learning, network analysis, and protein language models, and develop a sequences-based variant site prediction workflow based on the protein residue contact networks:
1. We employ and integrate various methods of building protein residue networks using state-of-the-art coevolution analysis tools (RaptorX, DeepMetaPSICOV, and SPOT-Contact) powered by deep learning.
2. We use machine learning algorithms (Random Forest, Gradient Boosting, and Extreme Gradient Boosting) to optimally combine 20 network centrality scores to jointly predict key residues as hot spots for disease mutations.
3. Using a dataset of 107 proteins rich in disease mutations, we rigorously evaluate the network scores individually and collectively (via machine learning).
This work supports a promising strategy of combining an ensemble of network scores based on different coevolution analysis methods (and optionally predictive scores from other methods) via machine learning to predict hotspot sites of disease mutations, which will inform downstream applications of disease diagnosis and targeted drug design.
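As a rough illustration of what a network centrality score on a residue contact network looks like, the sketch below computes two standard centralities (degree and closeness) in plain Python. It is not the author's workflow: the abstract does not say which 20 scores are used, and the helper names `build_network`, `degree_centrality`, and `closeness_centrality` are hypothetical. The contact list is assumed to come from some upstream contact predictor.

```python
from collections import deque


def build_network(n_residues, contacts):
    """Build an undirected adjacency map from predicted contacts (i, j)."""
    adj = {i: set() for i in range(n_residues)}
    for i, j in contacts:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Fraction of the other residues that each residue contacts."""
    denom = max(len(adj) - 1, 1)
    return {v: len(nbrs) / denom for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores
```

Per-residue scores like these could then be stacked into a feature vector for a classifier such as a random forest, which is the kind of combination the abstract describes.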
Affiliation(s)
- Wenjun Zheng
  - Department of Physics, State University of New York at Buffalo, Buffalo, NY, United States of America

9
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024] [Open Access]
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still considerable room for improving prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we show that the predictions of PLMGraph-Inter can complement the results of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Chengfei Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China

10
Baker K, Hughes N, Bhattacharya S. An interactive visualization tool for educational outreach in protein contact map overlap analysis. Front Bioinform 2024; 4:1358550. [PMID: 38562910 PMCID: PMC10982686 DOI: 10.3389/fbinf.2024.1358550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024] [Open Access]
Abstract
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the gap in accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to perform contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the CMO problem. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse qualities of contact maps and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, thereby making it accessible to the general public for promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
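To make the contact map overlap (CMO) idea in the abstract above concrete, here is a minimal sketch that builds a contact map from 3D coordinates and scores the overlap for one given alignment. It is not GoFold's or map_align's algorithm: the real CMO problem is to search for the alignment that maximizes this count, which the sketch does not attempt. The 8 Å cutoff, the minimum sequence separation of 3, and the function names are assumptions chosen for illustration.

```python
import math


def contact_map(coords, cutoff=8.0, min_sep=3):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff`.

    coords: one (x, y, z) tuple per residue, e.g. C-alpha positions.
    cutoff and min_sep are assumed defaults, not values from the paper.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that map onto contacts of protein B.

    alignment: dict mapping a residue index in A to a residue index in B.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in contacts_b:
                shared += 1
    return shared
```

Aligning a protein to itself with the identity mapping returns its full contact count, which is a convenient sanity check before comparing a target against candidate templates.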
Affiliation(s)
- Kevan Baker
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Nathaniel Hughes
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States
- Sutanu Bhattacharya
  - Department of Computer Science and Computer Information Systems, Auburn University at Montgomery, Montgomery, AL, United States

11
Zhao C, Wang S. AttCON: With better MSAs and attention mechanism for accurate protein contact map prediction. Comput Biol Med 2024; 169:107822. [PMID: 38091726 DOI: 10.1016/j.compbiomed.2023.107822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/19/2023] [Accepted: 12/04/2023] [Indexed: 02/08/2024]
Abstract
Protein contact map prediction is a critical and vital step in protein structure prediction, and its accuracy is highly contingent upon the feature representations of protein sequence information and the efficacy of deep learning models. In this paper, we propose an algorithm, DeepMSA+, to generate protein multiple sequence alignments (MSAs) and to construct feature representations based on co-evolutionary information and sequence information derived from MSAs. We also propose an improved deep learning model, AttCON, for training input features to predict protein contact map. The model incorporates an attention module, and by comparing different attention modules, we find a parameter-free attention module suitable for contact map prediction. Additionally, we use the Focal Loss function to better address the data imbalance issue in protein contact map. We also developed a weighted evaluation index (W score) for model evaluation, which takes into account a wide range of metrics. W score is comprehensive in its scope, with a particular focus on the precision of predictions for medium-range and long-range contacts. Experimental results show that AttCON achieves good precision results on datasets from CASP11 to CASP15. Compared to some state-of-the-art methods, it achieves an average improvement of over 5% in both medium-range and long-range predictions, and W score is improved by an average of 2 points.
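The abstract above mentions Focal Loss as a remedy for class imbalance, since most residue pairs in a contact map are non-contacts. A minimal sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), follows. The defaults α = 0.25 and γ = 2 are the commonly used values from the original focal loss formulation, not settings reported for AttCON, and the function names are hypothetical.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one residue pair.

    p: predicted contact probability; y: 1 for a contact, 0 otherwise.
    alpha and gamma are assumed defaults, not values from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified pairs
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Average focal loss over flattened contact-map entries."""
    losses = [binary_focal_loss(p, y, alpha, gamma)
              for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With γ = 0 the modulating factor disappears and the expression reduces to α-weighted cross-entropy, so γ controls how strongly the many easy non-contact pairs are down-weighted.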
Affiliation(s)
- Che Zhao
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, 650504, Yunnan, China

12
Xu B, Chen Y, Xue W. Computational Protein Design - Where it goes? Curr Med Chem 2024; 31:2841-2854. [PMID: 37272467 DOI: 10.2174/0929867330666230602143700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 02/18/2023] [Accepted: 03/15/2023] [Indexed: 06/06/2023]
Abstract
Proteins have been playing a critical role in the regulation of diverse biological processes related to human life. With the increasing demand, functional proteins are sparse in this immense sequence space. Therefore, protein design has become an important task in various fields, including medicine, food, energy, materials, etc. Directed evolution has recently led to significant achievements. Molecular modification of proteins through directed evolution technology has significantly advanced the fields of enzyme engineering, metabolic engineering, medicine, and beyond. However, it is impossible to identify desirable sequences from a large number of synthetic sequences alone. As a result, computational methods, including data-driven machine learning and physics-based molecular modeling, have been introduced to protein engineering to produce more functional proteins. This review focuses on recent advances in computational protein design, highlighting the applicability of different approaches as well as their limitations.
Affiliation(s)
- Binbin Xu
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Yingjun Chen
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China

13
Lensink MF, Brysbaert G, Raouraoua N, Bates PA, Giulini M, Honorato RV, van Noort C, Teixeira JMC, Bonvin AMJJ, Kong R, Shi H, Lu X, Chang S, Liu J, Guo Z, Chen X, Morehead A, Roy RS, Wu T, Giri N, Quadir F, Chen C, Cheng J, Del Carpio CA, Ichiishi E, Rodriguez‐Lumbreras LA, Fernandez‐Recio J, Harmalkar A, Chu L, Canner S, Smanta R, Gray JJ, Li H, Lin P, He J, Tao H, Huang S, Roel‐Touris J, Jimenez‐Garcia B, Christoffer CW, Jain AJ, Kagaya Y, Kannan H, Nakamura T, Terashi G, Verburgt JC, Zhang Y, Zhang Z, Fujuta H, Sekijima M, Kihara D, Khan O, Kotelnikov S, Ghani U, Padhorny D, Beglov D, Vajda S, Kozakov D, Negi SS, Ricciardelli T, Barradas‐Bautista D, Cao Z, Chawla M, Cavallo L, Oliva R, Yin R, Cheung M, Guest JD, Lee J, Pierce BG, Shor B, Cohen T, Halfon M, Schneidman‐Duhovny D, Zhu S, Yin R, Sun Y, Shen Y, Maszota‐Zieleniak M, Bojarski KK, Lubecka EA, Marcisz M, Danielsson A, Dziadek L, Gaardlos M, Gieldon A, Liwo A, Samsonov SA, Slusarz R, Zieba K, Sieradzan AK, Czaplewski C, Kobayashi S, Miyakawa Y, Kiyota Y, Takeda‐Shitaka M, Olechnovic K, Valancauskas L, Dapkunas J, Venclovas C, Wallner B, Yang L, Hou C, He X, Guo S, Jiang S, Ma X, Duan R, Qui L, Xu X, Zou X, Velankar S, Wodak SJ. Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment. Proteins 2023; 91:1658-1683. [PMID: 37905971 PMCID: PMC10841881 DOI: 10.1002/prot.26609] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/07/2023] [Revised: 09/22/2023] [Accepted: 09/28/2023] [Indexed: 11/02/2023]
Abstract
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homotrimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Collapse
Affiliation(s)
- Marc F. Lensink
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Guillaume Brysbaert
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Nessim Raouraoua
- Univ. Lille, CNRS, UMR8576 – UGSF – Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Paul A. Bates
- Biomolecular Modeling Laboratory, The Francis Crick Institute, London, UK
| | - Marco Giulini
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Rodrigo V. Honorato
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Charlotte van Noort
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Joao M. C. Teixeira
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M. J. J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xufeng Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Jian Liu
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Zhiye Guo
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Xiao Chen
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Alex Morehead
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Raj S. Roy
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Tianqi Wu
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Nabin Giri
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Farhan Quadir
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Chen Chen
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jianlin Cheng
- Dept. of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | | | - Eichiro Ichiishi
- International University of Health and Welfare (IUHV Hospital), Nasushiobara‐City, Japan
| | - Luis A. Rodriguez‐Lumbreras
- Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Juan Fernandez‐Recio
- Instituto de Ciencias de la Vida y del Vino (ICVV), CSIC ‐ Universidad de La Rioja ‐ Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Ameya Harmalkar
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Lee‐Shin Chu
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Sam Canner
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Rituparna Smanta
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey J. Gray
- Dept. of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Huanyu Tao
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Sheng‐You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Jorge Roel‐Touris
- Protein Design and Modeling Lab, Dept. of Structural Biology, Molecular Biology Institute of Barcelona (IBMB‐CSIC), Barcelona, Spain
| | | | | | - Anika J. Jain
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Yuki Kagaya
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Harini Kannan
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | - Tsukasa Nakamura
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Genki Terashi
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Jacob C. Verburgt
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Yuanyuan Zhang
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Zicong Zhang
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Hayato Fujuta
- Dept. of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
| | | | - Daisuke Kihara
- Dept. of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Dept. of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | | | | | | | | | | | | | | | - Surendra S. Negi
- Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
| | | | | | - Zhen Cao
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
| | - Mohit Chawla
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
| | - Luigi Cavallo
- King Abdullah University of Science and Technology (KAUST), Saudi Arabia
- Department of Chemistry and Biology, University of Salerno, Fisciano, Italy
| | | | - Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Melyssa Cheung
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
| | - Johnathan D. Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Jessica Lee
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Dept. of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
| | - Ben Shor
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Tomer Cohen
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Matan Halfon
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | | | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Rujie Yin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, Texas, USA
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, Texas A&M University, Houston, Texas, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Yuta Miyakawa
- School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
| | - Yasuomi Kiyota
- School of Pharmacy, Kitasato University, Minato‐ku, Tokyo, Japan
| | | | - Kliment Olechnovic
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Lukas Valancauskas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Justas Dapkunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Ceslovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Bjorn Wallner
- Bioinformatics Division, Department of Physics, Chemistry, and Biology, Linkoping University, Linköping, Sweden
| | - Lin Yang
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, New South Wales, Australia
| | - Chengyu Hou
- School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
| | - Xiaodong He
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
- Shenzhen STRONG Advanced Materials Research Institute Col, Ltd, Shenzhen, People's Republic of China
| | - Shuai Guo
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Shenda Jiang
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Xiaoliang Ma
- National Key Laboratory of Science and Technology on Advanced Composites in Special Environments, Center for Composite Materials and Structures, Harbin Institute of Technology, Harbin, China
| | - Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Liming Qui
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Dept. of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Dept. of Biochemistry, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Hinxton, Cambridge, UK
|
14
|
Wei Q, Wang R, Jiang Y, Wei L, Sun Y, Geng J, Su R. ConPep: Prediction of peptide contact maps with pre-trained biological language model and multi-view feature extracting strategy. Comput Biol Med 2023; 167:107631. [PMID: 37948966 DOI: 10.1016/j.compbiomed.2023.107631] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/04/2023] [Revised: 10/16/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023]
Abstract
The accurate prediction of peptide contact maps remains a challenging task due to the difficulty in obtaining the interactive information between residues on short sequences. To address this challenge, we propose ConPep, a deep learning framework designed for predicting the contact map of peptides based on sequences only. To sufficiently incorporate the sequential semantic information between residues in peptide sequences, we use a pre-trained biological language model and transfer prior knowledge from large scale databases. Additionally, to extract and integrate sequential local information and residue-based global correlations, our model incorporates Bidirectional Gated Recurrent Unit and attention mechanisms. These components obtain multi-view features and thus enhance the accuracy and robustness of our prediction. Comparative results on independent tests demonstrate that our proposed method significantly outperforms state-of-the-art methods even with short peptides. Notably, our method exhibits superior performance at the sequence level, suggesting the robust ability of our model compared with the multiple sequence alignment (MSA) analysis-based methods. We expect this work to facilitate wider use of our method.
Affiliation(s)
- Qingxin Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Ruheng Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Yi Jiang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China; Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
| | - Yu Sun
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Jie Geng
- Department of Cardiology, Tianjin Chest Hospital, Tianjin, China.
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China.
|
15
|
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW: Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS: In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals.
The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Fundación Biofisika Bizkaia, 48940, Leioa, Spain
| | - Shifa Jebari-Benslaiman
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - Unai Galicia-Garcia
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - Ane San Jose-Urteaga
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
| | - Kepa B Uribe
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
| | - Asier Benito-Vicente
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
| | - César Martín
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain.
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain.
|
16
|
Sun J, Kulandaisamy A, Ru J, Gromiha MM, Cribbs AP. TMKit: a Python interface for computational analysis of transmembrane proteins. Brief Bioinform 2023; 24:bbad288. [PMID: 37594311 PMCID: PMC10516361 DOI: 10.1093/bib/bbad288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 07/07/2023] [Accepted: 07/18/2023] [Indexed: 08/19/2023] Open
Abstract
Transmembrane proteins are receptors, enzymes, transporters and ion channels that are instrumental in regulating a variety of cellular activities, such as signal transduction and cell communication. Despite tremendous progress in computational capacities to support protein research, there is still a significant gap in the availability of specialized computational analysis toolkits for transmembrane protein research. Here, we introduce TMKit, an open-source Python programming interface that is modular, scalable and specifically designed for processing transmembrane protein data. TMKit is a one-stop computational analysis tool for transmembrane proteins, enabling users to perform database wrangling, engineer features at the mutational, domain and topological levels, and visualize protein-protein interaction interfaces. In addition, TMKit includes seqNetRR, a high-performance computing library that allows customized construction of a large number of residue connections. This library is particularly well suited for assigning correlation matrix-based features at a fast speed. TMKit should serve as a useful tool for researchers in assisting the study of transmembrane protein sequences and structures. TMKit is publicly available through https://github.com/2003100127/tmkit and https://tmkit-guide.herokuapp.com/doc/overview.
Affiliation(s)
- Jianfeng Sun
- Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Headington, Oxford OX3 7LD, UK
| | - Arulsamy Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
| | - Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600036, Tamil Nadu, India
| | - Adam P Cribbs
- Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Headington, Oxford OX3 7LD, UK
|
17
|
Wang H, Zang Y, Kang Y, Zhang J, Zhang L, Zhang S. ETLD: an encoder-transformation layer-decoder architecture for protein contact and mutation effects prediction. Brief Bioinform 2023; 24:bbad290. [PMID: 37598423 DOI: 10.1093/bib/bbad290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/21/2023] [Accepted: 07/26/2023] [Indexed: 08/22/2023] Open
Abstract
The latent features extracted from the multiple sequence alignments (MSAs) of homologous protein families are useful for identifying residue-residue contacts, predicting mutation effects, shaping protein evolution, etc. Over the past three decades, a growing body of supervised and unsupervised machine learning methods have been applied to this field, yielding fruitful results. Here, we propose a novel self-supervised model, called encoder-transformation layer-decoder (ETLD) architecture, capable of capturing protein sequence latent features directly from MSAs. Compared to the typical autoencoder model, ETLD introduces a transformation layer with the ability to learn inter-site couplings, which can be used to parse out the two-dimensional residue-residue contacts map after a simple mathematical derivation or an additional supervised neural network. ETLD retains the process of encoding and decoding sequences, and the predicted probabilities of amino acids at each site can be further used to construct the mutation landscapes for mutation effects prediction, outperforming advanced models such as GEMME, DeepSequence and EVmutation in general. Overall, ETLD is a highly interpretable unsupervised model with great potential for improvement and can be further combined with supervised methods for more extensive and accurate predictions.
Affiliation(s)
- He Wang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Yongjian Zang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Ying Kang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Jianwen Zhang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Lei Zhang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
| | - Shengli Zhang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an 710049, China
|
18
|
Li J, Sawhney A, Lee JY, Liao L. Improving Inter-Helix Contact Prediction With Local 2D Topological Information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3001-3012. [PMID: 37155404 DOI: 10.1109/tcbb.2023.3274361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
Inter-helix contact prediction aims to identify residue contacts across different helices in α-helical integral membrane proteins. Despite the progress made by various computational methods, contact prediction remains a challenging task, and to our knowledge no method directly taps into the contact map in an alignment-free manner. We build 2D contact models from an independent dataset to capture the topological patterns in the neighborhood of a residue pair depending on whether it is a contact or not, and apply the models to the state-of-the-art method's predictions to extract the features reflecting 2D inter-helix contact patterns. A secondary classifier is trained on such features. Realizing that the achievable improvement intrinsically hinges on the quality of the original predictions, we devise a mechanism to deal with the issue by introducing 1) partial discretization of the original prediction scores to more effectively leverage useful information and 2) a fuzzy score to assess the quality of the original prediction to help with selecting the residue pairs where improvement is more achievable. The cross-validation results show that the prediction from our method outperforms other methods including the state-of-the-art method (DeepHelicon) by a notable degree even without using the refinement selection scheme. By applying the refinement selection scheme, our method outperforms the state-of-the-art method significantly in these selected sequences.
|
19
|
Ahdritz G, Bouatta N, Kadyan S, Jarosch L, Berenberg D, Fisk I, Watkins AM, Ra S, Bonneau R, AlQuraishi M. OpenProteinSet: Training data for structural biology at scale. ARXIV 2023:arXiv:2308.05326v1. [PMID: 37608940 PMCID: PMC10441447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
Affiliation(s)
| | - Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School
| | | | | | - Daniel Berenberg
- Prescient Design, Genentech & Department of Computer Science, New York University
|
20
|
Dichio V, Zeng HL, Aurell E. Statistical genetics in and out of quasi-linkage equilibrium. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2023; 86:052601. [PMID: 36944245 DOI: 10.1088/1361-6633/acc5fa] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Accepted: 03/21/2023] [Indexed: 06/18/2023]
Abstract
This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman theory describes a population evolving due to mutations, recombination, natural selection and possibly genetic drift. A QLE phase exists at sufficiently high recombination rate (r) and/or mutation rates µ with respect to selection strength. We show how in QLE it is possible to infer the epistatic parameters of the fitness function from the knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime for high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase which we call the non-random coexistence where variability persists in the population without either fixating or disappearing.
Affiliation(s)
- Vito Dichio
- Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013 Paris, France
| | - Hong-Li Zeng
- School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, People's Republic of China
| | - Erik Aurell
- Department of Computational Science and Technology, KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
|
21
|
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting ones from hundreds of thousands of inter-protein residue pairs is extremely challenging, and performances of the state-of-the-art inter-protein contact prediction methods are still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features to residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, although the latter two methods both leveraged the native structures of interacting proteins in the prediction, whereas DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
|
22
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Prediction of the three-dimensional structure of a protein from its amino acid sequence has made considerable progress in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
|
23
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it provided models of acceptable quality only when templates with appreciable sequence identity to the target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the sequence identity obtained comes dangerously close to values that can arise by chance when aligning random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, showing that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling that push the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
|
24
|
Adiyaman R, McGuffin LJ. Using Local Protein Model Quality Estimates to Guide a Molecular Dynamics-Based Refinement Strategy. Methods Mol Biol 2023; 2627:119-140. [PMID: 36959445 DOI: 10.1007/978-1-0716-2974-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The refinement of predicted 3D models aims to bring them closer to the native structure by fixing errors including unusual bonds and torsion angles and irregular hydrogen bonding patterns. Refinement approaches based on molecular dynamics (MD) simulations using different types of restraints have performed well since CASP10. ReFOLD, developed by the McGuffin group, was one of the many MD-based refinement approaches tested in CASP12. When the performance of the ReFOLD method in CASP12 was evaluated, it was observed that ReFOLD lacked a reliable guidance mechanism for achieving consistent improvement in the quality of predicted 3D models, particularly in the case of template-based modelling (TBM) targets. Therefore, here we propose to utilize the local quality assessment score produced by ModFOLD6 to guide the MD-based refinement approach and further increase the accuracy of the predicted 3D models. The relative performance of the new local quality assessment guided MD-based refinement protocol and the original MD-based protocol ReFOLD is compared using many different official scoring methods. By using the per-residue accuracy (or local quality) score to guide the refinement process, we are able to prevent undesired structural deviations in the refined models, thereby leading to more consistent improvements. This chapter includes a detailed analysis of the performance of the local quality assessment guided MD-based protocol versus that deployed in the original ReFOLD method.
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Reading, UK
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
|
25
|
Maduranga KDG, Zadorozhnyy V, Ye Q. Symmetry-structured convolutional neural networks. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08168-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
|
26
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
27
|
Roche R, Bhattacharya S, Shuvo MH, Bhattacharya D. rrQNet: Protein contact map quality estimation by deep evolutionary reconciliation. Proteins 2022; 90:2023-2034. [PMID: 35751651 PMCID: PMC9633355 DOI: 10.1002/prot.26394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 05/31/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022]
Abstract
Protein contact maps have proven to be a valuable tool in the deep learning revolution of protein structure prediction, ushering in the recent breakthrough by AlphaFold2. However, self-assessment of the quality of predicted structures is typically performed at the granularity of three-dimensional coordinates as opposed to directly exploiting the rotation- and translation-invariant two-dimensional (2D) contact maps. Here, we present rrQNet, a deep learning method for self-assessment in 2D by contact map quality estimation. Our approach is based on the intuition that for a contact map to be of high quality, the residue pairs predicted to be in contact should be mutually consistent with the evolutionary context of the protein. The deep neural network architecture of rrQNet implements this intuition by cascading two deep modules, one encoding the evolutionary context and the other performing evolutionary reconciliation. The penultimate stage of rrQNet estimates the quality scores at the interacting residue-pair level, which are then aggregated to estimate the quality of a contact map. This design choice offers versatility at varied resolutions, from individual residue pairs to full-fledged contact maps. Trained on multiple complementary sources of contact predictors, rrQNet facilitates generalizability across various contact maps. By rigorously testing on publicly available datasets and comparing against several in-house baseline approaches, we show that rrQNet accurately reproduces the true quality score of a predicted contact map and successfully distinguishes between accurate and inaccurate contact maps predicted by a wide variety of contact predictors. The open-source rrQNet software package is freely available at https://github.com/Bhattacharya-Lab/rrQNet.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
|
28
|
Guo Z, Liu J, Skolnick J, Cheng J. Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat Commun 2022; 13:6963. [PMID: 36379943 PMCID: PMC9666547 DOI: 10.1038/s41467-022-34600-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/08/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022] Open
Abstract
Residue-residue distance information is useful for predicting tertiary structures of protein monomers or quaternary structures of protein complexes. Many deep learning methods have been developed to predict intra-chain residue-residue distances of monomers accurately, but few methods can accurately predict inter-chain residue-residue distances of complexes. We develop a deep learning method CDPred (i.e., Complex Distance Prediction) based on the 2D attention-powered residual network to address the gap. Tested on two homodimer datasets, CDPred achieves the precision of 60.94% and 42.93% for top L/5 inter-chain contact predictions (L: length of the monomer in homodimer), respectively, substantially higher than DeepHomo's 37.40% and 23.08% and GLINTER's 48.09% and 36.74%. Tested on the two heterodimer datasets, the top Ls/5 inter-chain contact prediction precision (Ls: length of the shorter monomer in heterodimer) of CDPred is 47.59% and 22.87% respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, the prediction of CDPred is complementary with that of AlphaFold2-multimer.
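As a rough illustration of the top L/5 metric reported in this abstract, the following minimal sketch ranks predicted inter-chain residue pairs by score and measures the fraction of the top L/5 pairs that are true contacts. The function name, the dict-based input format, and the toy values are assumptions made for illustration; they are not taken from CDPred.

```python
def top_l_over_k_precision(scores, true_contacts, length, k=5):
    """Precision of the top (length // k) highest-scoring residue pairs.

    scores: dict mapping (i, j) inter-chain residue pairs to a predicted
            contact score (hypothetical input format).
    true_contacts: set of (i, j) pairs in contact in the reference structure.
    length: L for a homodimer, or Ls (the shorter chain) for a heterodimer.
    """
    n_top = max(1, length // k)
    # Rank pairs from highest to lowest score and keep the first n_top.
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example with made-up values: L = 10, so the top 10 // 5 = 2 pairs count.
toy_scores = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.3, (3, 8): 0.1}
toy_truth = {(0, 5), (2, 7)}
precision = top_l_over_k_precision(toy_scores, toy_truth, length=10)
print(precision)
```

Only the two highest-scoring pairs are evaluated in the toy case, so a true contact ranked lower, such as (2, 7), does not contribute to the score.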
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
- Jeffrey Skolnick
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332-200 USA (GRID grid.213917.f; ISNI 0000 0001 2097 4943)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA (GRID grid.134936.a; ISNI 0000 0001 2162 3504)
|
29
|
Jiang T, Chen Y, Guan S, Hu Z, Lu W, Fu Q, Ding Y, Li H, Wu H. G Protein-Coupled Receptor Interaction Prediction Based on Deep Transfer Learning. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3126-3134. [PMID: 34780331 DOI: 10.1109/tcbb.2021.3128172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
G protein-coupled receptors (GPCRs) account for about 40% to 50% of drug targets. Many human diseases are related to GPCRs. Accurate prediction of GPCR interaction is not only essential to understanding its structural role, but also helps design more effective drugs. At present, the prediction of GPCR interaction mainly uses machine learning methods. Machine learning methods generally require a large number of independent and identically distributed samples to achieve good results. However, the number of available labeled GPCR samples is small. Transfer learning has a strong advantage in dealing with such small-sample problems. Therefore, this paper proposes a transfer learning method based on sample similarity, using XGBoost as a weak classifier and using the TrAdaBoost algorithm based on JS divergence for data weight initialization to transfer samples to construct a data set. After that, a deep neural network based on the attention mechanism is used for model training. The existing GPCR is used for prediction. In short-distance contact prediction, the accuracy of our method is 0.26 higher than similar methods.
|
30
|
Improved Protein Real-Valued Distance Prediction Using Deep Residual Dense Network (DRDN). Protein J 2022; 41:468-476. [PMID: 36008645 DOI: 10.1007/s10930-022-10067-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/15/2022] [Indexed: 10/15/2022]
Abstract
Three-dimensional protein structure prediction is one of the major challenges in bioinformatics. According to recent research findings, real-valued distance prediction plays a vital role in determining the unique three-dimensional protein structure. This paper proposes a novel methodology involving a deep residual dense network (DRDN) for predicting protein real-valued distance. The features extracted from the given query protein sequence and its corresponding homologous sequences are used for training the model. Multi-aligned homologous sequences for each query protein sequence are retrieved from five different databases using DeepMSA, HHblits, and HITS_PR_HHblits methods. The proposed method yielded outcomes of 3.89, 0.23, 0.45, and 0.63, respectively, for the evaluation metrics Absolute Error, Relative Error, High-accuracy Pairwise Distance Test (PDA), and Pairwise Distance Test (PDT). Further, the contact map is computed based on CASP criteria by converting the predicted real-valued distance, and it is evaluated using the precision metric. The precision of long-range top L/5 contact prediction on the CASP13 dataset for the proposed method, RaptorX, Zhang, trRosetta, JinboXu & JinLu, and DeepDist is 0.834, 0.657, 0.70, 0.785, 0.786, and 0.812, respectively. Also, top L/5 contact prediction on the CASP14 dataset evaluated using average precision resulted in 0.847, 0.707, 0.752, 0.783, 0.792, 0.817, and 0.825, respectively, for the proposed method, Zhang, RaptorX, trRosetta, DeepDist, JinboXu & JinLu, and AlphaFold2.
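The conversion from real-valued distances to a contact map mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming the commonly used CASP convention, in which a residue pair counts as a contact when its distance is below 8 Å and long-range pairs are separated by at least 24 positions in sequence. The function name and the toy matrix are hypothetical and are not taken from the paper.

```python
def long_range_contacts(dist, threshold=8.0, min_separation=24):
    """Return the set of (i, j) pairs, i < j, that count as long-range contacts.

    dist: square list-of-lists of predicted distances in angstroms.
    threshold: a pair is a contact when its distance is strictly below this.
    min_separation: minimum sequence separation |i - j| for a long-range pair.
    """
    n = len(dist)
    contacts = set()
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist[i][j] < threshold:
                contacts.add((i, j))
    return contacts


# Toy 30-residue matrix with made-up distances (everything far apart by default).
toy = [[20.0] * 30 for _ in range(30)]
toy[0][25] = 6.5   # close and far apart in sequence: a long-range contact
toy[0][10] = 5.0   # close, but sequence separation is only 10
toy[2][29] = 8.0   # exactly at the threshold, so not counted
contacts = long_range_contacts(toy)
print(contacts)
```

The resulting contact set is what a top L/5 precision evaluation would then rank and score against the native contacts.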
|
31
|
Wilman W, Wróbel S, Bielska W, Deszynski P, Dudzic P, Jaszczyszyn I, Kaniewski J, Młokosiewicz J, Rouyan A, Satława T, Kumar S, Greiff V, Krawczyk K. Machine-designed biotherapeutics: opportunities, feasibility and advantages of deep learning in computational antibody discovery. Brief Bioinform 2022; 23:bbac267. [PMID: 35830864 PMCID: PMC9294429 DOI: 10.1093/bib/bbac267] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 03/14/2022] [Revised: 05/09/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Antibodies are versatile molecular binders with an established and growing role as therapeutics. Computational approaches to developing and designing these molecules are being increasingly used to complement traditional lab-based processes. Nowadays, in silico methods fill multiple elements of the discovery stage, such as characterizing antibody-antigen interactions and identifying developability liabilities. Recently, computational methods tackling such problems have begun to follow machine learning paradigms, in many cases deep learning specifically. This paradigm shift offers improvements in established areas such as structure or binding prediction and opens up new possibilities such as language-based modeling of antibody repertoires or machine-learning-based generation of novel sequences. In this review, we critically examine the recent developments in (deep) machine learning approaches to therapeutic antibody design with implications for fully computational antibody design.
|
32
|
Rahman J, Newton MAH, Hasan MAM, Sattar A. A stacked meta-ensemble for protein inter-residue distance prediction. Comput Biol Med 2022; 148:105824. [PMID: 35863250 DOI: 10.1016/j.compbiomed.2022.105824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 06/21/2022] [Accepted: 07/03/2022] [Indexed: 11/25/2022]
Abstract
Predicted inter-residue distances are a key factor behind recent success in high-quality protein structure prediction (PSP). However, predicting both short and long distance values together is challenging. Consequently, existing PSP methods mostly use predicted short distances. In this paper, we use a stacked meta-ensemble method to combine deep learning models trained for different ranges of real-valued distances. On five benchmark sets of proteins, our proposed inter-residue distance prediction method improves mean Local Distance Difference Test (LDDT) scores by at least 5% over existing methods of this kind. Moreover, using a real-valued distance-based conformational search algorithm, we also show that predicted long distances help obtain significantly better protein conformations than when only predicted short distances are used. Our method is named meta-ensemble for distance prediction (MDP) and its program is available from https://gitlab.com/mahnewton/mdp.
Affiliation(s)
- Julia Rahman
- School of Information and Communication Technology, Griffith University, Queensland, Australia
- M A Hakim Newton
- Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia; School of Information and Physical Sciences, The University of Newcastle, New South Wales, Australia
- Abdul Sattar
- School of Information and Communication Technology, Griffith University, Queensland, Australia; Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia
|
33
|
Data Size and Quality Matter: Generating Physically-Realistic Distance Maps of Protein Tertiary Structures. Biomolecules 2022; 12:biom12070908. [PMID: 35883464 PMCID: PMC9313347 DOI: 10.3390/biom12070908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/23/2022] [Revised: 06/14/2022] [Accepted: 06/23/2022] [Indexed: 02/01/2023] Open
Abstract
With the debut of AlphaFold2, we now can get a highly accurate view of a reasonable equilibrium tertiary structure of a protein molecule. Yet, a single-structure view is insufficient and does not account for the high structural plasticity of protein molecules. Obtaining a multi-structure view of a protein molecule continues to be an outstanding challenge in computational structural biology. In tandem with methods formulated under the umbrella of stochastic optimization, we are now seeing rapid advances in the capabilities of methods based on deep learning. In recent work, we advanced the capability of these models to learn from experimentally available tertiary structures of protein molecules of varying lengths. In this work, we elucidate the important role of the composition of the training dataset in the neural network's ability to learn key local and distal patterns in tertiary structures. To make such patterns visible to the network, we utilize a contact map-based representation of protein tertiary structure. We show interesting relationships between data size, quality, and composition and the ability of latent variable models to learn key patterns of tertiary structure. In addition, we present a disentangled latent variable model which improves upon the state-of-the-art variational autoencoder-based model in key, physically realistic structural patterns. We believe this work opens up further avenues of research on deep learning-based models for computing multi-structure views of protein molecules.
|
34
|
Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Inter-Residue Distance Prediction From Duet Deep Learning Models. Front Genet 2022; 13:887491. [PMID: 35651930 PMCID: PMC9148999 DOI: 10.3389/fgene.2022.887491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/01/2022] [Accepted: 03/30/2022] [Indexed: 12/04/2022] Open
Abstract
Residue distance prediction from the sequence is critical for many biological applications such as protein structure reconstruction, protein–protein interaction prediction, and protein design. However, prediction of fine-grained distances between residues with long sequence separations remains challenging. In this study, we propose DuetDis, a method based on duet feature sets and a deep residual network with squeeze-and-excitation (SE), for protein inter-residue distance prediction. DuetDis learns and fuses features directly or indirectly extracted from whole-genome/metagenomic databases and therefore minimizes information loss by ensembling models trained on different feature sets. We evaluate DuetDis and 11 widely used peer methods on a large-scale test set (610 protein chains). The experimental results suggest that 1) prediction results from different feature sets show obvious differences; 2) ensembling different feature sets can improve the prediction performance; 3) high-quality multiple sequence alignments (MSAs) used for both training and testing can greatly improve the prediction performance; and 4) DuetDis is more accurate than peer methods for the overall prediction, more reliable in terms of model prediction score, and more robust against shallow MSAs.
Affiliation(s)
- Huiling Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Ying Huang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhendong Bei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhen Ju
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Jintao Meng
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Jingjing Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- Haiping Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Wenhui Xi
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- University of Chinese Academy of Sciences, Beijing, China
- *Correspondence: Wenhui Xi,
|
35
|
Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 2022; 30:1169-1177.e4. [DOI: 10.1016/j.str.2022.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 02/25/2022] [Accepted: 04/29/2022] [Indexed: 01/27/2023]
|
36
|
Gu J, Zhang T, Wu C, Liang Y, Shi X. Refined Contact Map Prediction of Peptides Based on GCN and ResNet. Front Genet 2022; 13:859626. [PMID: 35571037 PMCID: PMC9092020 DOI: 10.3389/fgene.2022.859626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022] Open
Abstract
Predicting peptide inter-residue contact maps, which determine the topology of the peptide structure, plays an important role in computational biology. However, due to the limited number of known homologous structures, there is still much room for improvement in inter-residue contact map prediction. Current models do not capture the relationships between residues with sufficient accuracy, especially for residue pairs at long range. In this article, we developed a novel deep neural network framework to refine the rough contact map produced by existing methods. The rough contact map is used to construct a residue graph that is processed by a graph convolutional neural network (GCN). The GCN can better capture global information and is therefore used to capture long-range contact relationships. A residual convolutional neural network is also applied in the framework for learning local information. We conducted experiments on four different test datasets, and the inter-residue long-range contact map prediction accuracy demonstrates the effectiveness of our proposed method.
Affiliation(s)
- Jiawei Gu
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Tianhao Zhang
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Chunguo Wu
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- Yanchun Liang
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- Xiaohu Shi
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- *Correspondence: Xiaohu Shi,
|
37
|
Chelur VR, Priyakumar UD. BiRDS - Binding Residue Detection from Protein Sequences Using Deep ResNets. J Chem Inf Model 2022; 62:1809-1818. [PMID: 35414182 DOI: 10.1021/acs.jcim.1c00972] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the binding sites of a protein helps to discover such interactions. New drugs can be designed to optimize these interactions, improving protein function. The tertiary structure of a protein decides the binding sites available to the drug molecule, but the determination of the 3D structure is slow and expensive. Conversely, the determination of the amino acid sequence is swift and economical. Although quick and accurate prediction of the binding site using just the sequence is challenging, the application of Deep Learning, which has been hugely successful in several biochemical tasks, makes it feasible. BiRDS is a Residual Neural Network that predicts the protein's most active binding site using sequence information. SC-PDB, an annotated database of druggable binding sites, is used for training the network. Multiple Sequence Alignments of the proteins in the database are generated using DeepMSA, and features such as Position-Specific Scoring Matrix, Secondary Structure, and Relative Solvent Accessibility are extracted. During training, a weighted binary cross-entropy loss function is used to counter the substantial imbalance in the two classes of binding and nonbinding residues. A novel test set SC6K is introduced to compare binding-site prediction methods. BiRDS achieves an AUROC score of 0.87, and the center of 25% of its predicted binding sites lie within 4 Å of the center of the actual binding site.
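The weighted binary cross-entropy mentioned in this abstract can be sketched as follows. This is a generic formulation that assumes a single weight applied to the positive (binding) class; the weighting actually used by BiRDS is not stated here, and `pos_weight=3.0` in the example is a made-up value.

```python
import math


def weighted_bce(labels, probs, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive class scaled by pos_weight.

    labels: iterable of 0/1 ground-truth labels (1 = binding residue).
    probs: iterable of predicted probabilities for the positive class.
    """
    labels = list(labels)
    probs = list(probs)
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy example: one binding and one nonbinding residue, both predicted at 0.5.
loss = weighted_bce([1, 0], [0.5, 0.5], pos_weight=3.0)
print(loss)
```

Raising `pos_weight` above 1 makes errors on the rare binding residues cost more than errors on the abundant nonbinding ones, which is the usual motivation for this loss under class imbalance.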
Affiliation(s)
- Vineeth R Chelur
- Center for Computational Natural Sciences & Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- U Deva Priyakumar
- Center for Computational Natural Sciences & Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
|
38
|
Neuwald AF, Yang H, Tracy Nixon B. SPARC: Structural properties associated with residue constraints. Comput Struct Biotechnol J 2022; 20:1702-1715. [PMID: 35495120 PMCID: PMC9020082 DOI: 10.1016/j.csbj.2022.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 03/22/2022] [Accepted: 04/05/2022] [Indexed: 11/17/2022] Open
Abstract
SPARC facilitates the generation of plausible hypotheses regarding underlying biochemical mechanisms by structurally characterizing protein sequence constraints. Such constraints appear as residues co-conserved in functionally related subgroups, as subtle pairwise correlations (i.e., direct couplings), and as correlations among these sequence features or with structural features. SPARC performs three types of analyses. First, based on pairwise sequence correlations, it estimates the biological relevance of alternative conformations and of homomeric contacts, as illustrated here for death domains. Second, it estimates the statistical significance of the correspondence between directly coupled residue pairs and interactions at heterodimeric interfaces. Third, given molecular dynamics simulated structures, it characterizes interactions among constrained residues or between such residues and ligands that: (a) are stably maintained during the simulation; (b) undergo correlated formation and/or disruption of interactions with other constrained residues; or (c) switch between alternative interactions. We illustrate this for two homohexameric complexes: the bacterial enhancer binding protein (bEBP) NtrC1, which activates transcription by remodeling RNA polymerase (RNAP) containing σ54, and for DnaB helicase, which opens DNA at the bacterial replication fork. Based on the NtrC1 analysis, we hypothesize possible mechanisms for inhibiting ATP hydrolysis until ADP is released from an adjacent subunit and for coupling ATP hydrolysis to restructuring of σ54 binding loops. Based on the DnaB analysis, we hypothesize that DnaB 'grabs' ssDNA by flipping every fourth base and inserting it into cavities between subunits and that flipping of a DnaB-specific glutamine residue triggers ATP hydrolysis.
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, 670 W. Baltimore Street, Baltimore, MD 21201, USA. Corresponding author.
| | - Hui Yang
- Department of Biology, Penn State University, 304A Frear South Building, University Park, PA 16802
| | - B. Tracy Nixon
- Department of Biochemistry and Molecular Biology, 335 Frear South Building, University Park, PA 16802, USA
| |
|
39
|
Santra S, Jana M. Predicting the evolution of number of native contacts of a small protein by using deep learning approach. Comput Biol Chem 2022; 97:107625. [DOI: 10.1016/j.compbiolchem.2022.107625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Revised: 01/07/2022] [Accepted: 01/09/2022] [Indexed: 11/28/2022]
|
40
|
Lee D, Xiong D, Wierbowski S, Li L, Liang S, Yu H. Deep learning methods for 3D structural proteome and interactome modeling. Curr Opin Struct Biol 2022; 73:102329. [PMID: 35139457 PMCID: PMC8957610 DOI: 10.1016/j.sbi.2022.102329] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 12/05/2021] [Accepted: 12/31/2021] [Indexed: 12/19/2022]
Abstract
Bolstered by recent methodological and hardware advances, deep learning has increasingly been applied to biological problems and structural proteomics. Such approaches have achieved remarkable improvements over traditional machine learning methods in tasks ranging from protein contact map prediction to protein folding, prediction of protein-protein interaction interfaces, and characterization of protein-drug binding pockets. In particular, emergence of ab initio protein structure prediction methods including AlphaFold2 has revolutionized protein structural modeling. From a protein function perspective, numerous deep learning methods have facilitated deconvolution of the exact amino acid residues and protein surface regions responsible for binding other proteins or small molecule drugs. In this review, we provide a comprehensive overview of recent deep learning methods applied in structural proteomics.
Affiliation(s)
- Dongjin Lee
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Shayne Wierbowski
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Le Li
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
| | - Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA.
| |
|
41
|
Hong Y, Lee J, Ko J. A-Prot: protein structure modeling using MSA transformer. BMC Bioinformatics 2022; 23:93. [PMID: 35296230 PMCID: PMC8925138 DOI: 10.1186/s12859-022-04628-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2021] [Accepted: 03/03/2022] [Indexed: 11/18/2022] Open
Abstract
Background The accuracy of protein 3D structure prediction has been dramatically improved with the help of advances in deep learning. In the recent CASP14, DeepMind demonstrated that their new version of AlphaFold (AF) produces highly accurate 3D models close to experimental structures. The success of AF shows that the multiple sequence alignment of a sequence contains rich evolutionary information, leading to accurate 3D models. Despite the success of AF, only the prediction code is open, and training a similar model requires a vast amount of computational resources. Thus, developing a lighter prediction model is still necessary. Results In this study, we propose a new protein 3D structure modeling method, A-Prot, using MSA Transformer, one of the state-of-the-art protein language models. An MSA feature tensor and row attention maps are extracted and converted into 2D residue-residue distance and dihedral angle predictions for a given MSA. We demonstrated that A-Prot predicts long-range contacts better than the existing methods. Additionally, we modeled the 3D structures of the free modeling and hard template-based modeling targets of CASP14. The assessment shows that the A-Prot models are more accurate than most top server groups of CASP14. Conclusion These results imply that A-Prot accurately captures the evolutionary and structural information of proteins with relatively low computational cost. Thus, A-Prot can provide a clue for the development of other protein property prediction methods. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04628-8.
Affiliation(s)
- Yiyu Hong
- Arontier Co, Seoul, Republic of Korea
| | - Juyong Lee
- Arontier Co, Seoul, Republic of Korea; Department of Chemistry, Division of Chemistry and Biochemistry, Kangwon National University, Chuncheon, Republic of Korea.
| | - Junsu Ko
- Arontier Co, Seoul, Republic of Korea
| |
|
42
|
Wang L, Zhang J, Wang D, Song C. Membrane contact probability: An essential and predictive character for the structural and functional studies of membrane proteins. PLoS Comput Biol 2022; 18:e1009972. [PMID: 35353812 PMCID: PMC9000120 DOI: 10.1371/journal.pcbi.1009972] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2021] [Revised: 04/11/2022] [Accepted: 02/25/2022] [Indexed: 11/20/2022] Open
Abstract
One of the unique traits of membrane proteins is that a significant fraction of their hydrophobic amino acids is exposed to the hydrophobic core of lipid bilayers rather than being embedded in the protein interior, which is often not explicitly considered in the protein structure and function predictions. Here, we propose a characteristic and predictive quantity, the membrane contact probability (MCP), to describe the likelihood of the amino acids of a given sequence being in direct contact with the acyl chains of lipid molecules. We show that MCP is complementary to solvent accessibility in characterizing the outer surface of membrane proteins, and it can be predicted for any given sequence with a machine learning-based method by utilizing a training dataset extracted from MemProtMD, a database generated from molecular dynamics simulations for the membrane proteins with a known structure. As the first of many potential applications, we demonstrate that MCP can be used to systematically improve the prediction precision of the protein contact maps and structures.
The distribution of residues on protein surfaces is largely determined by the surrounding environment. For soluble proteins, most of the residues on the outer surface are hydrophilic, and people use the quantity “solvent accessibility” to describe and predict these surface residues. In contrast, for membrane proteins that are embedded in a lipid bilayer, many of their surface residues are hydrophobic and membrane-contacting, but there is not yet a widely accepted quantity for the description or prediction of this characteristic property. Here, we propose a new quantity termed “membrane contact probability (MCP)”, which can be used to describe and predict the membrane-contacting surface residues of proteins. We also propose a machine learning-based method to predict MCP from protein sequences, utilizing the dataset generated by physics-based computer simulations. We demonstrate that a quantity such as MCP is helpful for protein structure prediction, and we believe that it will find broad applications in the structure and function studies of membrane proteins.
Affiliation(s)
- Lei Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
| | - Jiangguo Zhang
- School of Life Sciences, Peking University, Beijing, China
| | - Dali Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| |
|
43
|
Singh J, Litfin T, Singh J, Paliwal K, Zhou Y. SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model. Bioinformatics 2022; 38:1888-1894. [PMID: 35104320 PMCID: PMC9113311 DOI: 10.1093/bioinformatics/btac053] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Revised: 11/21/2021] [Accepted: 01/26/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. RESULTS We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. AVAILABILITY AND IMPLEMENTATION A stand-alone version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from GitHub. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
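The F1-score comparisons quoted in this abstract can be illustrated with a minimal sketch of how an F1-score might be computed for a predicted contact map. The function name `contact_f1`, the 0.5 probability threshold, and the minimum sequence separation of 6 are assumptions chosen for illustration; the abstract does not state the evaluation settings the authors used.

```python
from typing import Sequence


def contact_f1(
    pred_prob: Sequence[Sequence[float]],
    true_contact: Sequence[Sequence[int]],
    threshold: float = 0.5,  # assumed cutoff for calling a predicted contact
    min_sep: int = 6,        # assumed minimum sequence separation |i - j|
) -> float:
    """F1-score over residue pairs (i, j) with j - i >= min_sep."""
    tp = fp = fn = 0
    length = len(pred_prob)
    for i in range(length):
        for j in range(i + min_sep, length):
            predicted = pred_prob[i][j] >= threshold
            actual = bool(true_contact[i][j])
            if predicted and actual:
                tp += 1
            elif predicted and not actual:
                fp += 1
            elif actual and not predicted:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 8-residue example: with min_sep=6 only (0,6), (0,7), (1,7) are scored.
demo_pred = [[0.0] * 8 for _ in range(8)]
demo_true = [[0] * 8 for _ in range(8)]
demo_pred[0][6], demo_true[0][6] = 0.9, 1  # true positive
demo_pred[0][7], demo_true[0][7] = 0.8, 0  # false positive
demo_pred[1][7], demo_true[1][7] = 0.1, 1  # false negative
demo_f1 = contact_f1(demo_pred, demo_true)
```

In the toy example, one true positive, one false positive and one false negative give precision and recall of 0.5 each, so the F1-score is also 0.5.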
Affiliation(s)
| | - Thomas Litfin
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | | | - Yaoqi Zhou
- To whom correspondence should be addressed.
| |
|
44
|
Bhattacharya S, Roche R, Moussad B, Bhattacharya D. DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins. Proteins 2022; 90:579-588. [PMID: 34599831 PMCID: PMC8738102 DOI: 10.1002/prot.26254] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2021] [Revised: 09/22/2021] [Accepted: 09/28/2021] [Indexed: 02/03/2023]
Abstract
Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact-assisted or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches, and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA
| | - Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | | |
|
45
|
Fukunaga T, Iwasaki W. Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics 2022; 38:1794-1800. [PMID: 35060594 PMCID: PMC8963296 DOI: 10.1093/bioinformatics/btac034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2021] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Phylogenetic profiling is a powerful computational method for revealing the functions of genes of unknown function. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias. RESULTS To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/fukunagatsu/Ipm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Wataru Iwasaki
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan; Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 1130032, Japan; Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan; Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 2770882, Japan; Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 1130032, Japan; Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 1130032, Japan
| |
|
46
|
Rahman J, Newton MAH, Islam MKB, Sattar A. Enhancing protein inter-residue real distance prediction by scrutinising deep learning models. Sci Rep 2022; 12:787. [PMID: 35039537 PMCID: PMC8764118 DOI: 10.1038/s41598-021-04441-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2021] [Accepted: 12/17/2021] [Indexed: 12/29/2022] Open
Abstract
Protein structure prediction (PSP) has achieved significant progress lately via prediction of inter-residue distances using deep learning models and exploitation of the predictions during conformational search. In this context, prediction of large inter-residue distances and also prediction of distances between residues separated largely in the protein sequence remain challenging. To deal with these challenges, state-of-the-art inter-residue distance prediction algorithms have used large sets of coevolutionary and non-coevolutionary features. In this paper, we argue that the more types of features are used, the more kinds of noise are introduced, and the deep learning model then has to overcome that noise to improve the accuracy of the predictions. Also, multiple features capturing similar underlying characteristics might not necessarily have a significantly better cumulative effect. So we scrutinise the feature space to reduce the types of features to be used, but at the same time, we strive to improve the prediction accuracy. Consequently, for inter-residue real distance prediction, in this paper, we propose a deep learning model named scrutinised distance predictor (SDP), which uses only 2 coevolutionary and 3 non-coevolutionary features. On several sets of benchmark proteins, our proposed SDP method improves mean Local Distance Difference Test (LDDT) scores by at least 10% over existing state-of-the-art methods. The SDP program along with its data is available from the website https://gitlab.com/mahnewton/sdp.
Affiliation(s)
- Julia Rahman
- School of Information and Communication Technology, Griffith University, Southport, Australia.
| | - M A Hakim Newton
- Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia.
| | - Md Khaled Ben Islam
- School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Southport, Australia
- Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia
| |
|
47
|
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, USA
| | - Ming Li
- University of Waterloo, Canada
| |
|
48
|
Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are organically combined into the Rosetta low-resolution energy function to improve the accuracy of energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by proposed CoDiFold are comparable to the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
|
49
|
Li Y, Zhang C, Zheng W, Zhou X, Bell EW, Yu DJ, Zhang Y. Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins 2021; 89:1911-1921. [PMID: 34382712 PMCID: PMC8616805 DOI: 10.1002/prot.26211] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 01/12/2023]
Abstract
This article reports and analyzes the results of protein contact and distance prediction by our methods in the 14th Critical Assessment of techniques for protein Structure Prediction (CASP14). A new deep learning-based contact/distance predictor was employed based on the ensemble of two complementary coevolution features coupled with deep residual networks. We also improved our multiple sequence alignment (MSA) generation protocol with wholesale meta-genome sequence databases. On 22 CASP14 free modeling (FM) targets, the proposed model achieved a top-L/5 long-range precision of 63.8% and a mean distance bin error of 1.494. Based on the predicted distance potentials, 11 out of 22 FM targets and all of the 14 FM/template-based modeling (TBM) targets have correctly predicted folds (TM-score >0.5), suggesting that our approach can provide reliable distance potentials for ab initio protein folding.
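The "top-L/5 long-range precision" reported in this abstract can be illustrated with a minimal sketch. It assumes the commonly used convention that long-range pairs are those separated by at least 24 residues in sequence, and that the true contact map has already been derived from the experimental structure (typically Cβ atoms within 8 Å). Both conventions and the function name `top_l_over_k_precision` are assumptions for illustration and are not taken from the abstract.

```python
from typing import Sequence


def top_l_over_k_precision(
    pred_prob: Sequence[Sequence[float]],
    true_contact: Sequence[Sequence[int]],
    k: int = 5,         # evaluate the top L/k predictions
    min_sep: int = 24,  # assumed long-range cutoff: j - i >= 24
) -> float:
    """Fraction of the top L/k long-range predicted pairs that are true contacts."""
    length = len(pred_prob)
    # Collect every long-range pair (i < j) with its predicted contact probability.
    pairs = [
        (pred_prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank pairs by predicted probability, highest first.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[: max(1, length // k)]
    if not top:
        return 0.0  # protein too short to contain any long-range pair
    hits = sum(1 for _, i, j in top if true_contact[i][j])
    return hits / len(top)


# Toy example with L = 30, so the top L/5 = 6 long-range pairs are scored.
demo_pred = [[0.0] * 30 for _ in range(30)]
demo_true = [[0] * 30 for _ in range(30)]
ranked_pairs = [(0, 24), (0, 25), (1, 26), (2, 27), (3, 28), (4, 29)]
for rank, (i, j) in enumerate(ranked_pairs):
    demo_pred[i][j] = 0.9 - 0.1 * rank  # strictly decreasing scores
for i, j in [(0, 24), (1, 26), (3, 28)]:
    demo_true[i][j] = 1                 # three of the six top pairs are real
demo_precision = top_l_over_k_precision(demo_pred, demo_true)
```

In the toy example, three of the six top-ranked long-range pairs are true contacts, giving a precision of 0.5.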
Affiliation(s)
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
50
|
Lensink MF, Brysbaert G, Mauri T, Nadzirin N, Velankar S, Chaleil RAG, Clarence T, Bates PA, Kong R, Liu B, Yang G, Liu M, Shi H, Lu X, Chang S, Roy RS, Quadir F, Liu J, Cheng J, Antoniak A, Czaplewski C, Giełdoń A, Kogut M, Lipska AG, Liwo A, Lubecka EA, Maszota-Zieleniak M, Sieradzan AK, Ślusarz R, Wesołowski PA, Zięba K, Del Carpio Muñoz CA, Ichiishi E, Harmalkar A, Gray JJ, Bonvin AMJJ, Ambrosetti F, Vargas Honorato R, Jandova Z, Jiménez-García B, Koukos PI, Van Keulen S, Van Noort CW, Réau M, Roel-Touris J, Kotelnikov S, Padhorny D, Porter KA, Alekseenko A, Ignatov M, Desta I, Ashizawa R, Sun Z, Ghani U, Hashemi N, Vajda S, Kozakov D, Rosell M, Rodríguez-Lumbreras LA, Fernandez-Recio J, Karczynska A, Grudinin S, Yan Y, Li H, Lin P, Huang SY, Christoffer C, Terashi G, Verburgt J, Sarkar D, Aderinwale T, Wang X, Kihara D, Nakamura T, Hanazono Y, Gowthaman R, Guest JD, Yin R, Taherzadeh G, Pierce BG, Barradas-Bautista D, Cao Z, Cavallo L, Oliva R, Sun Y, Zhu S, Shen Y, Park T, Woo H, Yang J, Kwon S, Won J, Seok C, Kiyota Y, Kobayashi S, Harada Y, Takeda-Shitaka M, Kundrotas PJ, Singh A, Vakser IA, Dapkūnas J, Olechnovič K, Venclovas Č, Duan R, Qiu L, Xu X, Zhang S, Zou X, Wodak SJ. Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment. Proteins 2021; 89:1800-1823. [PMID: 34453465 PMCID: PMC8616814 DOI: 10.1002/prot.26222] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 12/19/2022]
Abstract
We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly, or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups including eight automatic servers submitted ~1250 models per target. Twenty groups including six servers participating in the CAPRI scoring challenge submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance with more groups submitting correct models for 70-80% of the targets or achieving high accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, which performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.
Affiliation(s)
- Marc F Lensink
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Guillaume Brysbaert
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Théo Mauri
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Nurul Nadzirin
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | | | - Tereza Clarence
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Bin Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Guangbo Yang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Ming Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xufeng Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| | - Anna Antoniak
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Mateusz Kogut
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Emilia A Lubecka
- Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland
| | - Rafał Ślusarz
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Patryk A Wesołowski
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
| | - Karolina Zięba
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Eiichiro Ichiishi
- International University of Health and Welfare Hospital (IUHW Hospital), Nasushiobara City, Japan
| | - Ameya Harmalkar
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey J Gray
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Alexandre M J J Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Francesco Ambrosetti
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Rodrigo Vargas Honorato
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Brian Jiménez-García
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Panagiotis I Koukos
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Siri Van Keulen
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Charlotte W Van Noort
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Manon Réau
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Jorge Roel-Touris
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Innopolis University, Russia
| | - Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Andrey Alekseenko
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Institute of Computer-Aided Design of the Russian Academy of Sciences, Moscow, Russia
| | - Mikhail Ignatov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Ryota Ashizawa
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Zhuyezi Sun
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Usman Ghani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Nasser Hashemi
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Mireia Rosell
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Luis A Rodríguez-Lumbreras
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Juan Fernandez-Recio
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Sergei Grudinin
- Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Daipayan Sarkar
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Tsukasa Nakamura
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
| | - Yuya Hanazono
- Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Tokai, Ibaraki, Japan
| | - Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Johnathan D Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Ghazaleh Taherzadeh
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Zhen Cao
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Luigi Cavallo
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Romina Oliva
- University of Naples "Parthenope", Napoli, Italy
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Taeyong Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hyeonuk Woo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jinsol Yang
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Yasuomi Kiyota
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
| | - Yoshiki Harada
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
| | - Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Amar Singh
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Shuang Zhang
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xiaoqin Zou
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Department of Biochemistry, University of Missouri, Columbia, Missouri, USA
|