1
Zheng W, Wuyun Q, Li Y, Liu Q, Zhou X, Peng C, Zhu Y, Freddolino L, Zhang Y. Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER. Nat Biotechnol 2025:10.1038/s41587-025-02654-4. [PMID: 40410405 DOI: 10.1038/s41587-025-02654-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Accepted: 03/26/2025] [Indexed: 05/25/2025]
Abstract
The dominant success of deep learning techniques on protein structure prediction has challenged the necessity and usefulness of traditional force field-based folding simulations. We proposed a hybrid approach, deep-learning-based iterative threading assembly refinement (D-I-TASSER), which constructs atomic-level protein structural models by integrating multisource deep learning potentials with iterative threading fragment assembly simulations. D-I-TASSER introduces a domain splitting and assembly protocol for the automated modeling of large multidomain protein structures. Benchmark tests and the most recent Critical Assessment of Protein Structure Prediction (CASP15) experiments demonstrate that D-I-TASSER outperforms AlphaFold2 and AlphaFold3 on both single-domain and multidomain proteins. Large-scale folding experiments further show that D-I-TASSER could fold 81% of protein domains and 73% of full-chain sequences in the human proteome with results highly complementary to recently released models by AlphaFold2. These results highlight a new avenue to integrate deep learning with classical physics-based folding simulations for high-accuracy protein structure and function predictions that are usable in genome-wide applications.
Affiliation(s)
- Wei Zheng
  - NITFID, School of Statistics and Data Science, AAIS, LPMC and KLMDASR, Nankai University, Tianjin, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Quancheng Liu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chunxiang Peng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yiheng Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
2
Jones MS, Khanna S, Ferguson AL. FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping. J Chem Inf Model 2025; 65:672-692. [PMID: 39772562 DOI: 10.1021/acs.jcim.4c02046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Smayan Khanna
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
3
Basu S, Kurgan L. Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses. Comput Struct Biotechnol J 2024; 23:1968-1977. [PMID: 38765610 PMCID: PMC11098722 DOI: 10.1016/j.csbj.2024.04.059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/22/2024] Open
Abstract
Intrinsic disorder predictors have been evaluated in several studies, including the two large CAID experiments. However, these studies are biased towards eukaryotic proteins and focus primarily on residue-level predictions. We provide a first-of-its-kind assessment that comprehensively covers the taxonomy and evaluates predictions at the residue and disordered region levels. We curate a benchmark dataset that uniformly covers eukaryotic, archaeal, bacterial, and viral proteins. We find that predictive performance differs substantially across taxonomy, where viruses are predicted most accurately, followed by protists and higher eukaryotes, while bacterial and archaeal proteins suffer lower levels of accuracy. These trends are consistent across predictors. We also find that current tools, except for flDPnn, struggle with reproducing native distributions of the numbers and sizes of the disordered regions. Moreover, analysis of two variants of disorder predictions derived from the AlphaFold2 predicted structures reveals that they produce accurate residue-level propensities for archaea, bacteria and protists. However, they underperform for higher eukaryotes and generally struggle to accurately identify disordered regions. Our results motivate development of new predictors that target bacteria and archaea and that produce accurate results at both residue and region levels. We also stress the need to include region-level evaluation in future assessments.
Affiliation(s)
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
4
Ryan-Phillips F, Henehan L, Ramdas S, Palace J, Beeson D, Dong YY. Assessing the Utility of ColabFold and AlphaMissense in Determining Missense Variant Pathogenicity for Congenital Myasthenic Syndromes. Biomedicines 2024; 12:2549. [PMID: 39595115 PMCID: PMC11592069 DOI: 10.3390/biomedicines12112549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 10/23/2024] [Accepted: 10/25/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND/OBJECTIVES: Congenital myasthenic syndromes (CMSs) are caused by variants in >30 genes with increasing numbers of variants of unknown significance (VUS) discovered by next-generation sequencing. Establishing VUS pathogenicity requires in vitro studies that slow diagnosis and treatment initiation. The recently developed protein structure prediction software AlphaFold2/ColabFold has revolutionized structural biology; such predictions have also been leveraged in AlphaMissense, which predicts ClinVar variant pathogenicity with 90% accuracy. Few reports, however, have tested these tools on rigorously characterized clinical data. We therefore assessed ColabFold and AlphaMissense as diagnostic aids for CMSs, using variants of the CHRN genes that encode the nicotinic acetylcholine receptor (nAChR). METHODS: Utilizing a dataset of 61 clinically validated CHRN variants, (1) we evaluated whether a ColabFold metric (either predicted structural disruption, prediction confidence, or prediction quality) can distinguish variant pathogenicity; (2) we assessed AlphaMissense's ability to differentiate variant pathogenicity; and (3) we compared AlphaMissense to the existing pathogenicity prediction programs AlamutVP and EVE. RESULTS: Analyzing the variant effects on ColabFold CHRN structure prediction, prediction confidence, and prediction quality did not yield any reliable metric indicative of pathogenicity. However, AlphaMissense predicted variant pathogenicity with 63.93% accuracy in our dataset, a much greater proportion than AlamutVP (27.87%) and EVE (28.33%). CONCLUSIONS: Emerging in silico tools can revolutionize genetic disease diagnosis; however, improvement, refinement, and clinical validation are imperative prior to practical acquisition.
Affiliation(s)
- Finlay Ryan-Phillips
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DS, UK
- Leighann Henehan
  - Neurology Department, John Radcliffe Hospital, Oxford OX3 9DU, UK
- Sithara Ramdas
  - Department of Paediatric Neurology, John Radcliffe Hospital, Oxford OX3 9DU, UK
  - MDUK Neuromuscular Centre, Department of Paediatrics, University of Oxford, Oxford OX3 9DU, UK
- Jacqueline Palace
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DS, UK
  - Neurology Department, John Radcliffe Hospital, Oxford OX3 9DU, UK
- David Beeson
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DS, UK
- Yin Yao Dong
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DS, UK
5
Sui J, Chen J, Chen Y, Iwamori N, Sun J. GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion. BMC Genomics 2024; 25:1019. [PMID: 39478465 PMCID: PMC11526662 DOI: 10.1186/s12864-024-10954-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024] Open
Abstract
The Golgi apparatus is a crucial component of the inner membrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases. Accurate identification of sub-Golgi protein types is therefore essential for developing effective treatments for such diseases. Due to the expensive and time-consuming nature of experimental methods for identifying sub-Golgi protein types, various computational methods have been developed as identification tools. However, the majority of these methods rely solely on neighboring features in the protein sequence and neglect the crucial spatial structure information of the protein. To discover alternative methods for accurately identifying sub-Golgi proteins, we have developed a model called GASIDN. The GASIDN model extracts multi-dimension features by utilizing a 1D convolution module on protein sequences and a graph learning module on contact maps constructed from AlphaFold2. The model utilizes the deep representation learning model SeqVec to initialize protein sequences. GASIDN achieved accuracy values of 98.4% and 96.4% in independent testing and ten-fold cross-validation, respectively, outperforming the majority of previous predictors. To the best of our knowledge, this is the first method that utilizes multi-scale feature fusion to identify and locate sub-Golgi proteins. In order to assess the generalizability and scalability of our model, we conducted experiments to apply it in the identification of proteins from other organelles, including plant vacuoles and peroxisomes. The results obtained from these experiments demonstrated promising outcomes, indicating the effectiveness and versatility of our model. The source code and datasets can be accessed at https://github.com/SJNNNN/GASIDN.
Affiliation(s)
- Jianan Sui
  - School of Information Science and Engineering, University of Jinan, Jinan, China
- Jiazi Chen
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
- Yuehui Chen
  - School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
- Naoki Iwamori
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
- Jin Sun
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
6
De Salis SKF, Chen JZ, Skarratt KK, Fuller SJ, Balle T. Deep learning structural insights into heterotrimeric alternatively spliced P2X7 receptors. Purinergic Signal 2024; 20:431-447. [PMID: 38032425 PMCID: PMC11928719 DOI: 10.1007/s11302-023-09978-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 10/31/2023] [Indexed: 12/01/2023] Open
Abstract
P2X7 receptors (P2X7Rs) are membrane-bound ATP-gated ion channels that are composed of three subunits. Different subunit structures may be expressed due to alternative splicing of the P2RX7 gene, altering the receptor's function when combined with the wild-type P2X7A subunits. In this study, the application of the deep-learning method AlphaFold2-Multimer (AF2M) for the generation of trimeric P2X7Rs was validated by comparing an AF2M-generated rat wild-type P2X7A receptor with a structure determined by cryogenic electron microscopy (cryo-EM) (Protein Data Bank Identification: 6U9V). The results suggested that AF2M could, first, accurately predict the structures of P2X7Rs and, second, accurately identify the highest quality model through the ranking system. Subsequently, AF2M was used to generate models of heterotrimeric alternatively spliced P2X7Rs consisting of one or two wild-type P2X7A subunits in combination with one or two P2X7B, P2X7E, P2X7J, and P2X7L splice variant subunits. The top-ranking models were deemed valid based on AF2M's confidence measures, stability in molecular dynamics simulations, and consistent flexibility of the conserved regions between the models. The structures of the heterotrimeric receptors, which were missing key residues in the ATP binding sites and carboxyl terminal domains (CTDs) compared to the wild-type receptor, help to explain their observed functions. Overall, the models produced in this study (available as supplementary material) unlock the possibility of structure-based studies into the heterotrimeric P2X7Rs.
Affiliation(s)
- Sophie K F De Salis
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW, 2050, Australia
  - Sydney Pharmacy School, The University of Sydney, Camperdown, NSW, 2050, Australia
- Jake Zheng Chen
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW, 2050, Australia
  - Sydney Pharmacy School, The University of Sydney, Camperdown, NSW, 2050, Australia
- Kristen K Skarratt
  - The University of Sydney, Nepean Clinical School, Kingswood, NSW, 2747, Australia
- Stephen J Fuller
  - The University of Sydney, Nepean Clinical School, Kingswood, NSW, 2747, Australia
- Thomas Balle
  - Brain and Mind Centre, The University of Sydney, Camperdown, NSW, 2050, Australia
  - Sydney Pharmacy School, The University of Sydney, Camperdown, NSW, 2050, Australia
7
Abbas MKG, Rassam A, Karamshahi F, Abunora R, Abouseada M. The Role of AI in Drug Discovery. Chembiochem 2024; 25:e202300816. [PMID: 38735845 DOI: 10.1002/cbic.202300816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/14/2024]
Abstract
The emergence of Artificial Intelligence (AI) in drug discovery marks a pivotal shift in pharmaceutical research, blending sophisticated computational techniques with conventional scientific exploration to break through enduring obstacles. This review paper elucidates the multifaceted applications of AI across various stages of drug development, highlighting significant advancements and methodologies. It delves into AI's instrumental role in drug design, polypharmacology, chemical synthesis, drug repurposing, and the prediction of drug properties such as toxicity, bioactivity, and physicochemical characteristics. Despite AI's promising advancements, the paper also addresses the challenges and limitations encountered in the field, including data quality, generalizability, computational demands, and ethical considerations. By offering a comprehensive overview of AI's role in drug discovery, this paper underscores the technology's potential to significantly enhance drug development, while also acknowledging the hurdles that must be overcome to fully realize its benefits.
Affiliation(s)
- M K G Abbas
  - Center for Advanced Materials, Qatar University, P.O. Box 2713, Doha, Qatar
- Abrar Rassam
  - Secondary Education, Educational Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Fatima Karamshahi
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
- Rehab Abunora
  - Faculty of Medicine, General Medicine and Surgery, Helwan University, Cairo, Egypt
- Maha Abouseada
  - Department of Chemistry and Earth Sciences, Qatar University, P.O. Box 2713, Doha, Qatar
8
Xiao N, Yang W, Wang J, Li J, Zhao R, Li M, Li C, Liu K, Li Y, Yin C, Chen Z, Li X, Jiang Y. Protein structuromics: A new method for protein structure-function crosstalk in glioma. Proteins 2024; 92:24-36. [PMID: 37497743 DOI: 10.1002/prot.26555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023]
Abstract
Glioma is a type of tumor that starts in the glial cells of the brain or spine. Since the 1800s, when the disease was first named, its survival rates have always been unsatisfactory. Despite great advances in molecular biology and traditional treatment methods, many questions regarding cancer occurrence and the underlying mechanism remain to be answered. In this study, we assessed the protein structural features of 20 oncogenes and 20 anti-oncogenes via protein structure and dynamic analysis methods and 3D structural and systematic analyses of the structure-function relationships of proteins. All of these results directly indicate that unfavorable group proteins show more complex structures than favorable group proteins. As the tumor cell microenvironment changes, the balance of oncogene-related and anti-oncogene-related proteins is disrupted, and most of the structures of the two groups of proteins will be disrupted. However, more unfavorable group proteins will maintain and refold to achieve their correct shape faster and perform their functions more quickly than favorable group proteins, and the former thus support cancer development. We hope that these analyses will help promote mechanistic research and the development of new treatments for glioma.
Affiliation(s)
- Nan Xiao
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Wenming Yang
  - Department of Neurosurgery, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, Liaoning, China
- Jin Wang
  - Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Jiarong Li
  - Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Ruoxuan Zhao
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Muzheng Li
  - Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Chi Li
  - Department of Anesthesiology, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Kang Liu
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Yingxin Li
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Chaoqun Yin
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Zhibo Chen
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Xingqi Li
  - Department of Medicine, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Yun Jiang
  - Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
9
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
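As a rough illustration of the TM-score values quoted in this abstract, the sketch below evaluates the commonly used TM-score formula for a list of residue-pair distances. It assumes the model and the native structure have already been superimposed and that the distances (in Å) are between aligned Cα atoms. The function names are hypothetical, and the fixed-superposition simplification is an assumption for illustration; it is not taken from D-I-TASSER or from the CASP15 assessment, and real TM-score programs additionally search for the superposition that maximizes the score.

```python
def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in Angstroms).

    Uses the widely cited form d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Assumption: short chains (L <= 21) are clamped to 0.5, as is
    commonly done in TM-score implementations.
    """
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (illustrative only).

    distances: distances between aligned C-alpha pairs, in Angstroms.
    target_length: length of the target (native) protein used for
    normalization. Unaligned residues contribute nothing to the sum.
    """
    d0 = tm_score_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect 100-residue model scores 1.0; larger deviations
    # push the score toward 0.
    print(tm_score([0.0] * 100, 100))
    print(tm_score([2.0] * 100, 100))
```

Because each aligned pair contributes at most 1 and the sum is divided by the target length, the score lies between 0 and 1, which is why an average of 0.83 for complexes indicates models that are close to the native structures.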
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
10
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
11
Sui J, Chen J, Chen Y, Iwamori N, Sun J. Identification of plant vacuole proteins by using graph neural network and contact maps. BMC Bioinformatics 2023; 24:357. [PMID: 37740195 PMCID: PMC10517492 DOI: 10.1186/s12859-023-05475-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023] Open
Abstract
Plant vacuoles are essential organelles in the growth and development of plants, and accurate identification of their proteins is crucial for understanding their biological properties. In this study, we developed a novel model called GraphIdn for the identification of plant vacuole proteins. The model uses SeqVec, a deep representation learning model, to initialize the amino acid sequence. We utilized the AlphaFold2 algorithm to obtain the structural information of corresponding plant vacuole proteins, and then fed the calculated contact maps into a graph convolutional neural network. GraphIdn achieved accuracy values of 88.51% and 89.93% in independent testing and fivefold cross-validation, respectively, outperforming previous state-of-the-art predictors. As far as we know, this is the first model to use predicted protein topology structure graphs to identify plant vacuole proteins. Furthermore, we assessed the effectiveness and generalization capability of our GraphIdn model by applying it to identify and locate peroxisomal proteins, which yielded promising outcomes. The source code and datasets can be accessed at https://github.com/SJNNNN/GraphIdn.
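The step of turning a predicted structure into a contact map that a graph network can consume may be sketched as follows. This is a minimal illustration, not the GraphIdn code: the function names are hypothetical, the input is assumed to be a list of Cα coordinates (in Å) already extracted from a predicted structure, and the 8 Å cutoff is a common convention assumed here rather than a value stated in the abstract.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Binary residue-residue contact map from C-alpha coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue, in Angstroms.
    cutoff: distance threshold in Angstroms (assumed value, see lead-in).
    Returns an n x n list of 0/1 entries with a zero diagonal.
    """
    n = len(ca_coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                cmap[i][j] = 1
                cmap[j][i] = 1  # contacts are symmetric
    return cmap


def edge_list(cmap):
    """Undirected edges (i, j) with i < j, the usual input form for a graph model."""
    return [
        (i, j)
        for i, row in enumerate(cmap)
        for j, in_contact in enumerate(row)
        if in_contact and i < j
    ]


if __name__ == "__main__":
    # Four residues on a line; the last one is far from the others.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(edge_list(contact_map(coords)))  # [(0, 1), (0, 2), (1, 2)]
```

In such a setup the residues act as graph nodes, carrying per-residue features such as sequence embeddings, and the contact map supplies the edges over which a graph convolution aggregates information.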
Affiliation(s)
- Jianan Sui
  - School of Information Science and Engineering, University of Jinan, Jinan, China
- Jiazi Chen
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
- Yuehui Chen
  - School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
- Naoki Iwamori
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
- Jin Sun
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
12
Zaman AB, Inan TT, De Jong K, Shehu A. Adaptive Stochastic Optimization to Improve Protein Conformation Sampling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2759-2771. [PMID: 34882562 DOI: 10.1109/tcbb.2021.3134103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. AlphaFold2 has now been shown to reveal a high-quality native structure for many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploring a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its search to balance exploration and exploitation under a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
|
13
|
Kiran A, Altaf A, Sarwar M, Malik A, Maqbool T, Ali Q. Phytochemical profiling and cytotoxic potential of Arnebia nobilis root extracts against hepatocellular carcinoma using in-vitro and in-silico approaches. Sci Rep 2023; 13:11376. [PMID: 37452082 PMCID: PMC10349071 DOI: 10.1038/s41598-023-38517-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 07/10/2023] [Indexed: 07/18/2023]
Abstract
Hepatocellular carcinoma is the fifth most prevalent cancer worldwide. The emergence of drug resistance and other adverse effects of available anticancer options motivates the exploration of natural sources. The current study was designed to examine Arnebia nobilis (A. nobilis) extracts for phytochemical detection, in-vitro evaluation of antioxidative and cytotoxic potentials, and in-silico prediction of potent anticancer compounds. The phytochemical analysis revealed the presence of flavonoids, phenols, tannins, alkaloids, quinones, and cardiac glycosides in the ethanol (ANE) and n-hexane (ANH) extracts of A. nobilis. ANH extract exhibited a better antioxidant potential to scavenge DPPH, nitric oxide and superoxide anion radicals than ANE extract, which showed better potential only against H2O2 radicals. After 24 h of treatment, ANH extract showed higher cytotoxicity (IC50 value: 22.77 µg/mL) than ANE extract (IC50 value: 46.74 µg/mL) on cancer (HepG2) cells without intoxicating the normal (BHK) cells in the MTT assay. A better apoptotic potential was observed in ANH extract (49.10%) compared to ANE extract (41.35%) on HepG2 cells using the annexin V/PI method. GCMS analysis of ANH extract identified 35 phytocompounds, of which only 14 bioactive compounds were selected for molecular docking based on druggability criteria and toxicity filters. Among the five top scorers, deoxyshikonin exhibited the best binding affinities of -7.2, -9.2, -7.2 and -9.2 kcal/mol against TNF-α, TGF-βR1, Bcl-2 and iNOS, respectively, followed by ethyl cholate and 2-Methyl-6-(4-methylphenyl)hept-2-en-4-one, along with desirable ADMET properties. The phytochemicals of ANH extract could serve as promising drug candidates for liver cancer after further validation.
Affiliation(s)
- Asia Kiran
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, 54300, Pakistan
- Awais Altaf
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, 54300, Pakistan
- Muhammad Sarwar
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, 54300, Pakistan
- Arif Malik
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, 54300, Pakistan
- Tahir Maqbool
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, 54300, Pakistan
- Qurban Ali
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
|
14
|
Maghsoud Y, Dong C, Cisneros GA. Investigation of the Inhibition Mechanism of Xanthine Oxidoreductase by Oxipurinol: A Computational Study. J Chem Inf Model 2023; 63:4190-4206. [PMID: 37319436 PMCID: PMC10405278 DOI: 10.1021/acs.jcim.3c00624] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Xanthine oxidoreductase (XOR) is an enzyme found in various organisms. It converts hypoxanthine to xanthine and urate, which are crucial steps in purine elimination in humans. Elevated uric acid levels can lead to conditions like gout and hyperuricemia. Therefore, there is significant interest in developing drugs that target XOR for treating these conditions and other diseases. Oxipurinol, an analogue of xanthine, is a well-known inhibitor of XOR. Crystallographic studies have revealed that oxipurinol directly binds to the molybdenum cofactor (MoCo) in XOR. However, the precise details of the inhibition mechanism are still unclear, which would be valuable for designing more effective drugs with similar inhibitory functions. In this study, molecular dynamics and quantum mechanics/molecular mechanics calculations are employed to investigate the inhibition mechanism of XOR by oxipurinol. The study examines the structural and dynamic effects of oxipurinol on the pre-catalytic structure of the metabolite-bound system. Our results provide insights on the reaction mechanism catalyzed by the MoCo center in the active site, which aligns well with experimental findings. Furthermore, the results provide insights into the residues surrounding the active site and propose an alternative mechanism for developing alternative covalent inhibitors.
Affiliation(s)
- Yazdan Maghsoud
- Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Chao Dong
- Department of Chemistry and Physics, The University of Texas Permian Basin, Odessa, Texas 79762, United States
- G Andrés Cisneros
- Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Department of Physics, The University of Texas at Dallas, Richardson, Texas 75080, United States
|
15
|
Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023; 9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023]
Abstract
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is long, complex, and expensive, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up the drug discovery pipeline and prevent failures in it. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. We also explore how contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and the analysis of drug responses. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, together with their current progress, hopes and promotions, are discussed in this article.
Affiliation(s)
- Rizwan Qureshi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- Muhammad Irfan
- Faculty of Electrical Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Swabi, Pakistan
- Sheheryar Khan
- School of Professional Education & Executive Development, The Hong Kong Polytechnic University, Hong Kong
- Jia Wu
- Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- John Heymach
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Xiuning Le
- Department of Thoracic Head and Neck Medical Oncology, Division of Cancer Medicine, The University of Texas, MD Anderson Cancer Center, Houston, USA
- Hong Yan
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
|
16
|
Xiao N, Ma H, Gao H, Yang J, Tong D, Gan D, Yang J, Li C, Liu K, Li Y, Chen Z, Yin C, Li X, Wang H. Structure-function crosstalk in liver cancer research: Protein structuromics. Int J Biol Macromol 2023:125291. [PMID: 37315670 DOI: 10.1016/j.ijbiomac.2023.125291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2023] [Revised: 06/04/2023] [Accepted: 06/07/2023] [Indexed: 06/16/2023]
Abstract
Liver cancer can be primary (starting in the liver) or secondary (cancer that has spread from elsewhere to the liver, known as liver metastasis). Liver metastasis is more common than primary liver cancer. Despite great advances in molecular biology methods and treatments, liver cancer is still associated with a poor survival rate and a high death rate, and there is no cure. Many questions remain regarding the mechanisms of liver cancer occurrence and development as well as tumor reoccurrence after treatment. In this study, we assessed the protein structural features of 20 oncogenes and 20 anti-oncogenes via protein structure and dynamic analysis methods and 3D structural and systematic analyses of the structure-function relationships of proteins. Our aim was to provide new insights that may inform research on the development and treatment of liver cancer.
Affiliation(s)
- Nan Xiao
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Hongming Ma
- Department of Oncology, China Emergency General Hospital City, Beijing, China
- Hong Gao
- Department of Oncology, China Emergency General Hospital City, Beijing, China
- Jing Yang
- Department of Computer Center, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Dan Tong
- Department of Nurse, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Dingzhu Gan
- Department of Publicity, Peking Union Medical College, Beijing, China
- Jinhua Yang
- Department of Development and Production, Institute of Medical Biology, Peking Union Medical College, Kunming City, Yunnan Province, China
- Chi Li
- Department of Anesthesiology, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Kang Liu
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Yingxin Li
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Zhibo Chen
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Chaoqun Yin
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Xingqi Li
- Department of Medicine, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Hongwu Wang
- Department of Respiratory and Critical Care Medicine, Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine, Beijing, China
|
17
|
Maghsoud Y, Dong C, Cisneros GA. Computational Characterization of the Inhibition Mechanism of Xanthine Oxidoreductase by Topiroxostat. ACS Catal 2023; 13:6023-6043. [PMID: 37547543 PMCID: PMC10399974 DOI: 10.1021/acscatal.3c01245] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 08/08/2023]
Abstract
Xanthine oxidase (XO) is a member of the molybdopterin-containing enzyme family. It converts xanthine to uric acid as the last step of purine catabolism in the human body. A high uric acid concentration in the blood directly leads to human diseases like gout and hyperuricemia. Therefore, drugs that inhibit the biosynthesis of uric acid by human XO have been clinically used for many years to decrease the concentration of uric acid in the blood. In this study, the mechanism of XO inhibition by a new promising drug, topiroxostat (code: FYX-051), is investigated by employing molecular dynamics (MD) and quantum mechanics/molecular mechanics (QM/MM) calculations. This drug has been reported to act as both a noncovalent and covalent inhibitor and undergoes a stepwise inhibition by all its hydroxylated metabolites, which include 2-hydroxy-FYX-051, dihydroxy-FYX-051, and trihydroxy-FYX-051. However, the detailed inhibition mechanism of each metabolite remains elusive; understanding it could be useful for designing more effective drugs with similar inhibition functions. Hence, herein we present the computational investigation of the structural and dynamical effects of FYX-051 and the calculated reaction mechanism for all of the oxidation steps catalyzed by the molybdopterin center in the active site. Calculated results for the proposed reaction mechanisms for each metabolite's inhibition reaction in the enzyme's active site, binding affinities, and the noncovalent interactions with the surrounding amino acid residues are consistent with previously reported experimental findings. Analysis of the noncovalent interactions via energy decomposition analysis (EDA) and noncovalent interaction (NCI) techniques suggests that residues L648, K771, E802, R839, L873, R880, R912, F914, F1009, L1014, and A1079 can be used as key interacting residues for further hybrid-type inhibitor development.
Affiliation(s)
- Yazdan Maghsoud
- Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Chao Dong
- Department of Chemistry and Physics, The University of Texas Permian Basin, Odessa, Texas 79762, United States
- G Andrés Cisneros
- Department of Chemistry and Biochemistry, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Department of Physics, The University of Texas at Dallas, Richardson, Texas 75080, United States
|
18
|
Li S, Yuan L, Ma Y, Liu Y. WG-ICRN: Protein 8-state secondary structure prediction based on Wasserstein generative adversarial networks and residual networks with Inception modules. Mathematical Biosciences and Engineering: MBE 2023; 20:7721-7737. [PMID: 37161169 DOI: 10.3934/mbe.2023333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein secondary structure is the basis of studying the tertiary structure of proteins, drug design and development, and the 8-state protein secondary structure can provide more adequate protein information than the 3-state structure. Therefore, this paper proposes a novel method WG-ICRN for predicting protein 8-state secondary structures. First, we use the Wasserstein generative adversarial network (WGAN) to extract protein features in the position-specific scoring matrix (PSSM). The extracted features are combined with PSSM into a new feature set of WG-data, which contains richer feature information. Then, we use the residual network (ICRN) with Inception to further extract the features in WG-data and complete the prediction. Compared with the residual network, ICRN can reduce parameter calculations and increase the width of feature extraction to obtain more feature information. We evaluated the prediction performance of the model using six datasets. The experimental results show that the WGAN has excellent feature extraction capabilities, and ICRN can further improve network performance and improve prediction accuracy. Compared with four popular models, WG-ICRN achieves better prediction performance.
Affiliation(s)
- Shun Li
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
|
19
|
Yuan L, Ma Y, Liu Y. Ensemble deep learning models for protein secondary structure prediction using bidirectional temporal convolution and bidirectional long short-term memory. Front Bioeng Biotechnol 2023; 11:1051268. [PMID: 36860882 PMCID: PMC9968878 DOI: 10.3389/fbioe.2023.1051268] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/26/2022] [Accepted: 02/03/2023] [Indexed: 02/16/2023]
Abstract
Protein secondary structure prediction (PSSP) is a challenging task in computational biology. However, existing deep architectures do not extract deep long-range features of long sequences sufficiently or comprehensively. This paper proposes a novel deep learning model to improve protein secondary structure prediction. In the model, our proposed bidirectional temporal convolutional network (BTCN) can extract the bidirectional deep local dependencies in protein sequences segmented by the sliding window technique, the bidirectional long short-term memory (BLSTM) network can extract the global interactions between residues, and our proposed multi-scale bidirectional temporal convolutional network (MSBTCN) can further capture the bidirectional multi-scale long-range features of residues while preserving the hidden layer information more comprehensively. In particular, we also propose that fusing the features of 3-state and 8-state protein secondary structure prediction can further improve the prediction accuracy. Moreover, we also propose and compare multiple novel deep models by combining bidirectional long short-term memory with temporal convolutional network (TCN), reverse temporal convolutional network (RTCN), multi-scale temporal convolutional network, bidirectional temporal convolutional network and multi-scale bidirectional temporal convolutional network, respectively. Furthermore, we demonstrate that the reverse prediction of secondary structure outperforms the forward prediction, suggesting that amino acids at later positions have a greater impact on secondary structure recognition. Experimental results on benchmark datasets including CASP10, CASP11, CASP12, CASP13, CASP14, and CB513 show that our methods achieve better prediction performance compared to five state-of-the-art methods.
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
- Yuming Ma
- *Correspondence: Yuming Ma; Yihui Liu
- Yihui Liu
- *Correspondence: Yuming Ma; Yihui Liu
|
20
|
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
|
21
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
The ability to predict the three-dimensional structure of a protein from its amino acid sequence has improved considerably in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors, coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
|
22
|
Yuan L, Ma Y, Liu Y. Protein secondary structure prediction based on Wasserstein generative adversarial networks and temporal convolutional networks with convolutional block attention modules. Mathematical Biosciences and Engineering: MBE 2023; 20:2203-2218. [PMID: 36899529 DOI: 10.3934/mbe.2023102] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/18/2023]
Abstract
As an important task in bioinformatics, protein secondary structure prediction (PSSP) not only benefits protein function research and tertiary structure prediction, but also promotes the design and development of new drugs. However, current PSSP methods cannot sufficiently extract effective features. In this study, we propose a novel deep learning model, WGACSTCN, which combines a Wasserstein generative adversarial network with gradient penalty (WGAN-GP), a convolutional block attention module (CBAM) and a temporal convolutional network (TCN) for 3-state and 8-state PSSP. In the proposed model, the mutual game of generator and discriminator in the WGAN-GP module can effectively extract protein features, our CBAM-TCN local extraction module can capture key deep local interactions in protein sequences segmented by the sliding window technique, and the CBAM-TCN long-range extraction module can further capture the key deep long-range interactions in sequences. We evaluate the performance of the proposed model on seven benchmark datasets. Experimental results show that our model exhibits better prediction performance compared to four state-of-the-art models. The proposed model has strong feature extraction ability and can extract important information more comprehensively.
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
- Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
|
23
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the obtained sequence identity came dangerously close to values that can arise by chance when aligning random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
|
24
|
Ślusarz R, Lubecka EA, Czaplewski C, Liwo A. Improvements and new functionalities of UNRES server for coarse-grained modeling of protein structure, dynamics, and interactions. Front Mol Biosci 2022; 9:1071428. [PMID: 36589235 PMCID: PMC9794589 DOI: 10.3389/fmolb.2022.1071428] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/16/2022] [Accepted: 11/29/2022] [Indexed: 12/15/2022]
Abstract
In this paper we report the improvements and extensions of the UNRES server (https://unres-server.chem.ug.edu.pl) for physics-based simulations with the coarse-grained UNRES model of polypeptide chains. The improvements include replacing the old code with the recently optimized version and adding the recent scale-consistent variant of the UNRES force field, which performs better in the modeling of proteins with β and α+β structures. The scope of applications of the package was extended to data-assisted simulations with restraints from nuclear magnetic resonance (NMR) and chemical cross-link mass spectrometry (XL-MS) measurements. NMR restraints can be input in the NMR Exchange Format (NEF), which has become a standard. Ambiguous NMR restraints are handled without expert intervention owing to a specially designed penalty function. The server can be used to run smaller jobs directly or to prepare input data for larger production jobs run on standalone installations of UNRES.
Affiliation(s)
- Rafał Ślusarz
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Gdańsk, Poland
- Emilia A. Lubecka
- Faculty of Electronics, Telecommunication and Informatics, Gdańsk University of Technology, Fahrenheit Union of Universities in Gdańsk, Gdańsk, Poland
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Gdańsk, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Gdańsk, Poland
- *Correspondence: Adam Liwo
|
25
|
Yuan L, Hu X, Ma Y, Liu Y. DLBLS_SS: protein secondary structure prediction using deep learning and broad learning system. RSC Adv 2022; 12:33479-33487. [PMID: 36505696 PMCID: PMC9682407 DOI: 10.1039/d2ra06433b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022]
Abstract
Protein secondary structure prediction (PSSP) is not only beneficial to the study of protein structure and function but also to the development of drugs. As a challenging task in computational biology, experimental methods for PSSP are time-consuming and expensive. In this paper, we propose a novel PSSP model DLBLS_SS based on deep learning and broad learning system (BLS) to predict 3-state and 8-state secondary structure. We first use a bidirectional long short-term memory (BLSTM) network to extract global features in residue sequences. Then, our proposed SEBTCN based on temporal convolutional networks (TCN) and channel attention can capture bidirectional key long-range dependencies in sequences. We also use BLS to rapidly optimize fused features while further capturing local interactions between residues. We conduct extensive experiments on public test sets including CASP10, CASP11, CASP12, CASP13, CASP14 and CB513 to evaluate the performance of the model. Experimental results show that our model exhibits better 3-state and 8-state PSSP performance compared to five state-of-the-art models.
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Xiaopei Hu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
|
26
|
Sinha S, Tam B, Wang SM. Applications of Molecular Dynamics Simulation in Protein Study. Membranes 2022; 12:844. [PMID: 36135863 PMCID: PMC9505860 DOI: 10.3390/membranes12090844] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/14/2022] [Revised: 08/24/2022] [Accepted: 08/25/2022] [Indexed: 05/29/2023]
Abstract
Molecular dynamics (MD) simulation is increasingly used as a powerful tool to study questions related to protein structure. Starting from the early simulation study of photoisomerization in rhodopsin in 1976, MD simulation has been used to study protein function, protein stability, protein-protein interactions, enzymatic reactions, drug-protein interactions, and membrane proteins. In this review, we briefly survey the history of MD simulation applications and their current status in protein studies.
Affiliation(s)
- San Ming Wang
- MoE Frontiers Science Center for Precision Oncology, Cancer Center and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau SAR, China
|
27
|
Protein structure prediction based on particle swarm optimization and tabu search strategy. BMC Bioinformatics 2022; 23:352. [PMID: 35999491 PMCID: PMC9396775 DOI: 10.1186/s12859-022-04888-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 08/10/2022] [Indexed: 11/10/2022] Open
Abstract
Background The stability of protein sequence structure plays an important role in the prevention and treatment of diseases. Results In this paper, particle swarm optimization and tabu search are combined to propose a new method for protein structure prediction. The experimental results show that: for four groups of artificial protein sequences with different lengths, this method obtains the lowest potential energy value and stable structure prediction results, and the effect is obviously better than the other two comparison methods. Taking the first group of protein sequences as an example, our method improves the prediction of minimum potential energy by 127% and 7% respectively. Conclusions Therefore, the method proposed in this paper is more suitable for the prediction of protein structural stability.
|
28
|
Bitton M, Keasar C. Estimation of model accuracy by a unique set of features and tree-based regressor. Sci Rep 2022; 12:14074. [PMID: 35982086 PMCID: PMC9388490 DOI: 10.1038/s41598-022-17097-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 07/20/2022] [Indexed: 11/26/2022] Open
Abstract
Computationally generated models of protein structures bridge the gap between the practically negligible price tag of sequencing and the high cost of experimental structure determination. By providing a low-cost (and often free) partial alternative to experimentally determined structures, these models help biologists design and interpret their experiments. Obviously, the more accurate the models the more useful they are. However, methods for protein structure prediction generate many structural models of various qualities, necessitating means for the estimation of their accuracy. In this work we present MESHI_consensus, a new method for the estimation of model accuracy. The method uses a tree-based regressor and a set of structural, target-based, and consensus-based features. The new method achieved high performance in the EMA (Estimation of Model Accuracy) track of the recent CASP14 community-wide experiment (https://predictioncenter.org/casp14/index.cgi). The tertiary structure prediction track of that experiment revealed an unprecedented leap in prediction performance by a single prediction group/method, namely AlphaFold2. This achievement would inevitably have a profound impact on the field of protein structure prediction, including the accuracy estimation sub-task. We conclude this manuscript with some speculations regarding the future role of accuracy estimation in a new era of accurate protein structure prediction.
Affiliation(s)
- Mor Bitton
- Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel.
| | - Chen Keasar
- Department of Computer Science, Ben Gurion University, Be'er Sheva, Israel.
|
29
|
Balkenhol J, Bencurova E, Gupta SK, Schmidt H, Heinekamp T, Brakhage A, Pottikkadavath A, Dandekar T. Prediction and validation of host-pathogen interactions by a versatile inference approach using Aspergillus fumigatus as a case study. Comput Struct Biotechnol J 2022; 20:4225-4237. [PMID: 36051885 PMCID: PMC9399266 DOI: 10.1016/j.csbj.2022.07.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/03/2022] Open
Abstract
Biological networks are characterized by diverse interactions and dynamics in time and space. Many regulatory modules operate in parallel and are interconnected with each other. Some pathways are functionally known and annotated accordingly, e.g., endocytosis, migration, or cytoskeletal rearrangement. However, many interactions are not so well characterized. For reconstructing the biological complexity in cellular networks, we combine here existing experimentally confirmed and analyzed interactions with a protein-interaction inference framework using as basis experimentally confirmed interactions from other organisms. Prediction scoring includes sequence similarity, evolutionary conservation of interactions, the coexistence of interactions in the same pathway, orthology as well as structure similarity to rank and compare inferred interactions. We exemplify our inference method by studying host-pathogen interactions during infection of Mus musculus (phagolysosomes in alveolar macrophages) with Aspergillus fumigatus (conidia, airborne, asexual spores). Three of nine predicted critical host-pathogen interactions could even be confirmed by direct experiments. Moreover, we suggest drugs that manipulate the host-pathogen interaction.
Affiliation(s)
| | - Elena Bencurova
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
| | - Shishir K Gupta
- Evolutionary Genomics Group, Center for Computational and Theoretical Biology, University of Würzburg, 97078 Würzburg, Germany
| | - Hella Schmidt
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI), 07745 Jena, Germany
| | - Thorsten Heinekamp
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI), 07745 Jena, Germany
| | - Axel Brakhage
- Department of Molecular and Applied Microbiology, Leibniz Institute for Natural Product Research and Infection Biology (Leibniz-HKI), 07745 Jena, Germany
| | - Aparna Pottikkadavath
- Department of Structural Biology, Rudolf Virchow Center for Integrative and Translational Bioimaging, University of Würzburg, 97074 Würzburg, Germany
| | - Thomas Dandekar
- Department of Bioinformatics, University of Würzburg, Würzburg, Germany
|
30
|
Jin X, Guo L, Jiang Q, Wu N, Yao S. Prediction of protein secondary structure based on an improved channel attention and multiscale convolution module. Front Bioeng Biotechnol 2022; 10:901018. [PMID: 35935483 PMCID: PMC9355137 DOI: 10.3389/fbioe.2022.901018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/21/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022] Open
Abstract
Prediction of the protein secondary structure is a key issue in protein science. Protein secondary structure prediction (PSSP) aims to construct a function that can map the amino acid sequence into the secondary structure so that the protein secondary structure can be obtained according to the amino acid sequence. Driven by deep learning, the prediction accuracy of the protein secondary structure has been greatly improved in recent years. To explore a new technique of PSSP, this study introduces the concept of an adversarial game into the prediction of the secondary structure, and a conditional generative adversarial network (GAN)-based prediction model is proposed. We introduce a new multiscale convolution module and an improved channel attention (ICA) module into the generator to generate the secondary structure, and then a discriminator is designed to conflict with the generator to learn the complicated features of proteins. Then, we propose a PSSP method based on the proposed multiscale convolution module and ICA module. The experimental results indicate that the conditional GAN-based protein secondary structure prediction (CGAN-PSSP) model is workable and worthy of further study because of the strong feature-learning ability of adversarial learning.
Affiliation(s)
- Xin Jin
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Lin Guo
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Qian Jiang
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Nan Wu
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
| | - Shaowen Yao
- Engineering Research Center of Cyberspace, Yunnan University, Kunming, Yunnan, China
- School of Software, Yunnan University, Kunming, Yunnan, China
|
31
|
Monroe L, Kihara D. Using steered molecular dynamic tension for assessing quality of computational protein structure models. J Comput Chem 2022; 43:1140-1150. [PMID: 35475517 PMCID: PMC9133218 DOI: 10.1002/jcc.26876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 02/16/2022] [Accepted: 04/15/2022] [Indexed: 11/12/2022]
Abstract
The native structures of proteins, with the notable exception of intrinsically disordered proteins, generally take their most stable conformation under physiological conditions to maintain their structural framework so that their biological function can be properly carried out. Experimentally, the stability of a protein can be measured by several means, among which the pulling experiment using the atomic force microscope (AFM) stands as a unique method. AFM directly measures the resistance to unfolding, which can be quantified from the observed force-extension profile. It has been shown that key features observed in an AFM pulling experiment can be well reproduced by computational molecular dynamics simulations. Here, we applied computational pulling to estimate the accuracy of computational protein structure models under the hypothesis that structural stability would be positively correlated with the accuracy of a model, i.e. its closeness to the native structure. We used a total of 4929 structure models for 24 target proteins from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and investigated whether the magnitude of the break force, that is, the force required to rearrange the model's structure, taken from the force profile, was sufficient information for selecting near-native models. We found that near-native models can be successfully selected by examining their break forces, suggesting that a high break force indeed indicates high stability of a model. On the other hand, there were also near-native models that had relatively low peak forces. The mechanisms of the stability exhibited by the break forces were explored and discussed.
Affiliation(s)
- Lyman Monroe
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Purdue University Center for Cancer Research, West Lafayette, IN, 47907, USA
|
32
|
Marcu ŞB, Tăbîrcă S, Tangney M. An Overview of Alphafold's Breakthrough. Front Artif Intell 2022; 5:875587. [PMID: 35757294 PMCID: PMC9218062 DOI: 10.3389/frai.2022.875587] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 02/14/2022] [Accepted: 05/02/2022] [Indexed: 11/13/2022] Open
Abstract
This paper presents a short summary of the protein folding problem: what it is and why it is significant. It introduces the CASP competition and how accuracy is measured, then looks at different approaches for solving the problem, followed by a review of the breakthroughs introduced by AlphaFold 1 and AlphaFold 2.
Affiliation(s)
- Ştefan-Bogdan Marcu
- Department of Computer Science, University College Cork, Cork, Ireland
- *Correspondence: Ştefan-Bogdan Marcu
| | - Sabin Tăbîrcă
- Department of Computer Science, University College Cork, Cork, Ireland
- Department of Informatics, Faculty of Informatics and Mathematics, Transilvania University, Brasov, Romania
| | - Mark Tangney
- Department of Computer Science, University College Cork, Cork, Ireland
|
33
|
Sykes J, Holland B, Charleston M. Unattained Geometric Configurations of Secondary Structure Elements in Protein Structural Space. J Struct Biol 2022; 214:107870. [DOI: 10.1016/j.jsb.2022.107870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Revised: 05/14/2022] [Accepted: 05/17/2022] [Indexed: 11/30/2022]
|
34
|
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
|
35
|
Gu J, Zhang T, Wu C, Liang Y, Shi X. Refined Contact Map Prediction of Peptides Based on GCN and ResNet. Front Genet 2022; 13:859626. [PMID: 35571037 PMCID: PMC9092020 DOI: 10.3389/fgene.2022.859626] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2022] [Accepted: 03/23/2022] [Indexed: 11/13/2022] Open
Abstract
Predicting peptide inter-residue contact maps, which determine the topology of the peptide structure, plays an important role in computational biology. However, due to the limited number of known homologous structures, there is still much room for improvement in inter-residue contact map prediction. Current models do not capture the relationships between residues accurately enough, especially for residue pairs separated by a long range. In this article, we developed a novel deep neural network framework to refine the rough contact map produced by existing methods. The rough contact map is used to construct a residue graph that is processed by a graph convolutional neural network (GCN). The GCN can better capture global information and is therefore used to learn long-range contact relationships. A residual convolutional neural network is also applied in the framework for learning local information. We conducted experiments on four different test datasets, and the inter-residue long-range contact map prediction accuracy demonstrates the effectiveness of our proposed method.
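As a rough illustration of the idea in this abstract, the sketch below thresholds a rough contact-probability map into a residue graph and applies one graph-convolution step. It uses the widely used symmetric-normalization form of a GCN layer, which is an assumption here; the abstract does not state which GCN variant, threshold, or feature set the authors used. The function names `contact_graph` and `gcn_layer` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def contact_graph(contact_probs, threshold=0.5):
    """Turn a rough contact-probability map into a symmetric adjacency matrix.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    adj = (np.asarray(contact_probs) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)       # self-loops are added inside the layer
    return np.maximum(adj, adj.T)    # keep an edge if either direction predicts it


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1 after self-loops
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weights, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rough_map = rng.random((6, 6))           # stand-in for an upstream predictor's output
    residue_features = rng.random((6, 4))    # hypothetical per-residue features
    layer_weights = rng.standard_normal((4, 3))
    hidden = gcn_layer(contact_graph(rough_map), residue_features, layer_weights)
    print(hidden.shape)
```

Each residue's output mixes its own features with those of its predicted contacts, so stacking several such layers lets information travel between residues that are far apart in sequence but connected in the graph.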
Affiliation(s)
- Jiawei Gu
- College of Computer Science and Technology, University of Jilin, Changchun, China
| | - Tianhao Zhang
- College of Computer Science and Technology, University of Jilin, Changchun, China
| | - Chunguo Wu
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
| | - Yanchun Liang
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
| | - Xiaohu Shi
- College of Computer Science and Technology, University of Jilin, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Changchun, China
- School of Computer Science, Zhuhai College of Science and Technology, Zhuhai, China
- *Correspondence: Xiaohu Shi
|
36
|
Schleif R, Espinosa M. Where to From Here? Front Mol Biosci 2022; 9:848444. [PMID: 35402507 PMCID: PMC8990317 DOI: 10.3389/fmolb.2022.848444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 02/25/2022] [Indexed: 11/13/2022] Open
Abstract
The biological-biochemical community has been shocked and delighted by the remarkable progress that has recently been made on a problem that has consumed the attention, energy, and resources of many, if not most of the scientists in the field for the past 50 years. The problem has been to predict the tertiary structure of a protein merely from its amino acid sequence. Nature does it easily enough, but it has been an incredibly difficult problem, often considered intractable, for humankind. The breakthrough has come in the form of two computer-based approaches, AlphaFold2 and RoseTTAFold in conjunction with factors such as the use of vast computing power, the field of artificial intelligence, and the existence of huge protein sequence databases. The advancement of these tools depended upon and was stimulated by the last 50 years of development of smaller and smaller and more and more powerful electronics components, mainly processors and memory. Along with the problem of protein folding, determining the function or mechanism of action of proteins has similarly limped along as did protein folding until the recent breakthroughs. Perhaps AlphaFold2 and RoseTTAFold can substantially aid in protein mechanistic studies. Now it is not completely insane to consider what might be the next grand challenge in biochemistry-biology. We offer several possibilities.
Affiliation(s)
- Robert Schleif
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States
- *Correspondence: Robert Schleif; Manuel Espinosa
| | - Manuel Espinosa
- Department of Molecular and Cell Biology, Centro de Investigaciones Biológicas Margarita Salas, CSIC, Madrid, Spain
- *Correspondence: Robert Schleif; Manuel Espinosa
|
37
|
Enireddy V, Karthikeyan C, Babu DV. OneHotEncoding and LSTM-based deep learning models for protein secondary structure prediction. Soft Comput 2022. [DOI: 10.1007/s00500-022-06783-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
38
|
Cheng J, Xu Y, Zhao Y. Prediction of protein secondary structure based on deep residual convolutional neural network. Biotechnol Biotec Eq 2022. [DOI: 10.1080/13102818.2022.2026815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
Affiliation(s)
- Jinyong Cheng
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, PR China
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
| | - Ying Xu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
| | - Yunxiang Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, PR China
|
39
|
Kaushik R, Zhang KYJ. ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures. Bioinformatics 2022; 38:369-376. [PMID: 34542606 DOI: 10.1093/bioinformatics/btab666] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 09/06/2021] [Accepted: 09/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite the recent groundbreaking success in the field, there remains room for improvement in model quality estimation at multiple stages of protein structure prediction, which could push prediction accuracy further. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages the backbone dihedral angle and surface accessibility preferences of each residue by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method against four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values.
The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design. AVAILABILITY AND IMPLEMENTATION http://github.com/KYZ-LSB/ProTerS-FitFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
|
40
|
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
|
41
|
Mokhtari DA, Appel MJ, Fordyce PM, Herschlag D. High throughput and quantitative enzymology in the genomic era. Curr Opin Struct Biol 2021; 71:259-273. [PMID: 34592682 PMCID: PMC8648990 DOI: 10.1016/j.sbi.2021.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/07/2021] [Accepted: 07/23/2021] [Indexed: 12/28/2022]
Abstract
Accurate predictions from models based on physical principles are the ultimate metric of our biophysical understanding. Although there has been stunning progress toward structure prediction, quantitative prediction of enzyme function has remained challenging. Realizing this goal will require large numbers of quantitative measurements of rate and binding constants and the use of these ground-truth data sets to guide the development and testing of these quantitative models. Ground truth data more closely linked to the underlying physical forces are also desired. Here, we describe technological advances that enable both types of ground truth measurements. These advances allow classic models to be tested, provide novel mechanistic insights, and place us on the path toward a predictive understanding of enzyme structure and function.
Affiliation(s)
- D A Mokhtari
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
| | - M J Appel
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
| | - P M Fordyce
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA; ChEM-H Institute, Stanford University, Stanford, CA, 94305, USA; Department of Genetics, Stanford University, Stanford, CA, 94305, USA; Chan Zuckerberg Biohub San Francisco, CA, 94110, USA.
| | - D Herschlag
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA; Department of Chemical Engineering, Stanford University, Stanford, CA, 94305, USA; ChEM-H Institute, Stanford University, Stanford, CA, 94305, USA.
| |
|
42
|
Wang W, Wang J, Li Z, Xu D, Shang Y. MUfoldQA_G: High-accuracy protein model QA via retraining and transformation. Comput Struct Biotechnol J 2021; 19:6282-6290. [PMID: 34900138 PMCID: PMC8636996 DOI: 10.1016/j.csbj.2021.11.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2021] [Revised: 11/10/2021] [Accepted: 11/14/2021] [Indexed: 11/21/2022]
Abstract
Protein tertiary structure prediction is an active research area and has attracted significant attention recently due to the success of AlphaFold from DeepMind. Methods capable of accurately evaluating the quality of predicted models are of great importance. In the past, although many model quality assessment (QA) methods have been developed, their accuracies are not consistently high across different QA performance metrics for diverse target proteins. In this paper, we propose MUfoldQA_G, a new multi-model QA method that aims at simultaneously optimizing Pearson correlation and average GDT-TS difference, two commonly used QA performance metrics. This method is based on two new algorithms MUfoldQA_Gp and MUfoldQA_Gr. MUfoldQA_Gp uses a new technique to combine information from protein templates and reference protein models to maximize the Pearson correlation QA metric. MUfoldQA_Gr employs a new machine learning technique that resamples training data and retrains adaptively to learn a consensus model that is better than naïve consensus while minimizing average GDT-TS difference. MUfoldQA_G uses a new method to combine the results of MUfoldQA_Gr and MUfoldQA_Gp so that the final QA prediction results achieve low average GDT-TS difference that is close to the results from MUfoldQA_Gr, while maintaining high Pearson correlation that is the same as the results from MUfoldQA_Gp. In CASP14 QA categories, MUfoldQA_G ranked No. 1 in Pearson correlation and No. 2 in average GDT-TS difference.
Affiliation(s)
- Wenbo Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Junlin Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Zhaoyu Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Yi Shang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| |
|
43
|
Thambu K, Glomb V, Hernadez R, Facelli JC. Microproteins: a 3D protein structure prediction analysis. J Biomol Struct Dyn 2021; 40:13738-13746. [PMID: 34705603 PMCID: PMC9489054 DOI: 10.1080/07391102.2021.1993343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Accepted: 10/11/2021] [Indexed: 01/03/2023]
Abstract
Microproteins are a novel and expanding group of small proteins encoded by fewer than 100-150 codons that are translated from small open reading frames (smORFs). It has been shown that smORFs and their corresponding microproteins make up a sizable fraction of the genome and proteome, but very little information on microproteins' structural features exists in the literature. In this paper, we present the results of analyzing the predicted structures of 44 microproteins. The results show that this set of microproteins has different amino acid composition profiles, similar structural characteristics and fewer small-molecule ligand binding sites than regular proteins. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Kishan Thambu
- Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
| | - Victoria Glomb
- Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
| | - Rolando Hernadez
- Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
| | - Julio C. Facelli
- Department of Biomedical Informatics, The University of Utah, Salt Lake City, Utah
- Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah
| |
|
44
|
de Oliveira GB, Pedrini H, Dias Z. Ensemble of Template-Free and Template-Based Classifiers for Protein Secondary Structure Prediction. Int J Mol Sci 2021; 22:11449. [PMID: 34768880 PMCID: PMC8583764 DOI: 10.3390/ijms222111449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 11/16/2022]
Abstract
Protein secondary structures are important in many biological processes and applications. Due to advances in sequencing methods, there are many proteins sequenced, but fewer proteins with secondary structures defined by laboratory methods. With the development of computer technology, computational methods have (started to) become the most important methodologies for predicting secondary structures. We evaluated two different approaches to this problem, driven by the recent results obtained by computational methods in this task: (i) template-free classifiers, based on machine learning techniques; and (ii) template-based classifiers, based on searching tools. Both approaches are formed by different sub-classifiers (six for template-free and two for template-based), each with a specific view of the protein. Our results show that these ensembles improve the results of each approach individually.
|
45
|
Ma EJ, Siirola E, Moore C, Kummer A, Stoeckli M, Faller M, Bouquet C, Eggimann F, Ligibel M, Huynh D, Cutler G, Siegrist L, Lewis RA, Acker AC, Freund E, Koch E, Vogel M, Schlingensiepen H, Oakeley EJ, Snajdrova R. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity. ACS Catal 2021. [DOI: 10.1021/acscatal.1c02786] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/19/2022]
Affiliation(s)
- Eric J. Ma
- NIBR Informatics, Novartis Institutes for BioMedical Research (NIBR), 181 Massachusetts Ave, Cambridge, Massachusetts 02139, United States
| | - Elina Siirola
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Charles Moore
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Arkadij Kummer
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Markus Stoeckli
- Analytical Sciences and Imaging, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Michael Faller
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Caroline Bouquet
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Fabian Eggimann
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Mathieu Ligibel
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Dan Huynh
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Geoffrey Cutler
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Luca Siegrist
- NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Richard A. Lewis
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Anne-Christine Acker
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Ernst Freund
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Elke Koch
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Markus Vogel
- NIBR Biologics Center, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Holger Schlingensiepen
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Edward J. Oakeley
- Chemical Biology and Therapeutics, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| | - Radka Snajdrova
- Global Discovery Chemistry, Novartis Institutes for BioMedical Research (NIBR), Novartis Campus, CH-4056 Basel, Switzerland
| |
|
46
|
Jandova Z, Vargiu AV, Bonvin AMJJ. Native or Non-Native Protein-Protein Docking Models? Molecular Dynamics to the Rescue. J Chem Theory Comput 2021; 17:5944-5954. [PMID: 34342983 PMCID: PMC8444332 DOI: 10.1021/acs.jctc.1c00336] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/06/2021] [Indexed: 11/29/2022]
Abstract
Molecular docking excels at creating a plethora of potential models of protein-protein complexes. To correctly distinguish the favorable, native-like models from the remaining ones remains, however, a challenge. We assessed here if a protocol based on molecular dynamics (MD) simulations would allow distinguishing native from non-native models to complement scoring functions used in docking. To this end, the first models for 25 protein-protein complexes were generated using HADDOCK. Next, MD simulations complemented with machine learning were used to discriminate between native and non-native complexes based on a combination of metrics reporting on the stability of the initial models. Native models showed higher stability in almost all measured properties, including the key ones used for scoring in the Critical Assessment of PRedicted Interaction (CAPRI) competition, namely the positional root mean square deviations and fraction of native contacts from the initial docked model. A random forest classifier was trained, reaching a 0.85 accuracy in correctly distinguishing native from non-native complexes. Reasonably modest simulation lengths of the order of 50-100 ns are sufficient to reach this accuracy, which makes this approach applicable in practice.
Affiliation(s)
- Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| | - Attilio Vittorio Vargiu
- Physics Department, University of Cagliari, Cittadella Universitaria, S.P. 8 km 0.700, 09042 Monserrato, Italy
| | - Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| |
|
47
|
Kryshtafovych A, Moult J, Billings WM, Della Corte D, Fidelis K, Kwon S, Olechnovič K, Seok C, Venclovas Č, Won J. Modeling SARS-CoV-2 proteins in the CASP-commons experiment. Proteins 2021; 89:1987-1996. [PMID: 34462960 PMCID: PMC8616790 DOI: 10.1002/prot.26231] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/05/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 01/21/2023]
Abstract
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
Affiliation(s)
| | - John Moult
- Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
| | - Wendy M Billings
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
| | - Dennis Della Corte
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, Davis, California, USA
| | - Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, South Korea
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, South Korea
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, South Korea
| | | |
|
48
|
Egbert M, Ghani U, Ashizawa R, Kotelnikov S, Nguyen T, Desta I, Hashemi N, Padhorny D, Kozakov D, Vajda S. Assessing the binding properties of CASP14 targets and models. Proteins 2021; 89:1922-1939. [PMID: 34368994 DOI: 10.1002/prot.26209] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2021] [Revised: 07/22/2021] [Accepted: 08/04/2021] [Indexed: 12/27/2022]
Abstract
An important question is how well the models submitted to CASP retain the properties of target structures. We investigate several properties related to binding. First we explore the binding of small molecules as probes, and count the number of interactions between each residue and such probes, resulting in a binding fingerprint. The similarity between two fingerprints, one for the X-ray structure and the other for a model, is determined by calculating their correlation coefficient. The fingerprint similarity weakly correlates with global measures of accuracy, and GDT_TS higher than 80 is a necessary but not sufficient condition for the conservation of surface binding properties. The advantage of this approach is that it can be carried out without information on potential ligands and their binding sites. The latter information was available for a few targets, and we explored whether the CASP14 models can be used to predict binding sites and to dock small ligands. Finally, we tested the ability of models to reproduce protein-protein interactions by docking both the X-ray structures and the models to their interaction partners in complexes. The analysis showed that in CASP14 the quality of individual domain models is approaching that offered by X-ray crystallography, and hence such models can be successfully used for the identification of binding and regulatory sites, as well as for assembling obligatory protein-protein complexes. Success of ligand docking, however, often depends on fine details of the binding interface, and thus may require accounting for conformational changes by simulation methods.
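The fingerprint comparison described in this abstract can be illustrated with a minimal sketch. It assumes that the "correlation coefficient" is a Pearson correlation and that probe contacts arrive as (probe, residue index) pairs; the function names and the input format are hypothetical and are not taken from the paper.

```python
from math import sqrt


def binding_fingerprint(contacts, n_residues):
    """Count probe contacts per residue.

    `contacts` is an iterable of (probe_id, residue_index) pairs; this
    input format is an assumption made for illustration only.
    """
    counts = [0] * n_residues
    for _probe_id, residue_index in contacts:
        counts[residue_index] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equal-length fingerprints."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one fingerprint is constant.
        raise ValueError("correlation undefined for a constant fingerprint")
    return cov / sqrt(var_x * var_y)


# Toy data: contacts for an experimental structure and for a model.
xray = binding_fingerprint([("p1", 0), ("p2", 0), ("p3", 2)], n_residues=4)
model = binding_fingerprint([("p1", 0), ("p2", 2), ("p3", 2)], n_residues=4)
similarity = pearson(xray, model)
```

A similarity near 1 would indicate that the model retains the residue-level probe-binding pattern of the experimental structure; how contacts are detected and which probes are used are left out of this sketch.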
Affiliation(s)
- Megan Egbert
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Usman Ghani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Ryota Ashizawa
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Thu Nguyen
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Nasser Hashemi
- Division of Systems Engineering, Boston University, Boston, Massachusetts, USA
| | - Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA; Department of Chemistry, Boston University, Boston, Massachusetts, USA
| |
|
49
|
Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. Cell Rep Methods 2021; 1:100014. [PMID: 34355210 PMCID: PMC8336924 DOI: 10.1016/j.crmeth.2021.100014] [Citation(s) in RCA: 299] [Impact Index Per Article: 74.8] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/23/2022]
Abstract
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the rapid mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
50
|
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022]
Abstract
Since Anfinsen demonstrated that the information encoded in a protein's amino acid sequence determines its structure in 1973, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.
| |
|