1
Maher S, Atta S, Kamel M, Hammam OA, Okasha H. Therapeutic Potential and Mechanistic Insights of a Novel Synthetic α-Lactalbumin-Derived Peptide for the Treatment of Liver Fibrosis. J Clin Exp Hepatol 2025; 15:102488. [PMID: 39868009 PMCID: PMC11755051 DOI: 10.1016/j.jceh.2024.102488] [Received: 08/22/2024] [Accepted: 12/09/2024] [Indexed: 01/28/2025]
Abstract
Background: Liver fibrosis is a serious global health issue, but treatment options are limited because no approved therapies can prevent or reverse established fibrosis. Aim: This study investigated the antifibrotic effects of a synthetic peptide derived from α-lactalbumin in a mouse model of thioacetamide (TAA)-induced liver fibrosis. Methods: In silico analyses were conducted to assess the physicochemical properties, pharmacophore features, and docking interactions of the peptide. Mice with induced fibrosis were treated with one of three doses of the synthetic peptide (2.5, 5, or 10 μg/kg, twice weekly for 8 weeks). Immunohistochemistry, antioxidant enzyme levels, IGF-1 levels, and expression of fibrosis-related genes were assessed. Results: The peptide interacted with multiple sites on human prothrombin with varying binding affinities. In addition, ligand similarity analysis identified 26 thrombin inhibitors with high Tanimoto scores. The peptide exhibited antifibrotic effects, with dose-dependent improvements. IGF-1 expression was upregulated in all treated groups compared with the untreated pathological group. In contrast, fibrotic markers such as TIMP, PDGF-α, and TGF-β were upregulated in the untreated pathological group but downregulated in the peptide-treated groups. Serum IGF-1 concentrations were also increased in the peptide-treated groups. Histopathological examination of the peptide-treated groups showed normal hepatic architecture, with hepatocytes arranged in thin plates. Immunohistochemistry of the high-dose peptide-treated group showed few αSMA-positive cells and mild proliferating cell nuclear antigen (PCNA) expression. Conclusion: The synthetic α-lactalbumin peptide shows promise as an antifibrotic therapy, and its safety and effectiveness are supported by in silico and in vivo analyses.
The peptide's pharmacophore characteristics and potential as a thrombin inhibitor complement its ability to downregulate fibrotic markers and maintain liver tissue integrity. These findings highlight the peptide as a promising therapeutic candidate for liver fibrosis that warrants further investigation.
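The abstract reports 26 thrombin inhibitors with high Tanimoto scores. As an illustration of that metric only, here is a minimal sketch of the Tanimoto coefficient on binary molecular fingerprints, assuming each fingerprint is given as a set of "on" bit indices (real pipelines usually derive these bits with a cheminformatics toolkit). The function names and the 0.7 cutoff are hypothetical and are not the authors' settings.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(query: set, library: dict, threshold: float = 0.7) -> list:
    """Return (name, score) pairs at or above a hypothetical threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Usage: two fingerprints sharing 2 of 4 distinct bits score 0.5.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

The coefficient ranges from 0 (no shared bits) to 1 (identical fingerprints), which is why a "high" score is usually read as structural similarity to known ligands.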
Affiliation(s)
- Sara Maher
- Immunology Department, Theodor Bilharz Research Institute, Giza, Egypt
- Shimaa Atta
- Immunology Department, Theodor Bilharz Research Institute, Giza, Egypt
- Manal Kamel
- Immunology Department, Theodor Bilharz Research Institute, Giza, Egypt
- Olfat A. Hammam
- Pathology Department, Theodor Bilharz Research Institute, Giza, Egypt
- Hend Okasha
- Biochemistry and Molecular Biology Department, Theodor Bilharz Research Institute, Giza, Egypt
2
Hameduh T, Miller AD, Heger Z, Haddad Y. The proteomic code: Novel amino acid residue pairing models "encode" protein folding and protein-protein interactions. Comput Biol Med 2025; 190:110033. [PMID: 40112562 DOI: 10.1016/j.compbiomed.2025.110033] [Received: 08/05/2024] [Revised: 03/11/2025] [Accepted: 03/13/2025] [Indexed: 03/22/2025]
Abstract
Recent advances in deep learning-based protein 3D structure prediction have focused on the importance of amino acid residue-residue connections (i.e., pairwise atomic contacts) for accuracy, at the expense of mechanistic interpretability. We therefore performed a series of analyses, primarily on the TOP2018 dataset, based on an alternative framework of residue-residue connections. This framework is derived from amino acid residue pairing models, both historic and new, all based on genetic principles complemented by relevant biophysical principles. Of these pairing models, three new models (named the GU, Transmuted and Shift pairing models) exhibit the highest observed-over-expected ratios and the highest correlations in statistical analyses with various intra- and inter-chain datasets, compared with the remaining models. In addition, these new pairing models are universally frequent across different connection ranges, secondary structure connections, and protein sizes. Following further statistical and other analyses described herein, we conclude that the three pairing models together could represent the basis of a universal proteomic code (second genetic code) sufficient, in and of itself, to "encode" both protein folding mechanisms and protein-protein interactions.
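The abstract ranks pairing models by observed-over-expected (O/E) ratios. The exact statistic used by the authors is not given here, so the following is only a generic sketch of an O/E ratio for residue-residue contacts, assuming contacts are supplied as ordered pairs of one-letter residue codes and that the expected count is N · p(a) · p(b), with p taken from the pooled frequency of each residue type among all contact endpoints. It does not implement the GU, Transmuted or Shift pairing models themselves.

```python
from collections import Counter


def observed_over_expected(contacts):
    """O/E ratio per ordered residue pair (a, b) in a list of contacts.

    Assumption: expected count = N * p(a) * p(b), where N is the number of
    contacts and p is each residue's frequency among all contact endpoints.
    """
    pair_counts = Counter(contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    n_pairs = len(contacts)
    n_endpoints = 2 * n_pairs
    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = n_pairs * (residue_counts[a] / n_endpoints) * (residue_counts[b] / n_endpoints)
        ratios[(a, b)] = observed / expected
    return ratios


# Usage with a toy contact list.
toy = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "A")]
print(observed_over_expected(toy))
```

A ratio above 1 means a pair is seen more often than residue composition alone would predict, which is the sense in which a pairing model with high O/E values is "enriched" in contacts.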
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic; Veterinary Research Institute, Hudcova 296/70, CZ-621 00, Brno, Czech Republic; KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00, Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic.
3
Alanazi W, Meng D, Pollastri G. Advancements in one-dimensional protein structure prediction using machine learning and deep learning. Comput Struct Biotechnol J 2025; 27:1416-1430. [PMID: 40242292 PMCID: PMC12002955 DOI: 10.1016/j.csbj.2025.04.005] [Received: 01/13/2025] [Revised: 04/01/2025] [Accepted: 04/02/2025] [Indexed: 04/18/2025]
Abstract
The accurate prediction of protein structures remains a cornerstone challenge in structural bioinformatics, essential for understanding the intricate relationship between protein sequence, structure, and function. Recent advancements in Machine Learning (ML) and Deep Learning (DL) have revolutionized this field, offering innovative approaches to one-dimensional (1D) protein structure annotations, including secondary structure, solvent accessibility, and intrinsic disorder. This review highlights the evolution of predictive methodologies, from early machine learning models to sophisticated deep learning frameworks that integrate sequence embeddings and pretrained language models. Key advancements, such as AlphaFold's transformative impact on structure prediction and the rise of protein language models (PLMs), have enabled unprecedented accuracy in capturing sequence-structure relationships. Furthermore, we explore the role of specialized datasets, benchmarking competitions, and multimodal integration in shaping state-of-the-art prediction models. By addressing challenges in data quality, scalability, interpretability, and task-specific optimization, this review underscores the transformative impact of ML, DL, and PLMs on 1D protein prediction while providing insights into emerging trends and future directions in this rapidly evolving field.
Affiliation(s)
- Wafa Alanazi
- School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
- Department of Computer Science, College of Science, Northern Border University, Arar, Saudi Arabia
- Di Meng
- School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin D04 C1P1, Ireland
4
Öter A. Deep learning-based LDL-C level prediction and explainable AI interpretation. Comput Biol Med 2025; 188:109905. [PMID: 40010176 DOI: 10.1016/j.compbiomed.2025.109905] [Received: 11/02/2024] [Revised: 02/18/2025] [Accepted: 02/19/2025] [Indexed: 02/28/2025]
Abstract
This study investigates the use of deep learning (DL) models to predict low-density lipoprotein cholesterol (LDL-C) levels. The dataset, obtained from New York-Presbyterian Hospital/Weill Cornell Medical Center, includes triglycerides (TG), total cholesterol (TC) and high-density lipoprotein cholesterol (HDL-C). LDL-C prediction was performed using DL models such as CNN, RNN and LSTM, and the results were compared with traditional machine learning (ML) models and LDL-C formulas. The DL models were more successful than the traditional formulas and gave results close to those of the ML models. DL models predicted LDL-C with higher accuracy than the Sampson and Martin equations; in particular, the RNN and LSTM models outperformed the formulas. In addition, the predictions of the DL models were explained using the Local Interpretable Model-Agnostic Explanations (LIME) method. The proposed models provide more parameters for explaining the AI model than the ML models do, but explaining DL model decisions requires more computational effort. The results indicate that DL models predict LDL-C levels more effectively than traditional methods and can be used in clinical applications. These findings might therefore contribute significantly to assessing cardiovascular disease risk and planning treatment protocols.
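For context on the formula baselines the DL models were compared against, below is a minimal sketch of the Sampson (NIH) equation for estimating LDL-C from TC, HDL-C and TG, all in mg/dL. The constants follow the widely cited published form of the equation; treat them as an assumption to verify against the original publication before any clinical use. The Martin equation is not shown because it relies on a lookup table of adjustable factors. This is not code from the study.

```python
def sampson_ldl(tc: float, hdl: float, tg: float) -> float:
    """Estimate LDL-C (mg/dL) with the Sampson/NIH equation.

    Inputs: total cholesterol, HDL-C and triglycerides, all in mg/dL.
    Constants as commonly published (verify before clinical use).
    """
    non_hdl = tc - hdl  # non-HDL cholesterol
    return (tc / 0.948
            - hdl / 0.971
            - (tg / 8.56 + tg * non_hdl / 2140 - tg ** 2 / 16100)
            - 9.44)


# Usage: a typical lipid panel (TC 200, HDL-C 50, TG 150 mg/dL).
print(round(sampson_ldl(200, 50, 150), 1))
```

A learned model such as the RNN or LSTM in the abstract takes the same three inputs but fits the mapping from data instead of using fixed coefficients, which is the comparison the study draws.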
Affiliation(s)
- Ali Öter
- Department of Electronics and Automation, Kahramanmaraş Sütçü Imam University, Kahramanmaraş, Türkiye.
5
Malhotra Y, John J, Yadav D, Sharma D, Vanshika, Rawal K, Mishra V, Chaturvedi N. Advancements in protein structure prediction: A comparative overview of AlphaFold and its derivatives. Comput Biol Med 2025; 188:109842. [PMID: 39970826 DOI: 10.1016/j.compbiomed.2025.109842] [Received: 11/23/2024] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
This review provides a comprehensive analysis of AlphaFold (AF) and its derivatives (AF2 and AF3) in protein structure prediction. These tools have revolutionized structural biology with their highly accurate predictions, driving progress in protein modeling, drug discovery, and the study of protein dynamics. Their exceptional accuracy has reshaped our understanding of protein folding and enabled advances in protein design and disease research; the review also discusses future integration with experimental techniques. In addition, their key features, architectures, important case studies, and notable effects on biology and medicine are evaluated. Although AF2 is a relatively recent innovation, it has already been used in many studies that highlight its diverse applications. The limitations of AF2 that led to the introduction of AF3 are also reported; AF3 is a substantial improvement because it provides precise predictions of the structures and interactions of proteins, DNA, RNA, and ligands, aiding understanding at the molecular level. Addressing current challenges and forecasting future developments, this work underscores the lasting significance of AF in reshaping the scientific landscape of protein research.
Affiliation(s)
- Yuktika Malhotra
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Jerry John
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepika Yadav
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepshikha Sharma
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vanshika
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Kamal Rawal
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vaibhav Mishra
- Amity Institute of Microbial Technology, Amity University, Uttar Pradesh, 201303, India
- Navaneet Chaturvedi
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India.
6
Gillani M, Pollastri G. Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions. Proteins 2025; 93:745-759. [PMID: 39575640 PMCID: PMC11809130 DOI: 10.1002/prot.26767] [Received: 07/02/2024] [Revised: 10/01/2024] [Accepted: 11/01/2024] [Indexed: 02/11/2025]
Abstract
Alignments in bioinformatics refer to the arrangement of sequences to identify regions of similarity that can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians because they enable accurate predictions and analyses in various applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested Deep N-to-1 convolutional neural networks of various depths and widths to identify better-performing configurations across a diverse set of eight classes. For the assessment without alignments, sequences are one-hot encoded, converting each residue into a numerical representation; this encoding is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating frequencies of residues and gaps. The average difference in peak performance between models with and without alignments is approximately 15.82%, and the average difference in the highest accuracy achieved is approximately 15.16%. Extensive experimentation thus indicates that higher alignment accuracy yields a more reliable model and improved prediction accuracy, with consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool that may reduce the need for extensive experimental validation. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
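The two input representations compared in the abstract can be sketched as follows: a one-hot matrix for the "without alignments" setting and a per-column frequency profile of residues and gaps for the "with alignments" setting. This is a minimal illustration, not the authors' code (which is available at the URL above). It assumes the MSA has already been produced (for example by PSI-BLAST) and is given as equal-length aligned strings; it omits the sequence weighting and pseudocounts that production profiles often apply.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
PROFILE_ALPHABET = AMINO_ACIDS + "-"  # residues plus gap symbol


def one_hot(seq):
    """Encode a sequence as an L x 20 list of 0/1 rows; unknown residues give all-zero rows."""
    rows = []
    for residue in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def alignment_profile(alignment):
    """Per-column frequencies of the 20 residues and the gap (L x 21) from aligned sequences.

    Symbols outside PROFILE_ALPHABET (e.g. 'X') are not counted, so such a column may sum to < 1.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        column = [s[col].upper() for s in alignment]
        profile.append([column.count(sym) / len(column) for sym in PROFILE_ALPHABET])
    return profile


# Usage: query "AC" aligned with a homolog that has a gap at position 2.
print(one_hot("AC")[1][:3])
print(alignment_profile(["AC", "A-"])[1][1], alignment_profile(["AC", "A-"])[1][20])
```

Both functions return an L-row matrix, so either can feed the same N-to-1 network; the profile simply carries evolutionary information in 21 real-valued channels instead of 20 binary ones.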
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, Ireland
7
Poblete S, Mlynarczyk M, Szachniuk M. Unknotting RNA: A method to resolve computational artifacts. PLoS Comput Biol 2025; 21:e1012843. [PMID: 40112280 PMCID: PMC11925458 DOI: 10.1371/journal.pcbi.1012843] [Received: 10/05/2024] [Accepted: 02/02/2025] [Indexed: 03/22/2025]
Abstract
RNA 3D structure prediction often encounters entanglements, computational artifacts that complicate structural models, resulting in their exclusion from further studies despite the potentially accurate prediction of regions outside the entanglement. This study presents a protocol aimed at resolving such issues in RNA models while preserving the overall 3D fold and structural integrity. By employing the SPQR coarse-grained model and short Molecular Dynamics simulations, the protocol imposes energy terms that enable selective modifications to disentangle structures without causing significant distortions. The method was validated on 195 entangled RNA models from CASP15 and RNA-Puzzles, successfully resolving over 70% of interlaces and approximately 40% of lassos, with minimal impact on the original geometry but notable improvement in ClashScore. The efficiency of untangling conformations that are unequivocally classified as artifacts is 81%. Certain cases, particularly those involving dense packing of atoms or complex secondary structures, posed challenges that limited the efficiency of the method. In this paper, we present quantitative results from the application of the protocol and discuss examples of both successfully disentangled and unresolved structures. We show a viable approach for refining models previously deemed unsuitable due to topological artifacts.
Affiliation(s)
- Simón Poblete
- Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Santiago, Chile
- Centro BASAL Ciencia & Vida, Universidad San Sebastián, Santiago, Chile
- Mikolaj Mlynarczyk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
8
Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges such as high costs, long timelines, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving the prediction of drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and guiding real-time dosing adjustments with high accuracy. Despite this potential, challenges such as data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
- Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India.
9
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms for harnessing the abundance of sequence data to predict subcellular locations. During the last decade, a considerable amount of research effort has been aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. The review is supported by taxonomies of prediction tools that highlight their relevant areas, with examples, for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
10
Markandan K, Tiong YW, Sankaran R, Subramanian S, Markandan UD, Chaudhary V, Numan A, Khalid M, Walvekar R. Emergence of infectious diseases and role of advanced nanomaterials in point-of-care diagnostics: a review. Biotechnol Genet Eng Rev 2024; 40:3438-3526. [PMID: 36243900 DOI: 10.1080/02648725.2022.2127070] [Received: 06/08/2022] [Accepted: 09/12/2022] [Indexed: 11/09/2022]
Abstract
Infectious outbreaks, which claim millions of lives annually, are a foremost global public health concern and challenge current healthcare systems. The most crucial way to control an infectious outbreak is early detection through point-of-care (POC) diagnostics. POC diagnostics are highly advantageous because they offer prompt diagnosis that is economical, simple and highly efficient, with remote access capabilities. In particular, the use of nanomaterials to build POC devices has enabled highly integrated, portable (compact) devices with enhanced efficiency. This review details the factors influencing the emergence of infectious diseases and methods for fast and accurate detection, thus elucidating the underlying factors of these infections. Furthermore, it comprehensively highlights the importance of different nanomaterials in POC systems for detecting nucleic acids, whole pathogens, proteins and antibodies. Finally, we summarize reported findings on advanced nanomaterial-based POCs such as lab-on-chip, lab-on-disc devices, point-of-action and hospital-on-chip. To this end, we discuss the challenges, potential solutions, and prospects of integrating the internet of things, artificial intelligence, 5G communications and data clouding to achieve intelligent POCs.
Affiliation(s)
- Kalaimani Markandan
- Temasek Laboratories, Nanyang Technological University, Nanyang Drive, Singapore
- Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, Malaysia
- Yong Wei Tiong
- NUS Environmental Research Institute, National University of Singapore, Engineering Drive, Singapore
- Revathy Sankaran
- Graduate School, University of Nottingham Malaysia Campus, Semenyih, Selangor, Malaysia
- Sakthinathan Subramanian
- Department of Materials & Mineral Resources Engineering, National Taipei University of Technology (NTUT), Taipei, Taiwan
- Vishal Chaudhary
- Research Cell & Department of Physics, Bhagini Nivedita College, University of Delhi, New Delhi, India
- Arshid Numan
- Graphene & Advanced 2D Materials Research Group (GAMRG), School of Engineering and Technology, Sunway University, Petaling Jaya, Selangor, Malaysia
- Sunway Materials Smart Science & Engineering (SMS2E) Research Cluster School of Engineering and Technology, Sunway University, Selangor, Malaysia
- Mohammad Khalid
- Graphene & Advanced 2D Materials Research Group (GAMRG), School of Engineering and Technology, Sunway University, Petaling Jaya, Selangor, Malaysia
- Sunway Materials Smart Science & Engineering (SMS2E) Research Cluster School of Engineering and Technology, Sunway University, Selangor, Malaysia
- Rashmi Walvekar
- Department of Chemical Engineering, School of Energy and Chemical Engineering, Xiamen University Malaysia, Sepang, Selangor, Malaysia
11
He L, Yan M, Naeem M, Chen M, Chen Y, Ni Z, Chen H. Enhancing Manganese Peroxidase: Innovations in Genetic Modification, Screening Processes, and Sustainable Agricultural Applications. J Agric Food Chem 2024; 72:26040-26056. [PMID: 39535434 DOI: 10.1021/acs.jafc.4c05878] [Indexed: 11/16/2024]
Abstract
Manganese peroxidase (MnP), a vital extracellular enzyme for the degradation of lignin and other organic pollutants, has demonstrated immense potential for agricultural and environmental applications, including straw pretreatment, feed fermentation, mycotoxin degradation, and water treatment. However, current research remains in its exploratory phase: naturally sourced MnP cannot meet industrial-scale demands, and no mature commercial enzyme preparations are available on the market. This comprehensive review constructs a framework for MnP research, probing its molecular conformation and catalytic principles while providing an overview of advancements in high-throughput screening and in silico design strategies. Specifically, this review focuses on the practical applications of MnP in sustainable agriculture, elaborating on its potential and challenges in straw resource utilization, efficient feed fermentation, mycotoxin control, and water quality improvement. Furthermore, this review summarizes recent achievements in optimizing MnP activity through enzyme engineering techniques and discusses customized mutation strategies tailored to specific agricultural and environmental requirements, thereby laying a solid theoretical foundation and scientific basis for the industrial production and commercialization of MnP.
Affiliation(s)
- Lu He
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Mingchen Yan
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Muhammad Naeem
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Minghaonan Chen
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Yong Chen
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Zhong Ni
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
- Huayou Chen
- School of the Life Sciences, Jiangsu University, Zhenjiang 212000, China
12
Olanders G, Testa G, Tibo A, Nittinger E, Tyrchan C. Challenge for Deep Learning: Protein Structure Prediction of Ligand-Induced Conformational Changes at Allosteric and Orthosteric Sites. J Chem Inf Model 2024; 64:8481-8494. [PMID: 39484820 DOI: 10.1021/acs.jcim.4c01475] [Indexed: 11/03/2024]
Abstract
In the realm of biomedical research, understanding the intricate structure of proteins is crucial, as these structures determine how proteins function within our bodies and interact with potential drugs. Traditionally, methods like X-ray crystallography and cryo-electron microscopy have been used to unravel these structures, but they are often challenging, time-consuming and costly. Recently, a breakthrough in computational biology has emerged with the development of deep learning algorithms capable of predicting protein structures based on their amino acid sequences (Jumper, J., et al. Nature 2021, 596, 583. Lane, T. J. Nature Methods 2023, 20, 170. Kryshtafovych, A., et al. Proteins: Structure, Function and Bioinformatics 2021, 89, 1607). This study focuses on predicting the dynamic changes that proteins undergo upon ligand binding, specifically when they bind to allosteric sites, i.e. a pocket different from the active site. Allosteric modulators are particularly important for drug discovery, as they open new avenues for designing drugs that can target proteins more effectively and with fewer side effects (Nussinov, R.; Tsai, C. J. Cell 2013, 153, 293). To study this, we curated a data set of 578 X-ray structures comprised of proteins displaying orthosteric and allosteric binding as well as a general framework to evaluate deep learning-based structure prediction methods. Our findings demonstrate the potential and current limitations of deep learning methods, such as AlphaFold2 (Jumper, J., et al. Nature 2021, 596, 583), NeuralPLexer (Qiao, Z., et al. Nat Mach Intell 2024, 6, 195), and RoseTTAFold All-Atom (Krishna, R., et al. Science 2024, 384, eadl2528) to predict not just static protein structures but also the dynamic conformational changes. 
Herein, we show that predicting the allosteric induced-fit conformation still poses a challenge to deep learning methods, which predict the orthosteric bound conformation more accurately than the allosteric induced-fit conformation. For AlphaFold2, we observed that conformational diversity and sampling between the apo and holo states could be increased by modifying the MSA depth, but this did not enhance the ability to generate conformations close to the allosteric induced-fit conformation. To further support advances in the protein structure prediction field, the curated data set and evaluation framework are made publicly available.
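A minimal sketch of one common way to ask whether a predicted model sits closer to the apo or to the holo (induced-fit) conformation: compare its RMSD to each experimental reference. It assumes the three coordinate sets list the same atoms in the same order and are already superposed (no Kabsch fitting is done here); `rmsd` and `closer_state` are hypothetical helpers, not the evaluation framework released by the authors.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally long, already superposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def closer_state(pred: Sequence[Coord], apo: Sequence[Coord],
                 holo: Sequence[Coord]) -> str:
    """Label a prediction by the reference it is closer to (ties count as 'apo')."""
    return "holo" if rmsd(pred, holo) < rmsd(pred, apo) else "apo"
```

In practice the comparison is usually restricted to pocket residues and preceded by an optimal superposition; both steps are left out here to keep the sketch short.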
Affiliation(s)
- Gustav Olanders
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Giulia Testa
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
13
Cai L, Yue G, Chen Y, Wang L, Yao X, Zou Q, Fu X, Cao D. ET-PROTACs: modeling ternary complex interactions using cross-modal learning and ternary attention for accurate PROTAC-induced degradation prediction. Brief Bioinform 2024; 26:bbae654. [PMID: 39783892 PMCID: PMC11713031 DOI: 10.1093/bib/bbae654] [Received: 07/03/2024] [Revised: 09/27/2024] [Accepted: 12/06/2024] [Indexed: 01/12/2025]
Abstract
MOTIVATION Accurately predicting the degradation capabilities of proteolysis-targeting chimeras (PROTACs) for given target proteins and E3 ligases is important for PROTAC design. The distinctive ternary structure of PROTACs challenges traditional drug-target interaction prediction methods, necessitating more innovative approaches. Current state-of-the-art (SOTA) methods using graph neural networks (GNNs) can discern the molecular structures of PROTACs and proteins and thus efficiently predict PROTACs' degradation capabilities, but they rely heavily on limited crystal structure data for the protein of interest (POI)-PROTAC-E3 ternary complex. This reliance underutilizes rich PROTAC experimental data and neglects intricate interaction relationships within ternary complexes. RESULTS In this study, we propose ET-PROTACs, a model based on a cross-modal strategy and ternary attention, to predict the targeted degradation capabilities of PROTACs. Our model capitalizes on the strengths of cross-modal methods by using equivariant GNNs to process the graph structure and spatial coordinates of PROTAC molecules concurrently, while using sequence-based methods to learn protein sequence information. This cross-modal information is channeled into a ternary attention mechanism, specially tailored to the unique structure of PROTACs, enabling congruent modeling of both the PROTAC and protein modalities. Experimental results demonstrate that ET-PROTACs outperforms existing SOTA methods. Moreover, visualizing attention scores highlights the residues and atoms pivotal in specific POI-PROTAC-E3 interactions, offering insights and guidance for future pharmaceutical research. AVAILABILITY AND IMPLEMENTATION The code for our model is available at https://github.com/GuanyuYue/ET-PROTACs.
Affiliation(s)
- Lijun Cai
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Guanyu Yue
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Li Wang
- Degree Programs in Systems and Information Engineering, Graduate School of Science and Technology Doctoral Program in Computer Science, University of Tsukuba, Tsukuba, Japan
- Xiaojun Yao
- Faculty of Applied Sciences, Centre for Artificial Intelligence Driven Drug Discovery, Macao Polytechnic University, Macao 999078, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiangzheng Fu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Research Institute of Hunan University in Chongqing, Chongqing 401120, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410003, China
14
Kumar V, Deepak A, Ranjan A, Prakash A. Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1922-1933. [PMID: 38990747 DOI: 10.1109/tcbb.2024.3426491] [Indexed: 07/13/2024]
Abstract
Deep learning approaches, such as convolutional neural networks (CNNs) and deep recurrent neural networks (RNNs), have been the backbone of protein function prediction, with promising state-of-the-art (SOTA) results. RNNs, with their in-built ability to (i) focus on past information, (ii) capture both short- and long-range dependencies, and (iii) process sequences bi-directionally, offer a strong sequential processing mechanism. CNNs, however, are confined to short-term information from both the past and the future, although they offer parallelism. Therefore, a novel bi-directional CNN that strictly complies with the sequential processing mechanism of RNNs is introduced and used to develop a sub-sequence-based protein function prediction framework, Bi-SeqCNN. Further, Bi-SeqCNN uses an ensemble approach to improve the prediction results. To our knowledge, this is the first time bi-directional CNNs have been employed for general temporal data analysis and not just for protein sequences. The proposed architecture produces improvements of up to +5.5% over contemporary SOTA methods on three benchmark protein sequence datasets. Moreover, it is substantially lighter, attaining these results with 0.50-0.70 times as many parameters as the SOTA methods.
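The abstract does not give Bi-SeqCNN's layer details. One plausible reading of a CNN that "strictly complies with the sequential processing mechanism of RNNs" is a causal convolution, in which the output at position t sees only positions up to t, run once over the sequence and once over its reverse, much like a bidirectional RNN. The single-channel sketch below illustrates that assumption only; the function names are hypothetical and this is not the authors' architecture.

```python
from typing import List, Sequence, Tuple


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D convolution where output t uses only x[t], x[t-1], ... (implicit zero left-padding)."""
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            if t - j >= 0:  # never look at future positions
                acc += w * x[t - j]
        out.append(acc)
    return out


def bidirectional_causal_conv1d(x: Sequence[float], fwd_kernel: Sequence[float],
                                bwd_kernel: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair a past-only pass with a future-only pass (a causal pass over the reversed input)."""
    forward = causal_conv1d(x, fwd_kernel)
    backward = causal_conv1d(list(reversed(x)), bwd_kernel)[::-1]
    return list(zip(forward, backward))
```

Unlike an RNN, every output position here can be computed independently of the others, which is the parallelism advantage the abstract attributes to CNNs.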
15
Liu J, Guo Z, You H, Zhang C, Lai L. All-Atom Protein Sequence Design Based on Geometric Deep Learning. Angew Chem Int Ed Engl 2024:e202411461. [PMID: 39295564 DOI: 10.1002/anie.202411461] [Received: 06/18/2024] [Revised: 09/09/2024] [Accepted: 09/18/2024] [Indexed: 09/21/2024]
Abstract
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues. GeoSeqBuilder achieves a native residue type recovery rate of 51.6%, comparable to ProteinMPNN and other leading methods, while accurately predicting side chain conformations. We first used GeoSeqBuilder to design sequences for thioredoxin and a hallucinated three-helical bundle protein. All 15 tested sequences were expressed as soluble monomeric proteins with high thermal stability, and the two high-resolution crystal structures solved closely match the designed models. The generated protein sequences exhibit low similarity (minimum 23%) to the original sequences, with significantly altered hydrophobic cores. We further redesigned the hydrophobic core of glutathione peroxidase 4, and 3 of the 5 designs showed improved enzyme activity. Although further testing is needed, the high experimental success rate in our testing demonstrates that GeoSeqBuilder is a powerful tool for designing novel sequences for predefined protein structures with atomic details. GeoSeqBuilder is available at https://github.com/PKUliujl/GeoSeqBuilder.
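Native sequence recovery, the metric behind the 51.6% figure, is usually the fraction of positions at which the designed residue matches the native one on the same backbone. A minimal sketch, assuming both sequences are one-letter strings of equal length indexed by backbone position; `sequence_recovery` is a hypothetical helper, and the paper may average per protein or per residue differently.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of backbone positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length (same backbone)")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

Because the design is made for a fixed backbone, positions correspond one to one and no alignment step is needed.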
Affiliation(s)
- Jiale Liu
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Zheng Guo
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Hantian You
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Changsheng Zhang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Luhua Lai
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu, 510100, Sichuan, China
16
Plett C, Grimme S, Hansen A. Toward Reliable Conformational Energies of Amino Acids and Dipeptides─The DipCONFS Benchmark and DipCONFL Datasets. J Chem Theory Comput 2024. [PMID: 39259679 DOI: 10.1021/acs.jctc.4c00801] [Indexed: 09/13/2024]
Abstract
Simulating peptides and proteins is becoming increasingly important, leading to a growing need for efficient computational methods. These are typically semiempirical quantum mechanical (SQM) methods, force fields (FFs), or machine-learned interatomic potentials (MLIPs), all of which require large amounts of accurate data for robust training and evaluation. To assess potential reference methods and complement the available data, we introduce two sets, DipCONFL and DipCONFS, which cover large parts of the conformational space of 17 amino acids and their 289 possible dipeptides in aqueous solution. The conformers were selected from the exhaustive PeptideCS dataset by Andris et al. [J. Phys. Chem. B 2022, 126, 5949-5958]. The structures, originally generated with GFN2-xTB, were reoptimized using the accurate r2SCAN-3c density functional theory (DFT) composite method with the implicit CPCM water solvation model. The DipCONFS benchmark set contains 918 conformers and is one of the largest sets to date with highly accurate coupled cluster conformational energies. It is used to evaluate various DFT and wave function theory (WFT) methods, especially whether they are accurate enough to serve as reliable reference methods for larger datasets intended for training and testing more approximate SQM, FF, and MLIP methods. The results reveal that the originally provided BP86-D3(BJ)/DGauss-DZVP conformational energies are not sufficiently accurate. Among the DFT methods tested as an alternative reference level, the revDSD-PBEP86-D4 double hybrid performs best, with a mean absolute deviation (MAD) of 0.2 kcal mol⁻¹ relative to the PNO-LCCSD(T)-F12b reference. The very efficient r2SCAN-3c composite method also shows excellent results, with an MAD of 0.3 kcal mol⁻¹, similar to the best-tested hybrid, ωB97M-D4.
With these findings, we compiled the large DipCONFL set, which includes over 29,000 realistic conformers in solution with reasonably accurate r2SCAN-3c reference conformational energies, gradients, and further properties potentially relevant for training MLIP methods. This set is used, also in comparison with DipCONFS, to robustly assess the performance of various SQM, FF, and MLIP methods, and it can complement their training sets.
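Conformational energies are differences between conformers of the same molecule, so an MAD such as the 0.2 kcal mol⁻¹ quoted above compares relative, not absolute, energies. A minimal sketch, assuming energies are given per molecule as lists of conformers in the same order for both methods, and anchoring every molecule on its lowest conformer at the reference level (one common choice; the abstract does not state the paper's exact convention). The function names and the numbers in the usage note are hypothetical.

```python
from typing import Dict, List


def conformational_energies(energies: List[float], anchor: int) -> List[float]:
    """Energies of all conformers relative to the conformer at index `anchor`."""
    return [e - energies[anchor] for e in energies]


def mad(method: Dict[str, List[float]], reference: Dict[str, List[float]]) -> float:
    """Mean absolute deviation of conformational energies from the reference level."""
    deviations: List[float] = []
    for molecule, ref_abs in reference.items():
        test_abs = method[molecule]
        if len(test_abs) != len(ref_abs):
            raise ValueError(f"conformer count mismatch for {molecule}")
        # Anchor both methods on the conformer that is lowest at the reference level.
        anchor = ref_abs.index(min(ref_abs))
        ref_rel = conformational_energies(ref_abs, anchor)
        test_rel = conformational_energies(test_abs, anchor)
        deviations.extend(abs(t - r) for t, r in zip(test_rel, ref_rel))
    return sum(deviations) / len(deviations)
```

For example, with hypothetical reference energies [-10.0, -9.5, -9.0] and method energies [-20.0, -19.3, -19.1] for one molecule, the deviations are 0.0, 0.2, and 0.1, giving an MAD of about 0.1. The anchor conformer always contributes a zero deviation; some benchmarks exclude it from the average.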
Affiliation(s)
- Christoph Plett
- Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
- Stefan Grimme
- Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
- Andreas Hansen
- Mulliken Center for Theoretical Chemistry, Clausius-Institut für Physikalische und Theoretische Chemie, Universität Bonn, Beringstraße 4, 53115 Bonn, Germany
17
Glukhov E, Kalitin D, Stepanenko D, Zhu Y, Nguyen T, Jones G, Patsahan T, Simmerling C, Mitchell JC, Vajda S, Dill KA, Padhorny D, Kozakov D. MHC-Fine: Fine-tuned AlphaFold for precise MHC-peptide complex prediction. Biophys J 2024; 123:2902-2909. [PMID: 38751115 PMCID: PMC11393670 DOI: 10.1016/j.bpj.2024.05.011] [Received: 01/12/2024] [Revised: 04/13/2024] [Accepted: 05/10/2024] [Indexed: 05/28/2024]
Abstract
The precise prediction of major histocompatibility complex (MHC)-peptide complex structures is pivotal for understanding cellular immune responses and advancing vaccine design. In this study, we enhanced AlphaFold's capabilities by fine-tuning it on a specialized dataset consisting exclusively of high-resolution class I MHC-peptide crystal structures. This tailored approach aimed to address the generalist nature of AlphaFold's original training, which, while broad-ranging, lacked the granularity required for high-precision class I MHC-peptide interaction prediction. A comparative analysis was conducted against the homology-modeling-based method Pandora as well as the AlphaFold multimer model. Our results demonstrate that our fine-tuned model outperforms the others in terms of root-mean-square deviation (RMSD; median peptide Cα RMSD of 0.66 Å) and also provides improved predicted local distance difference test (pLDDT) scores, offering a more reliable assessment of the predicted structures. These advances have substantial implications for computational immunology, potentially accelerating the development of novel therapeutics and vaccines by providing a more precise computational lens on MHC-peptide interactions.
Affiliation(s)
- Ernest Glukhov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Dmytro Kalitin
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York; Faculty of Applied Science, Ukrainian Catholic University, Lviv, Ukraine
- Darya Stepanenko
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Yimin Zhu
- Department of Computer Science, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Thu Nguyen
- Department of Computer Science, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- George Jones
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Taras Patsahan
- Institute for Condensed Matter Physics of the National Academy of Sciences of Ukraine, Lviv, Ukraine; Institute of Applied Mathematics and Fundamental Sciences, Lviv Polytechnic National University, Lviv, Ukraine
- Carlos Simmerling
- Department of Chemistry, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Ken A Dill
- Department of Chemistry, Stony Brook University, Stony Brook, New York; Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Dzmitry Padhorny
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
18
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024; 436:168552. [PMID: 38552946 PMCID: PMC11377173 DOI: 10.1016/j.jmb.2024.168552] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of experimentally resolved RNA structures compared with protein structures. These challenges are often poorly addressed in the existing literature, much of which reports inflated performance because the training and testing sets have significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) showed that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper, we present RNA3DB, a dataset of structured RNAs derived from the Protein Data Bank (PDB) and designed for training and benchmarking deep learning models. The RNA3DB method arranges RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally dissimilar dataset splits for structural RNAs.
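The key property claimed for RNA3DB is that similar chains never straddle the train/test boundary. A minimal sketch of why splitting at the level of connected components gives that guarantee: chains are nodes, an edge joins any pair judged similar by sequence or structure, and each whole component goes to one side. The similarity test itself is assumed to be computed elsewhere and passed in as pairs; `components` and `split_components` are hypothetical helpers, not RNA3DB's source code.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple


def components(chains: Iterable[str],
               similar_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group chains into connected components of the 'similar' graph (union-find)."""
    parent: Dict[str, str] = {c: c for c in chains}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    groups: Dict[str, Set[str]] = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


def split_components(comps: List[Set[str]], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole components to train or test, so no similar pair crosses the split."""
    ordered = sorted(comps, key=sorted)  # deterministic order before shuffling
    random.Random(seed).shuffle(ordered)
    total = sum(len(c) for c in ordered)
    train: List[str] = []
    test: List[str] = []
    for comp in ordered:
        # Fill the training set until it holds roughly train_fraction of all chains.
        if len(train) < train_fraction * total:
            train.extend(sorted(comp))
        else:
            test.extend(sorted(comp))
    return train, test
```

Because components are indivisible, the achieved ratio only approximates `train_fraction`, consistent with the "approximate 70/30" wording in the abstract.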
Affiliation(s)
- Marcell Szikszai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
- Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
- Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
19
Giladi M, Montgomery AP, Kassiou M, Danon JJ. Structure-based drug design for TSPO: Challenges and opportunities. Biochimie 2024; 224:41-50. [PMID: 38782353 DOI: 10.1016/j.biochi.2024.05.018] [Received: 02/19/2024] [Revised: 04/27/2024] [Accepted: 05/21/2024] [Indexed: 05/25/2024]
Abstract
The translocator protein 18 kDa (TSPO) is an evolutionarily conserved mitochondrial transmembrane protein implicated in various neuropathologies and inflammatory conditions, making it a longstanding diagnostic and therapeutic target of interest. Despite the development of various classes of TSPO ligand chemotypes and the elucidation of bacterial and non-human mammalian experimental structures, many unknowns remain regarding its differential structural and functional features in health and disease. Currently used computational methodologies for modelling the native structure and ligand-binding behaviour of this enigmatic protein have several limitations. In this perspective, we provide a critical analysis of developments in the use of these methods, outlining their applications, inherent limitations, and continuing challenges. We also suggest unexplored opportunities in the use of computational methodologies that show promise for enhancing our understanding of TSPO.
Affiliation(s)
- Mia Giladi
- School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
- Michael Kassiou
- School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
- Jonathan J Danon
- School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
20
Kumar H, Kim P. Artificial intelligence in fusion protein three-dimensional structure prediction: Review and perspective. Clin Transl Med 2024; 14:e1789. [PMID: 39090739 PMCID: PMC11294035 DOI: 10.1002/ctm2.1789] [Received: 03/22/2024] [Revised: 07/16/2024] [Accepted: 07/19/2024] [Indexed: 08/04/2024]
Abstract
Recent advancements in artificial intelligence (AI) have accelerated the prediction of unknown protein structures. However, accurately predicting the three-dimensional (3D) structures of fusion proteins remains difficult because current AI-based protein structure prediction focuses on wild-type (WT) proteins rather than on newly fused proteins in nature. Following the central dogma of biology, fusion proteins are translated from fusion transcripts, which are transcribed from fusion genes formed between two different loci by chromosomal rearrangements in cancer. Accurately predicting the 3D structures of fusion proteins is important for understanding the functional roles and mechanisms of action of these new chimeric proteins. However, predicting their 3D structures with a template-based model is challenging because known template structures are often unavailable in databases. Deep learning (DL) models that utilize multi-level protein information have revolutionized the prediction of protein 3D structures. In this review, we highlight the latest advancements and ongoing challenges in predicting the 3D structures of fusion proteins using DL models. We aim to explore both the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta and D-I-TASSER for modelling these 3D structures.
HIGHLIGHTS:
- This review provides the overall pipeline and landscape of fusion protein 3D structure prediction.
- This review describes the factors that should be considered at each step when predicting the 3D structures of fusion proteins using AI approaches.
- This review highlights the latest advancements and ongoing challenges in predicting the 3D structures of fusion proteins using deep learning models.
- This review explores the advantages and challenges of employing AlphaFold2, RoseTTAFold, tr-Rosetta, and D-I-TASSER to model 3D structures.
Affiliation(s)
- Himansu Kumar
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Pora Kim
- Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
21
Vila JA. Analysis of proteins in the light of mutations. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2024; 53:255-265. [PMID: 38955858 DOI: 10.1007/s00249-024-01714-y] [Received: 11/09/2023] [Revised: 05/23/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Proteins have evolved through mutations (amino acid substitutions) since life appeared on Earth, some 10⁹ years ago. The study of these phenomena has been of particular significance because of their impact on protein stability, function, and structure. This study offers a new viewpoint on how the most recent findings in these areas can be used to explore the impact of mutations on protein sequence, stability, and evolvability. Preliminary results indicate that (1) mutations can be viewed as sensitive probes to identify 'typos' in the amino acid sequence and to assess the resistance of naturally occurring proteins to unwanted sequence alterations; (2) the presence of 'typos' in the amino acid sequence, rather than being an evolutionary obstacle, could promote faster evolvability and, in turn, increase the likelihood of higher protein stability; (3) the mutation site is far more important than the substituted amino acid in determining the marginal stability changes of the protein; and (4) the unpredictability of protein evolution at the molecular level (by mutations) exists even in the absence of epistatic effects. Finally, the Darwinian concept of evolution as "descent with modification", together with experimental evidence, supports one of the results of this study: that some regions of any protein sequence are susceptible to mutations while others are not. This work contributes to our general understanding of protein responses to mutations and may spur significant progress in our efforts to develop methods to accurately forecast changes in protein stability, their propensity for metamorphism, and their ability to evolve.
Affiliation(s)
- Jorge A Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de los Andes 950, 5700, San Luis, Argentina
22
Zhao L, Li J, Zhan W, Jiang X, Zhang B. Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation. Sci Rep 2024; 14:16488. [PMID: 39020005 PMCID: PMC11255250 DOI: 10.1038/s41598-024-67403-0] [Received: 04/03/2024] [Accepted: 07/10/2024] [Indexed: 07/19/2024]
Abstract
Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in fields such as new drug development, disease treatment, and bioengineering. Accurately predicting the secondary structure of proteins helps to reveal how proteins fold and how they function in cells. Deep learning models are particularly important in protein structure prediction because they can process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein secondary structure prediction for both eight-state and three-state structures. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation using the ProtT5 pretrained model, which leads to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, extracts amino acid sequence features better than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. On the TS115, CB513, and PDB (2018-2020) datasets, three of the six datasets evaluated in the paper, the eight-state prediction accuracy reached 88.2%, 84.9%, and 95.3%, respectively, and the three-state prediction accuracy reached 91.3%, 90.3%, and 96.8%, respectively.
This study not only improves the accuracy of protein secondary structure prediction but also provides an important tool for understanding protein structure and function that is particularly applicable in resource-constrained contexts.
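The eight-state and three-state accuracies quoted above are per-residue accuracies, commonly called Q8 and Q3. A minimal sketch of scoring both, assuming DSSP-style one-letter labels and the widely used reduction of H/G/I to helix (H), E/B to strand (E), and everything else to coil (C); other reductions exist, and the abstract does not say which one the authors used. `to_three_state` and `q_accuracy` are hypothetical helpers.

```python
# One widely used reduction of DSSP's eight states to three classes (conventions vary).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # alpha, 3-10 and pi helices
    "E": "E", "B": "E",                      # strand and isolated bridge
    "T": "C", "S": "C", "C": "C", "-": "C",  # turn, bend, coil (written 'C' or '-')
}


def to_three_state(ss8: str) -> str:
    """Map an 8-state label string to 3 states; unknown symbols raise KeyError."""
    return "".join(EIGHT_TO_THREE[s] for s in ss8)


def q_accuracy(predicted: str, observed: str) -> float:
    """Per-residue accuracy: Q8 on 8-state strings, Q3 on 3-state strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

Because several 8-state labels collapse into one 3-state class, some 8-state errors (for example G predicted as H) disappear after reduction, which is why Q3 is typically higher than Q8.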
Affiliation(s)
- Lufei Zhao
- Agricultural Science and Engineering School, Liaocheng University, Liaocheng, 252059, China
- Jingyi Li
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Weiqiang Zhan
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xuchu Jiang
- School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Emergency Management Research Center, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Biao Zhang
- School of Computer Science, Liaocheng University, Liaocheng, 252059, China
23
Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
- Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Mehrnaz Mostafavi
- Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Birashk
- Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
- Shiva Karimian
- Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
- Shayan Khalilollah
- Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Sohrab Jaferian
- Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
- Yalda Yazdani
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Iraj Alipourfard
- Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland
- Yun Suk Huh
- Department of Biological Engineering, Inha University, Incheon, Republic of Korea

24
Wang R, Huang S, Wang P, Shi X, Li S, Ye Y, Zhang W, Shi L, Zhou X, Tang X. Bibliometric analysis of the application of deep learning in cancer from 2015 to 2023. Cancer Imaging 2024; 24:85. [PMID: 38965599 PMCID: PMC11223420 DOI: 10.1186/s40644-024-00737-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 06/27/2024] [Indexed: 07/06/2024]
Abstract
BACKGROUND Recently, the application of deep learning (DL) has made great progress in various fields, especially in cancer research. However, bibliometric analyses of the application of DL in cancer remain scarce. Therefore, this study aimed to explore the research status and hotspots of the application of DL in cancer. METHODS We retrieved all articles on the application of DL in cancer from the Web of Science Core Collection database. Biblioshiny, VOSviewer and CiteSpace were used to perform the bibliometric analysis by analyzing the numbers, citations, countries, institutions, authors, journals, references, and keywords. RESULTS We found 6,016 original articles on the application of DL in cancer. Overall, both the number of annual publications and total citations showed an upward trend. China published the greatest number of articles, the USA had the highest total citations, and Saudi Arabia had the highest centrality. The Chinese Academy of Sciences was the most productive institution. Jie Tian published the greatest number of articles, while Kaiming He was the most co-cited author. IEEE Access was the most popular journal. The analysis of references and keywords showed that DL was mainly used for the prediction, detection, classification and diagnosis of breast cancer, lung cancer, and skin cancer. CONCLUSIONS Overall, the number of articles on the application of DL in cancer is gradually increasing. Future research trends may include expanding the scope and improving the accuracy of DL applications, and integrating DL with protein prediction, genomics and cancer research.
Affiliation(s)
- Ruiyu Wang
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Shu Huang
- Department of Gastroenterology, Lianshui County People's Hospital, Huaian, China
- Department of Gastroenterology, Lianshui People's Hospital of Kangda College Affiliated to Nanjing Medical University, Huaian, China
- Ping Wang
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xiaomin Shi
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Shiqi Li
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Yusong Ye
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Wei Zhang
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Lei Shi
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xian Zhou
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xiaowei Tang
- Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
- Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China

25
Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
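P-SPOC's headline figure is a mean absolute error (MAE) of 0.33 pKa units, i.e. the average unsigned gap between predicted and experimental site pKa values. A minimal Python sketch of that metric follows; the four site values are made-up illustration data, not P-SPOC output.

```python
def mean_absolute_error(predicted, experimental):
    """Average unsigned difference between paired predicted and measured values."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Hypothetical predicted vs. experimental pKa values for four titratable sites.
predicted_pka = [4.1, 6.5, 10.2, 3.8]
experimental_pka = [4.0, 6.0, 10.5, 3.9]
mae = mean_absolute_error(predicted_pka, experimental_pka)
print(f"MAE = {mae:.2f} pKa units")  # prints "MAE = 0.25 pKa units"
```

Because MAE does not square the errors, a single badly predicted site weighs less on it than it would on a root-mean-square error.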
Affiliation(s)
- Siyuan Liu
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China

26
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structures is critical to understanding those functions. In many cases, it is not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recent methods build on prior predictions of dimer structures to predict multimer structures. However, compared with monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advances in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicted interchain residue contacts, data from cryo-EM experiments, and protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges in current protein complex structure prediction, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction to support future advances.
Affiliation(s)
- Nan Zhao
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Beijing Academy of Artificial Intelligence, Beijing, 100084, China

27
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
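The "GDT_TS > 70" threshold refers to the Global Distance Test total score used in CASP: the percentage of model residues whose Cα atoms lie within 1, 2, 4 and 8 Å of the reference, averaged over the four cutoffs. The Python sketch below is a simplification that assumes the model has already been superposed and the per-residue deviations are known; the official LGA program instead searches for the best superposition at each cutoff. The deviations are invented.

```python
def gdt_ts(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    `deviations` are per-residue C-alpha distances (angstroms) between a model and
    the reference, assumed to come from a single, already-computed superposition.
    """
    if not deviations:
        raise ValueError("no residues")
    n = len(deviations)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(1 for d in deviations if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical deviations for a five-residue toy model.
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 9.0])
print(f"GDT_TS = {score:.1f}, highly accurate by the >70 criterion: {score > 70}")
```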
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea

28
Penunuri G, Wang P, Corbett-Detig R, Russell SL. A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems. bioRxiv 2024:2024.04.10.588793. [PMID: 38645127 PMCID: PMC11030372 DOI: 10.1101/2024.04.10.588793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been difficult to study because of their experimental intractability. Impactful advances in global health will be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will advance our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. Molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structure, is a driver of host manipulation and intracellular survival used by host-associated microbes. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.

29
Li C, Yao J, Wei W, Niu Z, Zeng X, Li J, Wang J. Geometry-Based Molecular Generation With Deep Constrained Variational Autoencoder. IEEE Trans Neural Netw Learn Syst 2024; 35:4852-4861. [PMID: 35171779 DOI: 10.1109/tnnls.2022.3147790] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/14/2023]
Abstract
Finding target molecules with specific chemical properties plays a decisive role in drug development. We proposed GEOM-CVAE, a constrained variational autoencoder based on geometric representation for protein-context-dependent generation of molecules with specific properties. In machine-learning terms, it comprises a continuous feature-embedding encoder and a molecular-generation decoder. Our key contribution is an efficient geometric embedding method that combines a spatial structure representation of the drug molecule (converting its 3-D coordinates into an image) with a geometric graph representation of the protein target (modeling the protein surface as a mesh). The 3-D geometric information is vital to successful molecular generation, which distinguishes this approach from previous molecular generative methods based on 1-D or 2-D representations. The framework generates specific molecules in two phases: it first generates images carrying molecular 3-D information to learn latent representations and generates molecules under constraints derived from geometric graph convolution over the specific protein, and then feeds the generated structures into a parser network to obtain Simplified Molecular Input Line Entry System (SMILES) strings. Our model achieves competitive performance, suggesting that it could enable exploration of the vast chemical space for drug discovery.

30
Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024; 15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024]
Abstract
This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
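The core idea is to run AlphaFold 2 many times on shallow, randomly subsampled multiple sequence alignments, classify each predicted model into a conformational state, and read relative populations off the resulting counts. The Python sketch below shows only that bookkeeping: `subsample_msa` and `relative_populations` are hypothetical helpers, the MSA rows are placeholder strings, and the state labels stand in for the output of some structure classifier applied to real AlphaFold 2 models. The paper's actual subsampling settings are not reproduced here.

```python
import random
from collections import Counter


def subsample_msa(msa, depth, seed):
    """Return the query (first row) plus `depth` homologs drawn at random without replacement."""
    query, homologs = msa[0], msa[1:]
    rng = random.Random(seed)
    return [query] + rng.sample(homologs, min(depth, len(homologs)))


def relative_populations(state_labels):
    """Fraction of predicted models assigned to each conformational state."""
    counts = Counter(state_labels)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Placeholder alignment: a query followed by eight homolog rows.
msa = ["QUERY"] + [f"HOMOLOG_{i}" for i in range(8)]
shallow_msas = [subsample_msa(msa, depth=3, seed=s) for s in range(4)]

# In practice each shallow MSA would be fed to AlphaFold 2 and the resulting model
# classified; these labels are invented stand-ins for that step.
labels = ["active", "inactive", "active", "active"]
print(relative_populations(labels))  # {'active': 0.75, 'inactive': 0.25}
```

Fixing the seed per run keeps each shallow MSA reproducible while still giving different random subsets across runs.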
Affiliation(s)
- Jennifer Y Cui
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- Brown University Department of Chemistry, Providence, RI, USA

31
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. bioRxiv 2024:2024.01.30.578025. [PMID: 38352531 PMCID: PMC10862857 DOI: 10.1101/2024.01.30.578025] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/23/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges because experimentally resolved RNA structures are sparser and less structurally diverse than protein structures. These challenges are often poorly addressed in the existing literature, and many studies report inflated performance because their training and testing sets have significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) showed that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs derived from the Protein Data Bank (PDB) and designed for training and benchmarking deep learning models. The RNA3DB method arranges RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
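The leakage-free split rests on a simple graph idea: treat each RNA chain as a node, connect any two chains judged similar by sequence or structure, and take connected components; if whole components are assigned to either the training or the test set, no similar pair can straddle the split. The Python sketch below illustrates that idea with a union-find structure and a greedy ~70/30 assignment. The chain IDs and similarity pairs are hypothetical, and RNA3DB's own similarity searches and split procedure may differ.

```python
def connected_components(items, similar_pairs):
    """Group items so that every similar pair ends up in the same group (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


def split_components(components, train_fraction=0.7):
    """Greedily place whole components, largest first, filling train up to ~train_fraction."""
    total = sum(len(c) for c in components)
    train, test = [], []
    for comp in sorted(components, key=len, reverse=True):
        if len(train) + len(comp) <= train_fraction * total:
            train.extend(comp)
        else:
            test.extend(comp)
    return train, test


# Hypothetical chain IDs and similarity hits.
chains = ["c1", "c2", "c3", "c4", "c5", "c6"]
pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
train, test = split_components(connected_components(chains, pairs))
print(sorted(train), sorted(test))  # ['c1', 'c2', 'c3', 'c6'] ['c4', 'c5']
```

Sorting largest-first is only a heuristic for getting closer to the target ratio; any assignment of whole components preserves the no-leakage property.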
Affiliation(s)
- Marcell Szikszai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- College of Biological Sciences, UC Davis, Davis, 95616, CA, USA
- Sachin Kadyan
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, 02115, MA, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA

32
Wenzel M, Grüner E, Strodthoff N. Insights into the inner workings of transformer models for protein function prediction. Bioinformatics 2024; 40:btae031. [PMID: 38244570 PMCID: PMC10950482 DOI: 10.1093/bioinformatics/btae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/14/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024]
Abstract
MOTIVATION We explored how explainable artificial intelligence (XAI) can help shed light on the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients so that latent representations inside transformer models, fine-tuned for Gene Ontology term and Enzyme Commission number prediction, can also be inspected. RESULTS The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside the model, where we identified transformer heads with a statistically significant correspondence between attribution maps and ground-truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins. AVAILABILITY AND IMPLEMENTATION Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
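Integrated gradients attributes a model output F to each input feature i as (x_i - x'_i) multiplied by the average gradient ∂F/∂x_i along the straight path from a baseline x' to the input x. The Python sketch below approximates that path integral with a midpoint Riemann sum and finite-difference gradients on a toy two-feature function. It is not the authors' code, which applies the method to latent representations of fine-tuned protein transformers; for real neural networks, autograd-based tools such as Captum's `IntegratedGradients` for PyTorch are the usual choice.

```python
def integrated_gradients(f, x, baseline, steps=200, eps=1e-6):
    """Approximate integrated-gradients attributions of f at x relative to baseline.

    Gradients are estimated by central finite differences, and the path integral
    from baseline to x by a midpoint Riemann sum with `steps` points.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += eps
            down[i] -= eps
            avg_grad[i] += (f(up) - f(down)) / (2 * eps) / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Toy stand-in for a model: F(v) = v0^2 + 3*v1.
def toy_model(v):
    return v[0] ** 2 + 3 * v[1]


x, baseline = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
print([round(a, 3) for a in attr])  # [4.0, 3.0]
# Completeness check: attributions should sum to F(x) - F(baseline).
print(round(sum(attr), 3), toy_model(x) - toy_model(baseline))
```

The completeness check is a useful sanity test: if the attributions do not add up to the output difference, the number of integration steps is usually too small.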
Affiliation(s)
- Markus Wenzel
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Erik Grüner
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Nils Strodthoff
- School VI - Medicine and Health Services, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany

33
Morera H, Dave P, Kolinko Y, Alahmari S, Anderson A, Denham G, Davis C, Riano J, Goldgof D, Hall LO, Harry GJ, Mouton PR. A novel deep learning-based method for automatic stereology of microglia cells from low magnification images. Neurotoxicol Teratol 2024; 102:107336. [PMID: 38402997 DOI: 10.1016/j.ntt.2024.107336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 01/31/2024] [Accepted: 02/21/2024] [Indexed: 02/27/2024]
Abstract
Microglial cells mediate diverse homeostatic, inflammatory, and immune processes during normal development and in response to cytotoxic challenges. During these functional activities, microglial cells undergo distinct numerical and morphological changes in different tissue volumes in both rodent and human brains. However, it remains unclear how these cytostructural changes in microglia correlate with region-specific neurochemical functions. To better understand these relationships, neuroscientists need accurate, reproducible, and efficient methods for quantifying microglial cell number and morphology in histological sections. To address this need, we developed a novel deep learning (DL)-based classification and stereology approach that links the appearance of Iba1-immunostained microglial cells at low magnification (20×) with the total number of cells in the same brain region, using unbiased stereology counts as ground truth. Once the DL models are trained, total microglial cell numbers in specific regions of interest can be estimated and treatment groups predicted in a high-throughput manner (<1 min) using only low-power images from test cases, without the need for time- and labor-intensive stereology counts or morphology ratings in test cases. Results for this DL-based automatic stereology approach on two datasets (39 mouse brains in total) showed >90% accuracy, 100% repeatability (test-retest), and 60× greater efficiency than manual stereology (<1 min vs. ∼60 min) using the same tissue sections. Ongoing and future work includes using this DL-based approach to establish clear neurodegeneration profiles in age-related human neurological diseases and related animal models.
Affiliation(s)
- Hunter Morera
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA.
- Palak Dave
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- Yaroslav Kolinko
- Department of Histology and Embryology & Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czech Republic
- Saeed Alahmari
- Department of Computer Science, Najran University, Najran 66462, Saudi Arabia
- Dmitry Goldgof
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- Lawrence O Hall
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- G Jean Harry
- Mechanistic Toxicology Branch, Division of Translational Toxicology, NIEHS/NIH, Research Triangle Park, NC 27709, USA
- Peter R Mouton
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- SRC Biosciences, Tampa, FL 33606, USA

34
Bu Y, Sun C, Guo J, Zhu W, Li J, Li X, Zhang Y. Identification novel salt-enhancing peptides from largemouth bass and exploration their action mechanism with transmembrane channel-like 4 (TMC4) by molecular simulation. Food Chem 2024; 435:137614. [PMID: 37820400 DOI: 10.1016/j.foodchem.2023.137614] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/01/2023] [Revised: 09/21/2023] [Accepted: 09/27/2023] [Indexed: 10/13/2023]
Abstract
The purpose of this study was to screen and verify salt-enhancing peptides from largemouth bass myosin that can effectively reduce sodium consumption, using virtual hydrolysis, molecular simulation, and sensory evaluation. The structure of human transmembrane channel-like 4 (TMC4) was modelled using AlphaFold2, with 93.3% of amino acids falling within allowed regions. A total of 19 peptides were predicted through virtual hydrolysis and screening. DAF, QIF, RPAL, and IPVM significantly enhanced saltiness perception, and QIF exhibited the most pronounced effect (P < 0.05). The residues Ala258, Ser546, Ser603, Phe259, Cys265, Glu539, Lys278 and Ser585 were identified as key binding sites. The TMC4-DAF complex reached stability after 20,000 ps, exhibiting an average RMSD value of 0.84 nm. DAF consistently displayed fluctuations at approximately 3.05 nm, and the number of hydrogen bonds varied between 3 and 5. These results suggest that AlphaFold2 modelling can be used for predicting salt-enhancing peptides.
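The 0.84 nm figure is a root-mean-square deviation (RMSD) of atomic positions, and "reached stability after 20,000 ps" suggests averaging over the later part of the trajectory. The Python sketch below shows one plausible way to compute such an average: per-frame RMSD against a reference, averaged over frames at or after a chosen start time. It assumes the frames are already superposed on the reference (MD packages normally fit each frame first), and the two-atom coordinates are invented.

```python
import math


def rmsd(reference, frame):
    """RMSD between two equal-length lists of (x, y, z) coordinates, e.g. in nm.

    Assumes the frame is already superposed on the reference; no fitting is done here.
    """
    if not reference or len(reference) != len(frame):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


def mean_rmsd_after(reference, frames, times_ps, start_ps):
    """Average per-frame RMSD over frames recorded at or after start_ps."""
    values = [rmsd(reference, f) for f, t in zip(frames, times_ps) if t >= start_ps]
    if not values:
        raise ValueError("no frames at or after start_ps")
    return sum(values) / len(values)


# Invented two-atom trajectory with three frames (coordinates in nm).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)],
    [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)],
]
times_ps = [0, 20000, 40000]
print(round(mean_rmsd_after(reference, frames, times_ps, start_ps=20000), 4))  # prints 0.3536
```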
Affiliation(s)
- Ying Bu
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China; Engineering Research Centre of Fujian-Taiwan Special Marine Food Processing and Nutrition, Ministry of Education, Fuzhou 350002, China; College of Food Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- Chaonan Sun
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Jiaqi Guo
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Wenhui Zhu
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Jianrong Li
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Xuepeng Li
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Yi Zhang
- Engineering Research Centre of Fujian-Taiwan Special Marine Food Processing and Nutrition, Ministry of Education, Fuzhou 350002, China; College of Food Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
35
Krokidis MG, Dimitrakopoulos GN, Vrahatis AG, Exarchos TP, Vlamos P. Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024]
Affiliation(s)
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, Corfu, Greece
36
da Silva GM, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. Predicting Relative Populations of Protein Conformations without a Physics Engine Using AlphaFold 2. bioRxiv 2023:2023.07.25.550545. [PMID: 37546747 PMCID: PMC10402055 DOI: 10.1101/2023.07.25.550545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/08/2023]
Abstract
This paper presents a novel approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against NMR experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, NMR analysis, and evolution.
Affiliation(s)
- Gabriel Monteiro da Silva
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- Jennifer Y Cui
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
37
Sakhawat A, Khan MU, Rehman R, Khan S, Shan MA, Batool A, Javed MA, Ali Q. Natural compound targeting BDNF V66M variant: insights from in silico docking and molecular analysis. AMB Express 2023; 13:134. [PMID: 38015338 PMCID: PMC10684480 DOI: 10.1186/s13568-023-01640-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/18/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023]
Abstract
Brain-derived neurotrophic factor (BDNF) is a member of the neurotrophin gene family and encodes a protein vital for the growth, maintenance, and survival of neurons in the nervous system. The study aimed to screen natural compounds against the BDNF variant V66M, which affects memory, cognition, and mood regulation. The BDNF V66M variant was selected as the target structure, and the ligand structures of vitamin D, curcumin, vitamin C, and quercetin were taken from the PubChem database. Multiple tools, including AutoDock Vina, BIOVIA Discovery Studio, PyMOL, CB-Dock, the iMODS server, SwissADME, and SwissTargetPrediction, were used to analyze binding energy, interactions, stability, and toxicity and to visualize the BDNF-ligand complexes. Vitamin D3, curcumin, vitamin C, and quercetin, with binding energies of −5.5, −6.1, −4.5, and −6.7 kJ/mol, respectively, were selected. The ligands bind to the active sites of the BDNF variant (V66M) via hydrophobic bonds, hydrogen bonds, and electrostatic interactions. Furthermore, ADMET analysis revealed that the ligands have sound pharmacokinetic and toxicity profiles. In addition, a molecular dynamics (MD) simulation showed that the most active ligand bound favorably and dynamically to the target protein, and the stability of the protein-ligand complex was determined. The findings of this research could provide a platform for discovering and rationalizing novel drugs against stress related to BDNF (V66M). Docking, preclinical drug testing, and MD simulation results suggest that quercetin is the more potent inhibitor of the BDNF V66M variant and forms a more structurally stable complex.
Affiliation(s)
- Azra Sakhawat
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Muhammad Umer Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Raima Rehman
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Samiullah Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Muhammad Adnan Shan
- Centre for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Alia Batool
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
- Muhammad Arshad Javed
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
- Qurban Ali
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
38
Braghetto A, Orlandini E, Baiesi M. Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach. J Chem Theory Comput 2023; 19:6011-6022. [PMID: 37552831 PMCID: PMC10500975 DOI: 10.1021/acs.jctc.3c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Indexed: 08/10/2023]
Abstract
Explainable and interpretable unsupervised machine learning helps one understand the underlying structure of data. We introduce an ensemble analysis of machine learning models to consolidate their interpretation. Applying it shows that restricted Boltzmann machines consistently compress into a few bits the information stored in a sequence of five amino acids at the start or end of α-helices or β-sheets. The weights learned by the machines reveal unexpected properties of the amino acids and the secondary structure of proteins: (i) His and Thr contribute negligibly to the amphiphilic pattern of α-helices; (ii) there is a class of α-helices particularly rich in Ala at their end; (iii) Pro most often occupies slots otherwise occupied by polar or charged amino acids, and its presence at the start of helices is relevant; (iv) Glu and especially Asp on one side, and Val, Leu, Ile, and Phe on the other, display the strongest tendency to mark amphiphilic patterns, i.e., extreme values of an effective hydrophobicity, although they are not the most strongly hydrophobic or hydrophilic amino acids.
Affiliation(s)
- Anna Braghetto
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Enzo Orlandini
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Marco Baiesi
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
39
Vila JA. Protein structure prediction from the complementary science perspective. Biophys Rev 2023; 15:439-445. [PMID: 37681107 PMCID: PMC10480374 DOI: 10.1007/s12551-023-01107-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/10/2023] [Accepted: 07/25/2023] [Indexed: 09/09/2023]
Abstract
A comparative analysis of two apparently unrelated problems, solved over a period of ~400 years, namely the accurate prediction of planetary orbits and of protein structures, leads to conjectures that go far beyond the existence of a common path in their resolution, i.e., observation → pattern recognition → modeling. The preliminary results of this analysis indicate that complementary science, together with a new perspective on protein folding, may help us discover common features that could contribute to a deeper understanding of still-unsolved problems such as protein folding.
Affiliation(s)
- Jorge A. Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700 San Luis, Argentina
40
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petascale Homology Search for Structure Prediction. bioRxiv 2023:2023.07.10.548308. [PMID: 37503235 PMCID: PMC10369885 DOI: 10.1101/2023.07.10.548308] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 07/29/2023]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
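GDT_TS, the accuracy threshold quoted above, averages the percentage of model Cα atoms lying within 1, 2, 4, and 8 Å of their positions in the experimental structure. A simplified sketch, assuming the model is already superimposed on the target and residues are matched one-to-one; official CASP scoring searches over many superpositions to maximize each fraction, so this fixed-superposition version will typically report a lower value. The function name `gdt_ts_fixed` is hypothetical.

```python
import numpy as np

# Distance cutoffs (in Å) used by GDT_TS
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_fixed(model_ca, target_ca, cutoffs=GDT_TS_CUTOFFS) -> float:
    """Simplified GDT_TS (0-100) for pre-superimposed, residue-matched Cα coordinates in Å."""
    model_ca = np.asarray(model_ca, dtype=float)
    target_ca = np.asarray(target_ca, dtype=float)
    if model_ca.shape != target_ca.shape or model_ca.ndim != 2 or model_ca.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of identical shape")

    # Per-residue Cα deviation between model and target
    deviations = np.linalg.norm(model_ca - target_ca, axis=1)
    # Fraction of residues within each cutoff, then the mean over cutoffs
    fractions = [np.mean(deviations <= c) for c in cutoffs]
    return 100.0 * float(np.mean(fractions))


if __name__ == "__main__":
    target = np.zeros((4, 3))
    # Displace the four residues along x by 0.5, 1.5, 3.0 and 10.0 Å
    model = target + np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
    print(gdt_ts_fixed(model, target))  # prints 56.25
```

In the example, one of four residues is within 1 Å, two within 2 Å, and three within both 4 Å and 8 Å, so the score is the mean of 25%, 50%, 75%, and 75%.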
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
41
Schön JC. Structure prediction in low dimensions: concepts, issues and examples. Philos Trans A Math Phys Eng Sci 2023; 381:20220246. [PMID: 37211034 PMCID: PMC10200350 DOI: 10.1098/rsta.2022.0246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2022] [Accepted: 03/06/2023] [Indexed: 05/23/2023]
Abstract
Structure prediction of stable and metastable polymorphs of chemical systems in low dimensions has become an important field, since materials patterned on the nanoscale are increasingly important in modern technological applications. Many techniques for predicting crystalline structures in three dimensions or small clusters of atoms have been developed over the past three decades. Low-dimensional systems, however, pose their own challenges: ideal one- and two-dimensional systems, quasi-one- and quasi-two-dimensional systems, and low-dimensional composite systems all need to be addressed when developing a systematic methodology for determining low-dimensional polymorphs suitable for practical applications. Quite generally, search algorithms developed for three-dimensional systems need to be adjusted when applied to low-dimensional systems with their own specific constraints; in particular, the embedding of the (quasi-)one-dimensional or two-dimensional system in three dimensions and the influence of stabilizing substrates must be taken into account, on both a technical and a conceptual level. This article is part of a discussion meeting issue 'Supercomputing simulations of advanced materials'.
Affiliation(s)
- J. Christian Schön
- Department of Nanoscience, Max-Planck-Institute for Solid State Research, Heisenbergstr. 1, D-70569 Stuttgart, Germany
42
Liu M, Huang J, Ma S, Yu G, Liao A, Pan L, Hou Y. Allergenicity of wheat protein in diet: Mechanisms, modifications and challenges. Food Res Int 2023; 169:112913. [PMID: 37254349 DOI: 10.1016/j.foodres.2023.112913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 06/01/2023]
Abstract
Wheat is a common component of people's daily diets. However, some people experience IgE-mediated allergic reactions to wheat-based foods, which seriously impair their quality of life. It is therefore imperative to provide comprehensive knowledge and effective methods to reduce the risk of wheat allergy (WA) from food. This review summarizes and discusses recent advances regarding WA symptoms, the major allergens, detection methods, and the opportunities and challenges in establishing animal models of WA. It also provides an updated overview of the modification methods currently applied to wheat-based foods. The review concludes that future approaches to food allergen detection will focus on combining multiple tools to rapidly and accurately quantify individual allergens in complex food matrices. In addition, biological modification methods, such as enzymatic hydrolysis and fermentation, have many advantages over physical or chemical modification in the development of hypoallergenic wheat products. Notably, using biotechnology to edit wheat allergen genes to produce allergen-free food may become a promising method that could improve the safety of wheat foods and the health of allergy sufferers.
Affiliation(s)
- Ming Liu
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China; College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China
- Jihong Huang
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China; College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China; State Key Laboratory of Crop Stress Adaptation and Improvement, College of Agriculture, Henan University, Kaifeng 475004, PR China; School of Food and Pharmacy, Xuchang University, Xuchang 461000, PR China.
- Sen Ma
- College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China.
- Guanghai Yu
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Aimei Liao
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Long Pan
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Yinchen Hou
- College of Food and Biological Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou 450044, PR China
43
Spiers AJ, Dorfmueller HC, Jerdan R, McGregor J, Nicoll A, Steel K, Cameron S. Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases. PLoS One 2023; 18:e0286540. [PMID: 37267309 PMCID: PMC10237404 DOI: 10.1371/journal.pone.0286540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/04/2023]
Abstract
Bacteria produce a variety of polysaccharides with functional roles in cell surface coating, surface and host interactions, and biofilms. We have identified an 'Orphan' bacterial cellulose synthase catalytic subunit (BcsA)-like protein found in four model pseudomonads: P. aeruginosa PAO1, P. fluorescens SBW25, P. putida KT2440, and P. syringae pv. tomato DC3000. Pairwise alignments indicated that the Orphan and BcsA proteins share less than 41% sequence identity, suggesting that they may not have the same structural fold or function. We identified 112 Orphans among soil and plant-associated pseudomonads as well as in phytopathogenic and opportunistic human pathogenic strains. The wide distribution of these highly conserved proteins suggests that they form a novel family of synthases producing a different polysaccharide. In silico analysis, including sequence comparisons, secondary structure and topology predictions, and protein structural modelling, revealed a two-domain transmembrane ovoid-like structure for the Orphan protein, with a periplasmic glycosyl hydrolase family GH17 domain linked via a transmembrane region to a cytoplasmic glycosyltransferase family GT2 domain. We suggest that the GT2 domain synthesises β-(1,3)-glucan, which is transferred to the GH17 domain, where it is cleaved and cyclised to produce cyclic-β-(1,3)-glucan (CβG). Our structural models are consistent with enzymatic characterisation and recent molecular simulations of the PaPAO1 and PpKT2440 GH17 domains. They also provide a functional explanation linking PaPAK and PaPA14 Orphan (also known as NdvB) transposon mutants with CβG production and biofilm-associated antibiotic resistance. Importantly, cyclic glucans are also involved in osmoregulation, plant infection, and induced systemic suppression, and our findings suggest that this novel family of CβG synthases may provide a similar range of adaptive responses for pseudomonads.
Affiliation(s)
- Andrew J. Spiers
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Helge C. Dorfmueller
- Division of Molecular Microbiology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Robyn Jerdan
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Jessica McGregor
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Abbie Nicoll
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Kenzie Steel
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Scott Cameron
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
44
Vila JA. Rethinking the protein folding problem from a new perspective. Eur Biophys J 2023:10.1007/s00249-023-01657-w. [PMID: 37165178 DOI: 10.1007/s00249-023-01657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 04/16/2023] [Accepted: 04/30/2023] [Indexed: 05/12/2023]
Abstract
One of Anfinsen's main concerns was to reveal the connection between an amino-acid sequence and its biologically active conformation. This search gave rise to two crucial questions in structural biology, namely, why proteins fold and how a sequence encodes its folding. As to the why, he proposed a plausible answer, namely, the thermodynamic hypothesis. As to the how, this remains an unsolved challenge. Consequently, the protein folding problem is examined here from a new perspective, namely, as an 'analytic whole'. Conceiving protein folding in this way enabled us to (i) examine in detail why force-field-based approaches have failed, among other things, to predict the three-dimensional structure of a protein accurately; (ii) propose how to redefine them to prevent these shortcomings; and (iii) conjecture on the origin of the success of state-of-the-art numerical methods in accurately predicting the three-dimensional structure of proteins.
Affiliation(s)
- Jorge A Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700, San Luis, Argentina.
45
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured in small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, which have followed this pattern over decades, as monitored by the community-wide blind prediction experiments CASP and CAPRI. Over the past few years, however, dramatic advances have been observed in the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the field. We review the main scientific developments that enabled these recent breakthroughs and highlight the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is affecting computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
46
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023]
Abstract
The structural characterization of disordered proteins requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments on disordered proteins depends heavily on the initial pool of conformers, and currently available tools are limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic relaxation enhancements. We show that updating the generative model parameters according to reward feedback, based on the agreement between experimental data and the probabilistic selection of torsions from learned distributions, provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
47
Mardikoraem M, Woldring D. Protein Fitness Prediction Is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods. Pharmaceutics 2023; 15:1337. [PMID: 37242577 PMCID: PMC10224321 DOI: 10.3390/pharmaceutics15051337] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2023] [Revised: 04/19/2023] [Accepted: 04/21/2023] [Indexed: 05/28/2023]
Abstract
Advances in machine learning (ML) and the availability of protein sequences via high-throughput sequencing techniques have transformed the ability to design novel diagnostic and therapeutic proteins. ML allows protein engineers to capture complex trends hidden within protein sequences that would otherwise be difficult to identify in the context of the immense and rugged protein fitness landscape. Despite this potential, guidance is still needed for training and evaluating ML methods on sequencing data. Two key challenges in training discriminative models and evaluating their performance are handling severely imbalanced datasets (e.g., few high-fitness proteins among an abundance of non-functional proteins) and selecting appropriate protein sequence representations (numerical encodings). Here, we present a framework for applying ML to assay-labeled datasets to elucidate the capacity of sampling techniques and protein encoding methods to improve binding affinity and thermal stability prediction tasks. For protein sequence representations, we incorporate two widely used methods (One-Hot encoding and physicochemical encoding) and two language-based methods (next-token prediction, UniRep; masked-token prediction, ESM). Performance is analyzed with respect to protein fitness, protein size, and sampling technique. In addition, an ensemble of protein representation methods is generated to discover the contribution of distinct representations and improve the final prediction score. We then implement multiple criteria decision analysis (MCDA; TOPSIS with entropy weighting), using multiple metrics well suited to imbalanced data, to ensure statistical rigor in ranking our methods. Within the context of these datasets, the synthetic minority oversampling technique (SMOTE) outperformed undersampling when sequences were encoded with One-Hot, UniRep, and ESM representations. Moreover, ensemble learning increased the predictive performance on the affinity-based dataset by 4% compared to the best single-encoding candidate (F1-score = 97%), while ESM alone was rigorous enough for stability prediction (F1-score = 92%).
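The ranking step named above combines TOPSIS with entropy-derived criterion weights. A minimal sketch of that combination, assuming every metric is a benefit criterion (higher is better, as for F1-score) with strictly positive values and at least two alternatives; the function names and the example scores are hypothetical, not the authors' code or data.

```python
import numpy as np


def entropy_weights(scores: np.ndarray) -> np.ndarray:
    """Entropy weights for an (alternatives x criteria) matrix of positive scores."""
    scores = np.asarray(scores, dtype=float)
    m, n = scores.shape
    if m < 2 or np.any(scores <= 0):
        raise ValueError("need at least two alternatives and strictly positive scores")
    # Share of each alternative within each criterion column
    p = scores / scores.sum(axis=0)
    # Normalised Shannon entropy per criterion, in [0, 1]
    entropy = -(p * np.log(p)).sum(axis=0) / np.log(m)
    # Criteria that discriminate more (lower entropy) receive more weight
    divergence = np.clip(1.0 - entropy, 0.0, None)
    if divergence.sum() == 0:
        return np.full(n, 1.0 / n)  # every criterion uninformative: equal weights
    return divergence / divergence.sum()


def topsis(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """TOPSIS closeness coefficients (0-1, higher is better) for benefit criteria."""
    scores = np.asarray(scores, dtype=float)
    # Vector-normalise each criterion column, then apply the weights
    weighted = scores / np.linalg.norm(scores, axis=0) * weights
    ideal = weighted.max(axis=0)       # positive ideal solution
    anti_ideal = weighted.min(axis=0)  # negative ideal solution
    d_plus = np.linalg.norm(weighted - ideal, axis=1)
    d_minus = np.linalg.norm(weighted - anti_ideal, axis=1)
    total = d_plus + d_minus
    # Alternatives equally far from both ideals (all identical) get 0.5
    return np.divide(d_minus, total, out=np.full_like(total, 0.5), where=total > 0)


if __name__ == "__main__":
    # Hypothetical rows: encoding methods; columns: F1, precision, recall
    methods = ["method_a", "method_b", "method_c"]
    metrics = np.array([[0.90, 0.80, 0.95],
                        [0.70, 0.60, 0.80],
                        [0.50, 0.40, 0.60]])
    closeness = topsis(metrics, entropy_weights(metrics))
    for name, c in sorted(zip(methods, closeness), key=lambda x: -x[1]):
        print(f"{name}: {c:.3f}")
```

With these toy scores the first row is best on every metric and so coincides with the positive ideal (closeness 1), while a metric that barely varies across methods would receive almost no entropy weight.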
Affiliation(s)
- Mehrsa Mardikoraem
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Daniel Woldring
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
48
Sicard J, Barbe S, Boutrou R, Bouvier L, Delaplace G, Lashermes G, Théron L, Vitrac O, Tonda A. A primer on predictive techniques for food and bioresources transformation processes. J Food Process Eng 2023. [DOI: 10.1111/jfpe.14325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Affiliation(s)
- Laurent Bouvier
- UMET, Université de Lille, CNRS, Centrale Lille, INRAE, Villeneuve-d'Ascq, France
- Guillaume Delaplace
- UMET, Université de Lille, CNRS, Centrale Lille, INRAE, Villeneuve-d'Ascq, France
- Olivier Vitrac
- SayFood, INRAE, AgroParisTech, Université Paris-Saclay, Massy, France
- Alberto Tonda
- MIA-Paris, AgroParisTech, INRAE, Université Paris-Saclay, Paris, France
49
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 182] [Impact Index Per Article: 91.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023]
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of more than 200 million protein structures predicted by AF2 aroused further enthusiasm in the scientific community, especially in biology and medicine. AF2 is thought to have a significant impact on structural biology and on research areas that need protein structure information, such as drug discovery, protein design, and protein function prediction. Although AF2 was developed only recently, quite a few studies have already applied it in biology and medicine, and many of them have provided preliminary evidence of its potential. To better understand AF2 and promote its applications, this article summarizes the principles and system architecture of AF2 and the reasons for its success, with particular focus on its applications in biology and medicine. Limitations of current AF2 predictions are also discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
50
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
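Ligand-based methods of the kind surveyed above commonly rank candidate molecules by fingerprint similarity to known actives. The sketch below shows the Tanimoto coefficient, one widely used similarity measure; the abstract does not say which measures the review covers, so treat this as an assumed example. Fingerprints are modeled as hypothetical sets of on-bit indices (in practice a cheminformatics toolkit would generate them), and the compound names and the 0.4 cutoff are illustrative placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary fingerprints stored as on-bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: FrozenSet[int], library: Dict[str, FrozenSet[int]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep library compounds at or above the cutoff, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [h for h in hits if h[1] >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical query (e.g., a known active) and a toy screening library.
query = frozenset({1, 4, 7, 9, 12})
library = {
    "cmpd_1": frozenset({1, 4, 7, 9, 12}),  # identical bits -> 1.0
    "cmpd_2": frozenset({1, 4, 7, 20}),     # 3 shared, 6 in union -> 0.5
    "cmpd_3": frozenset({2, 3, 5}),         # nothing shared -> 0.0
}
hits = rank_by_similarity(query, library, threshold=0.4)
print(hits)
```

A similarity cutoff like this is one simple way to turn a ranked list into a hit list for follow-up; the appropriate threshold depends on the fingerprint type and is not given in the source.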
Affiliation(s)
- Daniel L Baker
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
- Abby L Parrill
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA