1
Chen L, Smith M, Roe DR, Miranda-Quintana RA. Extended Quality (eQual): Radial Threshold Clustering Based on n-ary Similarity. J Chem Inf Model 2025; 65:5062-5070. [PMID: 40309753 DOI: 10.1021/acs.jcim.4c02341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2025]
Abstract
We are transforming Radial Threshold Clustering (RTC), an O(N²) algorithm, into Extended Quality Clustering (eQual), an O(N) algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups similar frames together based on their similarity to the seed configuration. RTC has two main issues: it scales as O(N²), making it inefficient for large frame counts, and its clustering results depend on the order of the input frames whenever there is a tie for the most populated cluster. To address the first issue, we accelerated seed selection by using k-means++ to choose seeds from the available frames. To address the second issue and make the results invariant to frame order, the densest and most compact cluster is chosen using the extended similarity indices. The new algorithm clusters in linear time and produces more compact and better-separated clusters.
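The two ideas in this abstract, k-means++ seeding and a compactness tie-break between equally populated candidate clusters, can be illustrated with a minimal pure-Python sketch. It assumes frames are plain coordinate tuples compared by Euclidean distance against a hypothetical cutoff `threshold`, and it uses mean pairwise distance as a simple stand-in for the extended similarity indices. The function names (`kmeanspp_seeds`, `radial_threshold_clustering`) are illustrative only and are not the authors' API.

```python
import math
import random
from typing import List, Sequence

Frame = Sequence[float]


def kmeanspp_seeds(frames: List[Frame], k: int, rng: random.Random) -> List[int]:
    """Pick up to k seed indices with k-means++ (D^2-weighted) sampling."""
    seeds = [rng.randrange(len(frames))]
    d2 = [math.dist(f, frames[seeds[0]]) ** 2 for f in frames]
    while len(seeds) < min(k, len(frames)):
        total = sum(d2)
        if total == 0.0:  # every remaining frame coincides with a seed
            break
        r = rng.uniform(0.0, total)
        acc, chosen = 0.0, None
        for i, w in enumerate(d2):
            acc += w
            if w > 0.0 and acc >= r:
                chosen = i
                break
        if chosen is None:  # float round-off: fall back to the last candidate
            chosen = max(i for i, w in enumerate(d2) if w > 0.0)
        seeds.append(chosen)
        # Squared distance to the nearest chosen seed shrinks as seeds are added.
        d2 = [min(d2[j], math.dist(frames[j], frames[chosen]) ** 2)
              for j in range(len(frames))]
    return seeds


def mean_pairwise_distance(frames: List[Frame], members: List[int]) -> float:
    """Compactness proxy (lower is more compact); stand-in for extended similarity."""
    if len(members) < 2:
        return 0.0
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return sum(math.dist(frames[a], frames[b]) for a, b in pairs) / len(pairs)


def radial_threshold_clustering(frames: List[Frame], threshold: float,
                                n_seeds: int = 5, seed: int = 0) -> List[List[int]]:
    """Greedy radial-threshold clustering that only evaluates k-means++ candidate seeds."""
    rng = random.Random(seed)
    remaining = list(range(len(frames)))
    clusters: List[List[int]] = []
    while remaining:
        sub = [frames[i] for i in remaining]
        best, best_comp = None, None
        for s in kmeanspp_seeds(sub, n_seeds, rng):
            # Every remaining frame within `threshold` of the candidate seed.
            members = [remaining[j] for j, f in enumerate(sub)
                       if math.dist(f, sub[s]) <= threshold]
            if best is None or len(members) > len(best):
                best, best_comp = members, None
            elif len(members) == len(best):
                # Tie on population: keep the more compact candidate.
                if best_comp is None:
                    best_comp = mean_pairwise_distance(frames, best)
                comp = mean_pairwise_distance(frames, members)
                if comp < best_comp:
                    best, best_comp = members, comp
        clusters.append(sorted(best))
        taken = set(best)
        remaining = [i for i in remaining if i not in taken]
    return clusters
```

For example, `radial_threshold_clustering([(0, 0), (0.1, 0), (5, 5), (5.1, 5)], threshold=1.0)` places the two nearby pairs in separate clusters. Each pass costs O(N·k) distance evaluations for k candidate seeds rather than the O(N²) neighbor counts of a full RTC pass; the compactness tie-break in this sketch is quadratic in cluster size, and the sketch does not by itself guarantee frame-order invariance.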
Affiliation(s)
- Lexin Chen
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Micah Smith
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland 20850, United States
- Daniel R Roe
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Ramón Alain Miranda-Quintana
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
2
Yang J, Guo H, Liu S, Xi K, Wu Q, Li Y, Fang K, Zhou K, Su C, Jing BY, Wu H, Zhu L. Locating Transition States for Biomolecular Dynamics via Invertible Dimensionality Reduction. J Chem Theory Comput 2025. [PMID: 40390308 DOI: 10.1021/acs.jctc.4c01624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2025]
Abstract
Locating the transition states (TSs) for the conformational changes of biomacromolecules is among the major tasks of biomolecular simulations, as TSs are the bottlenecks of motion and encode key mechanistic insights. However, identifying short-lived TSs from (even abundant) simulation data has been a long-standing challenge due to the high dimensionality of the molecules. Gentlest ascent dynamics (GAD) is an effective approach for locating saddle points, but only in spaces of low dimensionality (typically fewer than 20 dimensions). This restriction may in principle be relieved by dimensionality reduction (DR), which maps the high-dimensional configurational space of the molecules onto a low-dimensional manifold. However, the vast majority of DR algorithms focus only on high-density regions and therefore distort the TS regions, precluding a subsequent GAD search. The recently introduced reaction coordinate flows (RCF) method is among the few exceptions. Because RCF learns an invertible mapping between the configurational space and the reduced reaction coordinate (RC) space through a loss function that incorporates both density and transition-pair information, it is expected to preserve kinetics, and therefore TSs, during DR. GAD can then be applied to locate TS candidates in the RCF-learned RC space, and these candidates can be validated rigorously through reverse RCF mapping and committor analysis in the original space. Here, we demonstrate the effectiveness of this RCF-GAD integration on alanine dipeptide and on the ground-to-excited transition of the T4 lysozyme L99A variant (T4L-L99A) in explicit solvent. For alanine dipeptide, GAD identified three TSs with an RCF reduction to four RCs but only two TSs with a reduction to two RCs, because two low-density stable states merged in the 2RC representation; this indicates that the number of intrinsic dimensions should be evaluated a priori for RCF. For T4L-L99A, the TSs located by GAD in a 4RC RCF reduction resembled those found previously via automated path searching, demonstrating the feasibility of our approach for realistic biomolecular systems.
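The GAD search described above can be sketched on a toy two-dimensional potential. This sketch assumes a hypothetical double well V(x, y) = (x² - 1)² + y², which has minima at (±1, 0) and an index-1 saddle at (0, 0), and integrates the standard GAD equations dx/dt = -∇V + 2(∇V·v)v and dv/dt = -Hv + (vᵀHv)v with explicit Euler steps (H is the Hessian, v a unit vector). The potential, step size, and starting point are assumptions for illustration; nothing here reproduces the RCF mapping.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def gradient(x: float, y: float) -> Vec:
    """Gradient of the toy potential V(x, y) = (x^2 - 1)^2 + y^2."""
    return 4.0 * x * (x * x - 1.0), 2.0 * y


def hessian(x: float, y: float) -> Tuple[Vec, Vec]:
    """Analytic Hessian of the same toy potential."""
    return (12.0 * x * x - 4.0, 0.0), (0.0, 2.0)


def gad(x: float, y: float, v: Vec, dt: float = 1e-2,
        steps: int = 20000) -> Tuple[float, float, Vec]:
    """Explicit-Euler gentlest ascent dynamics toward an index-1 saddle."""
    vx, vy = v
    n = math.hypot(vx, vy)
    vx, vy = vx / n, vy / n
    for _ in range(steps):
        gx, gy = gradient(x, y)
        (hxx, hxy), (hyx, hyy) = hessian(x, y)
        # Position: descend in all directions except v, ascend along v.
        gv = gx * vx + gy * vy
        x += dt * (-gx + 2.0 * gv * vx)
        y += dt * (-gy + 2.0 * gv * vy)
        # Direction: rotate v toward the lowest-curvature Hessian eigenvector.
        hvx = hxx * vx + hxy * vy
        hvy = hyx * vx + hyy * vy
        vhv = vx * hvx + vy * hvy
        vx += dt * (-hvx + vhv * vx)
        vy += dt * (-hvy + vhv * vy)
        n = math.hypot(vx, vy)  # keep v a unit vector
        vx, vy = vx / n, vy / n
    return x, y, (vx, vy)
```

Starting near the left minimum, for example `gad(-0.5, 0.3, (1.0, 0.2))`, the walker climbs along the softest mode while descending in y and settles at the saddle (0, 0), where the Hessian has exactly one negative eigenvalue. In the workflow described in the abstract, the same dynamics would run in the learned RC space instead of this hand-written potential.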
Affiliation(s)
- Jianyu Yang
- School of Medicine and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Huanlei Guo
- Department of Statistics and Data Science, College of Science, Southern University of Science and Technology, Shenzhen 518055, China
- Song Liu
- Department of Statistics and Data Science, College of Science, Southern University of Science and Technology, Shenzhen 518055, China
- Kun Xi
- School of Medicine and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Qiang Wu
- Bachelor Program of Mathematics and Applied Mathematics, School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Yixun Li
- Bachelor Program of Bioinformatics, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Kuo Fang
- Bachelor Program of Biological Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Kaiyi Zhou
- Bachelor Program of Bioinformatics, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Chang Su
- Bachelor Program of Bioinformatics, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
- Bing-Yi Jing
- Department of Statistics and Data Science, College of Science, Southern University of Science and Technology, Shenzhen 518055, China
- Hao Wu
- School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China
- Lizhe Zhu
- School of Medicine and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172, China
3
Bazzi S, Sayyad S. Revealing arginine-cysteine and glycine-cysteine NOS linkages by a systematic re-evaluation of protein structures. Commun Chem 2025; 8:146. [PMID: 40360719 PMCID: PMC12075730 DOI: 10.1038/s42004-025-01535-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Accepted: 04/23/2025] [Indexed: 05/15/2025] [Open Access]
Abstract
Nitrogen-oxygen-sulfur (NOS) linkages act as allosteric redox switches, modulating enzymatic activity in response to redox fluctuations. While NOS linkages in proteins were once assumed to occur only between lysine and cysteine, our investigation shows that these bonds extend beyond the well-studied lysine-NOS-cysteine examples. By systematically analyzing over 86,000 high-resolution X-ray protein structures, we uncovered 69 additional NOS bonds, including arginine-NOS-cysteine and glycine-NOS-cysteine. Our pipeline integrates machine learning, quantum-mechanical calculations, and high-resolution X-ray crystallographic data to systematically detect these subtle covalent interactions and identify key predictive descriptors for their formation. The discovery of these previously unrecognized linkages broadens the scope of protein chemistry and may enable targeted modulation in drug design and protein engineering. Although our study focuses on NOS linkages, the flexibility of this methodology allows for the investigation of a wide range of chemical bonds and covalent modifications, including structurally resolvable posttranslational modifications (PTMs). By revisiting and re-examining well-established protein models, this work underscores how systematic data-driven approaches can uncover hidden aspects of protein chemistry and inspire deeper insights into protein function and stability.
Affiliation(s)
- Sophia Bazzi
- Institute of Physical Chemistry, Georg-August University Göttingen, Tammannstraße 6, Göttingen, D-37077, Germany
- Sharareh Sayyad
- Department of Mathematics and Statistics, Washington State University, Pullman, WA, 99164-3113, USA
- Mathematical Institute, Georg-August University Göttingen, Bunsenstraße 3-5, Göttingen, 37073, Germany
4
Clarke HA, Ma X, Shedlock CJ, Medina T, Hawkinson TR, Wu L, Ribas RA, Keohane S, Ravi S, Bizon JL, Burke SN, Abisambra JF, Merritt ME, Prentice BM, Vander Kooi CW, Gentry MS, Chen L, Sun RC. Spatial mapping of the brain metabolome lipidome and glycome. Nat Commun 2025; 16:4373. [PMID: 40355410 PMCID: PMC12069719 DOI: 10.1038/s41467-025-59487-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 04/23/2025] [Indexed: 05/14/2025] [Open Access]
Abstract
Metabolites, lipids, and glycans are fundamental but interconnected classes of biomolecules that form the basis of the metabolic network. These molecules are dynamically channeled through multiple pathways that govern cellular physiology and pathology. Here, we present a framework for the simultaneous spatial analysis of the metabolome, lipidome, and glycome from a single tissue section using mass spectrometry imaging. This workflow integrates a computational platform, the Spatial Augmented Multiomics Interface (Sami), which enables multiomics integration, high-dimensional clustering, spatial anatomical mapping of matched molecular features, and metabolic pathway enrichment. To demonstrate the utility of this approach, we applied Sami to evaluate metabolic diversity across distinct brain regions and to compare wild-type and Ps19 Alzheimer's disease (AD) mouse models. Our findings reveal region-specific metabolic demands in the normal brain and highlight metabolic dysregulation in the Ps19 model, providing insights into the biochemical alterations associated with neurodegeneration.
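This abstract does not specify how Sami performs metabolic pathway enrichment. A common way to test whether a pathway is over-represented among a set of flagged features (for example, metabolites enriched in one brain region) is a one-sided hypergeometric over-representation test. The sketch below assumes that approach; `enrichment_pvalue` and its argument names are hypothetical.

```python
from math import comb


def enrichment_pvalue(total: int, in_pathway: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    total: number of annotated features in the background set
    in_pathway: how many of those features belong to the pathway
    selected: number of features flagged (e.g., enriched in one region)
    overlap: how many flagged features fall in the pathway
    """
    upper = min(in_pathway, selected)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(in_pathway, selected)")
    # Count draws of `selected` features with at least `overlap` pathway members.
    hits = sum(comb(in_pathway, k) * comb(total - in_pathway, selected - k)
               for k in range(overlap, upper + 1))
    return hits / comb(total, selected)
```

When many pathways are tested at once, the resulting p-values would normally also need a multiple-testing correction.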
Affiliation(s)
- Harrison A Clarke
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Xin Ma
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Department of Biostatistics College of Public Health and Health Professions & College of Medicine, University of Florida, Gainesville, FL, USA
- Cameron J Shedlock
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Terrymar Medina
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Tara R Hawkinson
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Lei Wu
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Roberto A Ribas
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Shannon Keohane
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Sakthivel Ravi
- Department of Neuroscience, University of Florida, Gainesville, FL, USA
- Evelyn F. and William L. McKnight Brain Institute, University of Florida, Gainesville, FL, USA
- Center for Translational Research in Neurodegenerative Disease (CTRND), University of Florida, Gainesville, FL, USA
- Jennifer L Bizon
- Department of Neuroscience, University of Florida, Gainesville, FL, USA
- Evelyn F. and William L. McKnight Brain Institute, University of Florida, Gainesville, FL, USA
- Center for Addiction Research and Education, University of Florida, Gainesville, FL, USA
- Sara N Burke
- Department of Neuroscience, University of Florida, Gainesville, FL, USA
- Evelyn F. and William L. McKnight Brain Institute, University of Florida, Gainesville, FL, USA
- Institute on Aging, University of Florida, Gainesville, FL, USA
- Jose Francisco Abisambra
- Department of Neuroscience, University of Florida, Gainesville, FL, USA
- Evelyn F. and William L. McKnight Brain Institute, University of Florida, Gainesville, FL, USA
- Center for Translational Research in Neurodegenerative Disease (CTRND), University of Florida, Gainesville, FL, USA
- Brain Injury Rehabilitation and Neuroresilience (BRAIN) Center, University of Florida, Gainesville, FL, USA
- Matthew E Merritt
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Boone M Prentice
- Department of Chemistry, University of Florida, Gainesville, FL, USA
- Craig W Vander Kooi
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Matthew S Gentry
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Li Chen
- Department of Biostatistics College of Public Health and Health Professions & College of Medicine, University of Florida, Gainesville, FL, USA
- Ramon C Sun
- Department of Biochemistry & Molecular Biology, College of Medicine, University of Florida, Gainesville, FL, USA
- Center for Advanced Spatial Biomolecule Research, University of Florida, Gainesville, FL, USA
- Evelyn F. and William L. McKnight Brain Institute, University of Florida, Gainesville, FL, USA
5
Bhattacharya S, Chakrabarty S. Mapping conformational landscape in protein folding: Benchmarking dimensionality reduction and clustering techniques on the Trp-Cage mini-protein. Biophys Chem 2025; 319:107389. [PMID: 39862593 DOI: 10.1016/j.bpc.2025.107389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 12/16/2024] [Accepted: 01/08/2025] [Indexed: 01/27/2025]
Abstract
Quantitative characterization of protein conformational landscapes is a computationally challenging task due to their high dimensionality and inherent complexity. In this study, we systematically benchmark several widely used dimensionality reduction and clustering methods to analyze the conformational states of the Trp-Cage mini-protein, a model system with well-documented folding dynamics. Dimensionality reduction techniques, including Principal Component Analysis (PCA), Time-lagged Independent Component Analysis (TICA), and Variational Autoencoders (VAE), were employed to project the high-dimensional free energy landscape onto 2D spaces for visualization. Additionally, clustering methods such as K-means, hierarchical clustering, HDBSCAN, and Gaussian Mixture Models (GMM) were used to identify discrete conformational states directly in the high-dimensional space. Our findings reveal that density-based clustering approaches, particularly HDBSCAN, provide physically meaningful representations of free energy minima. While highlighting the strengths and limitations of each method, our study underscores that no single technique is universally optimal for capturing the complex folding pathways, emphasizing the necessity for careful selection and interpretation of computational methods in biomolecular simulations. These insights will contribute to refining the available tools for analyzing protein conformational landscapes, enabling a deeper understanding of folding mechanisms and intermediate states.
Affiliation(s)
- Sayari Bhattacharya
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700106, India
- Suman Chakrabarty
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700106, India
6
Duy HA, Srisongkram T. Protecting your skin: a highly accurate LSTM network integrating conjoint features for predicting chemical-induced skin irritation. J Cheminform 2025; 17:39. [PMID: 40148987 PMCID: PMC11951793 DOI: 10.1186/s13321-025-00980-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2024] [Accepted: 02/28/2025] [Indexed: 03/29/2025] [Open Access]
Abstract
Skin irritation is a significant adverse effect associated with chemicals and drug substances. Quantitative structure-activity relationship (QSAR) modeling is an alternative to in vivo assays for filling data gaps in chemical risk assessment. In this study, we developed QSAR models based on recurrent neural networks (RNNs) to classify skin irritation caused by chemical compounds. We used chemical language notation, molecular substructures, molecular descriptors, and a combination of these features, named conjoint fingerprints, for model construction. Simple RNN, long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and bidirectional gated recurrent unit (BiGRU) architectures were used to build the QSAR models. The LSTM model combining molecular fingerprints and descriptors significantly outperformed the other models, achieving 80% accuracy, an MCC of 0.60, and an AUC of 0.85 on the external test set. We therefore selected this model for generalizability testing on additional test sets from outside our study, to assess whether it can be applied to other data sets. Furthermore, we defined the applicability domain of the proposed model, which indicates whether a reliable prediction can be made for a given test compound. The model was developed following the OECD guidelines for skin irritation assessment and QSAR model development, ensuring compliance with all required standards. The models and source code developed in this study are publicly available, facilitating chemical design and safety evaluation, particularly for assessing the skin irritation potential of chemicals.
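Accuracy and the Matthews correlation coefficient (MCC) are both computed from a binary confusion matrix: accuracy = (TP + TN) / total and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). MCC ranges from -1 to 1, so a value of 0.60 is often quoted as 60%. The sketch below shows the arithmetic; `binary_metrics` is a hypothetical helper, and any counts passed to it are made up rather than taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the irritant and non-irritant classes are imbalanced.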
Affiliation(s)
- Huynh Anh Duy
- Graduate School in the Program of Research and Development in Pharmaceuticals, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen, 40002, Thailand
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen, 40002, Thailand
7
Joosse HJ, Chumsaeng-Reijers C, Huisman A, Hoefer IE, van Solinge WW, Haitjema S, van Es B. Haematology dimension reduction, a large scale application to regular care haematology data. BMC Med Inform Decis Mak 2025; 25:75. [PMID: 39939843 PMCID: PMC11823074 DOI: 10.1186/s12911-025-02899-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/28/2025] [Indexed: 02/14/2025] [Open Access]
Abstract
BACKGROUND The routine diagnostic process increasingly involves high-volume, high-dimensional data that cannot be visualised directly. Processing such data may pose scaling issues that limit its use in research as well as in integrated diagnostics in routine care. Here, we investigate whether existing dimension reduction techniques can provide visualisations and analyses of complete blood count (CBC) data while maintaining the representativeness of the original data. We considered over 3 million CBC measurements encompassing over 70 parameters of cell frequency, size and complexity from the UMC Utrecht UPOD database. We evaluated PCA as an example of a linear dimension reduction technique, and UMAP, TriMap and PaCMAP as non-linear dimension reduction techniques. We assessed their technical performance using quality metrics for dimension reduction, and their biological representation by evaluating the preservation of diurnal, age and sex patterns, cluster preservation, and the identification of leukemia patients. RESULTS We found that, for clinical hematology data, PCA performs systematically better than UMAP, TriMap and PaCMAP in representing the underlying data. Biological relevance was retained for periodicity in the data. However, we also observed a decrease in the predictive performance of the reduced data for both age and sex, as well as an overestimation of clusters within the reduced data. Finally, we were able to identify the diverging patterns of leukemia patients after applying dimensionality reduction methods. CONCLUSIONS We conclude that for hematology data, the use of unsupervised dimension reduction techniques should be limited to data visualization applications, as implementing them in diagnostic pipelines may lead to decreased quality of integrated diagnostics in routine care.
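To make the linear baseline concrete, the sketch below computes the first principal component of a small table by power iteration on the sample covariance matrix and reports the fraction of total variance it explains. It is a pure-Python illustration under simple assumptions: rows are measurements, columns are parameters, the covariance has a dominant eigenvalue, and no standardization is applied (real CBC parameters with different units would usually be standardized first). `first_principal_component` is a hypothetical name, and this is not the pipeline used in the study.

```python
import math
import random
from typing import List, Sequence, Tuple


def first_principal_component(rows: Sequence[Sequence[float]], iters: int = 500,
                              seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Return (loading vector, explained-variance ratio, PC1 scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # strictly positive start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_var = sum(cov[i][i] for i in range(d))  # trace = sum of all eigenvalues
    explained = eigval / total_var if total_var else 0.0
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, explained, scores
```

With perfectly correlated columns, such as rows `[t, 2 * t]`, the first component points along (1, 2)/√5 and explains essentially all of the variance. Because PCA is a linear projection, distances along the retained components stay interpretable in terms of the original parameters.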
Affiliation(s)
- Huibert-Jan Joosse
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Chontira Chumsaeng-Reijers
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Albert Huisman
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Imo E Hoefer
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Wouter W van Solinge
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Saskia Haitjema
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
- Bram van Es
- Central Diagnostic Laboratory, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3584 CX, The Netherlands
8
Xiao S, Alshahrani M, Hu G, Tao P, Verkhivker G. Accurate Characterization of the Allosteric Energy Landscapes, Binding Hotspots and Long-Range Communications for KRAS Complexes with Effector Proteins: Integrative Approach Using Microsecond Molecular Dynamics, Deep Mutational Scanning of Binding Energetics and Allosteric Network Modeling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.27.635141. [PMID: 39975035 PMCID: PMC11838311 DOI: 10.1101/2025.01.27.635141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
KRAS is a pivotal oncoprotein that regulates cell proliferation and survival through interactions with downstream effectors such as RAF1. Oncogenic mutations in KRAS, including G12V, G13D, and Q61R, drive constitutive activation and hyperactivation of signaling pathways, contributing to cancer progression. Despite significant advances in understanding KRAS biology, the structural and dynamic mechanisms of KRAS binding and allostery by which oncogenic mutations enhance KRAS-RAF1 binding and signaling remain incompletely understood. In this study, we employ microsecond molecular dynamics simulations, Markov state modeling, mutational scanning, and binding free energy calculations together with dynamic network modeling to elucidate the effects of KRAS mutations and to characterize the thermodynamic and allosteric drivers and hotspots of KRAS binding and oncogenic activation. Our simulations revealed that oncogenic mutations stabilize the open active conformation of KRAS by differentially modulating the flexibility of the switch I and switch II regions, thereby enhancing RAF1 binding affinity. The G12V mutation rigidifies both switch I and switch II, locking KRAS in a stable, active state. In contrast, the G13D mutation moderately reduces switch I flexibility while increasing switch II dynamics, restoring a balance between stability and flexibility. The Q61R mutation induces a more complex conformational landscape, characterized by increased switch II flexibility and an expansion of functional macrostates, which promotes prolonged RAF1 binding and signaling. Mutational scanning of KRAS-RAF1 complexes identified key binding affinity hotspots, including Y40, E37, D38, and D33, and, together with MM-GBSA analysis, revealed that these hotspots leverage synergistic electrostatic and hydrophobic interactions to stabilize the KRAS-RAF1 complexes. Network-based analysis of allosteric communication identifies critical KRAS residues (e.g., L6, E37, D57, R97) that mediate long-range interactions between the KRAS core and the RAF1 binding interface. The central β-sheet of KRAS emerges as a hub for transmitting conformational changes, linking distant functional sites and facilitating allosteric regulation. Strikingly, the predicted allosteric hotspots align with experimentally identified allosteric binding hotspots that define the energy landscape of KRAS allostery. This study highlights the power of integrating computational modeling with experimental data to unravel the complex dynamics of KRAS and its mutants. The identification of binding hotspots and allosteric communication routes offers new opportunities for developing targeted therapies to disrupt KRAS-RAF1 interactions and inhibit oncogenic signaling. Our results underscore the potential of computational approaches to guide the design of allosteric inhibitors and mutant-specific therapies for KRAS-driven cancers.
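A common convention in dynamical network analysis is to weight residue-residue edges by w_ij = -log|C_ij|, where C_ij is a correlation of residue motions, so that strongly coupled pairs become short edges; communication routes are then traced as shortest paths. The sketch below assumes that convention and runs Dijkstra's algorithm on a tiny hypothetical graph. The node labels and correlation values are made up, and `strongest_path` is an illustrative name, not the study's network model.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple


def strongest_path(corr: Dict[Tuple[str, str], float], source: str,
                   target: str) -> Tuple[Optional[List[str]], float]:
    """Dijkstra shortest path under edge weights -log|C_ij|."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for (a, b), c in corr.items():
        if c == 0.0:
            continue  # uncorrelated pair: no edge
        if abs(c) > 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
        w = -math.log(abs(c))  # |C| in (0, 1] gives a non-negative weight
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break  # first pop of the target is its final distance
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

With made-up correlations `{("A", "B"): 0.9, ("B", "D"): 0.9, ("A", "D"): 0.3}`, the route A-B-D (cost 2·(-log 0.9) ≈ 0.21) beats the direct but weakly correlated edge A-D (cost ≈ 1.20). Residues that appear on many such optimal routes are natural candidates for communication hubs.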
9
Mondal K, Klauda JB. Physically interpretable performance metrics for clustering. J Chem Phys 2024; 161:244106. [PMID: 39723706 DOI: 10.1063/5.0241122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 11/21/2024] [Indexed: 12/28/2024] [Open Access]
Abstract
Clustering is a machine learning technique used to partition large amounts of data into separate groups, or clusters, based on their similarity. It is now widely used to analyze the large and diverse data sets generated by molecular dynamics (MD) simulations. Typically, the frames of an MD trajectory are clustered into groups, and a representative element from each group is studied separately. An important question in this process is: what is the quality of the resulting clusters? Several performance metrics in the literature, such as the silhouette index and the Davies-Bouldin index, are often used to assess clustering quality. However, most of these metrics focus on the overlap or similarity of the clusters in the reduced dimension used for clustering, not on the physically important properties or parameters of the system. To address this issue, we developed two physically interpretable scoring metrics that focus on the physical parameters of the system being analyzed. We tested our algorithm on four different systems: (1) the Ising model, (2) peptide folding and unfolding of WT HP35, (3) a protein-ligand trajectory of an enzyme and substrate, and (4) a protein-ligand dissociation trajectory. We show that the scoring metrics yield clusters that match our physical intuition about these systems.
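For reference, the silhouette index scores each point as s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the overall score is the mean of s. The pure-Python sketch below implements that standard definition with Euclidean distances, scoring singleton clusters as 0 (a common convention). It illustrates the kind of geometry-only metric the abstract contrasts with, not the authors' physically interpretable metrics; `silhouette_score` here is a hypothetical helper.

```python
import math
from typing import Dict, Hashable, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient using Euclidean distances."""
    clusters: Dict[Hashable, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the rest of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate tight, well-separated clusters and negative values indicate points closer to another cluster than to their own; as the abstract notes, neither outcome says anything about whether the clusters differ in a physically meaningful parameter.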
Affiliation(s)
- Kinjal Mondal
- Institute for Physical Science and Technology, Biophysics Program, University of Maryland, College Park, Maryland 20742, USA
- Jeffery B Klauda
- Institute for Physical Science and Technology, Biophysics Program, University of Maryland, College Park, Maryland 20742, USA
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, USA
10
Chen L, Smith M, Roe DR, Miranda-Quintana RA. Extended Quality (eQual): Radial threshold clustering based on n-ary similarity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.12.05.627001. [PMID: 39677679 PMCID: PMC11643124 DOI: 10.1101/2024.12.05.627001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
We are transforming Radial Threshold Clustering (RTC), an O(N²) algorithm, into Extended Quality Clustering, an O(N) algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups similar frames together based on their similarity to the seed configuration. Two current issues with RTC are that it scales as O(N²), making it inefficient at high frame counts, and that its clustering results depend on the order of the input frames. To address the first issue, we accelerated seed selection by using k-means++ to select seeds from the available frames. To address the second issue and make the results invariant with respect to frame ordering, whenever there is a tie for the most populated cluster, the densest and most compact cluster is chosen using the extended similarity indices. The new algorithm clusters in linear time and produces more compact and better-separated clusters.
Affiliation(s)
- Lexin Chen
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA
- Micah Smith
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, MD 20850, USA
- Daniel R Roe
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Ramón Alain Miranda-Quintana
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
- Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA
11
López-Pérez K, Avellaneda-Tamayo JF, Chen L, López-López E, Juárez-Mercado KE, Medina-Franco JL, Miranda-Quintana RA. Molecular similarity: Theory, applications, and perspectives. ARTIFICIAL INTELLIGENCE CHEMISTRY 2024; 2:100077. [PMID: 40124654 PMCID: PMC11928018 DOI: 10.1016/j.aichem.2024.100077] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 03/25/2025]
Abstract
Molecular similarity pervades much of our understanding and rationalization of chemistry. This has become particularly evident in the current data-intensive era of chemical research, with similarity measures serving as the backbone of many supervised and unsupervised machine learning (ML) procedures. Here, we discuss the role of molecular similarity in drug design, chemical space exploration, chemical "art" generation, molecular representations, and many other areas. We also discuss more recent topics in molecular similarity, such as the ability to efficiently compare large molecular libraries.
Affiliation(s)
- Kenneth López-Pérez
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Juan F. Avellaneda-Tamayo
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Lexin Chen
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611, USA
- Edgar López-López
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Section 14-740, Mexico City 07000, Mexico
- K. Eurídice Juárez-Mercado
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L. Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
12
Panda G, Ray A. Deciphering Cas9 specificity: Role of domain dynamics and RNA:DNA hybrid interactions revealed through machine learning and accelerated molecular simulations. Int J Biol Macromol 2024; 283:137835. [PMID: 39566771 DOI: 10.1016/j.ijbiomac.2024.137835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 11/04/2024] [Accepted: 11/17/2024] [Indexed: 11/22/2024]
Abstract
CRISPR/Cas9 technology is widely used for gene editing, but off-target activity remains a major concern in therapeutic applications. Although Cas9 variants with better mismatch discrimination have been developed, they show significantly lower rates of on-target DNA cleavage. This study compares the dynamics of the highly specific Cas9 from Francisella novicida (FnCas9) with those of the commonly used SpCas9. Using long-timescale atomistic Gaussian accelerated molecular dynamics simulations and machine learning techniques, we deciphered the structural factors behind FnCas9's higher specificity in native and off-target forms. Our analysis revealed that Cas9's cleavage specificity relies more on its domain rearrangement than on the shape of the RNA:DNA heteroduplex: the Cas9 domains showed significant conformational variations among off-target forms, while the RNA:DNA hybrid showed minimal changes, especially in FnCas9 compared with SpCas9. Contacts between the REC1-REC3 domains and the RNA:DNA hybrid in FnCas9 acted as a critical discriminator of off-target effects, playing a pivotal role in determining specificity. In FnCas9, allosteric signal transmission involves the REC3 and HNH domains, bypassing REC2, which makes information transmission more efficient. This study offers a quantitative framework for understanding the structural basis of elevated specificity, paving the way for the rational design of Cas9 variants with improved precision and specificity in genome editing applications.
Affiliation(s)
- Gayatri Panda
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Arjun Ray
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
13
França VLB, Amaral JL, do Ó Pessoa C, Carvalho HF, Freire VN. Shedding light on cancer immunology at the molecular level: A quantum biochemistry study of representative PD-1/PD-L1 conformations. Biochem Biophys Res Commun 2024; 735:150832. [PMID: 39423575 DOI: 10.1016/j.bbrc.2024.150832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 09/06/2024] [Accepted: 10/12/2024] [Indexed: 10/21/2024]
Abstract
BACKGROUND Programmed death 1 (PD-1) binding to PD-L1 is a potent mechanism used by immunogenic tumors to evade the immune system, and the PD-1/PD-L1 immune checkpoint has emerged as a promising target in the search for new drugs to improve cancer treatment. The crystallographic structure of the human PD-1/human PD-L1 complex shed light on the molecular characterization of this system and allowed computational studies to be carried out to characterize its structural behavior. METHODS This study demonstrated the importance of analyzing the flexibility of protein systems through molecular dynamics simulations (MDS) and its impact on the interaction energy obtained through quantum biochemistry. RESULTS The computational results provide a description of the flexibility and energetic profile of the PD-1/PD-L1 contact surface using representative conformations from MDS. Variations of up to 50% in the total interaction energy values were detected depending on the conformation scrutinized, which can be mainly attributed to the flexibility of the CC' loop, FG loop and ASP85-GLN91 of PD-1 and the MET58-LYS62 segment of PD-L1. Quantum biochemistry revealed three hot spots in PD-L1: ARG113L-ARG125L > ILE54L-VAL76L > ALA18L-ASP26L; and two energetic hot spots in PD-1: ALA125-ARG139 > VAL63-GLN88. Nonetheless, VAL63-GLN88 and GLY124-ARG139 exhibit significant variation in interaction energy between different conformations, while ARG113L-ARG125L is the only hot spot with high energetic fluctuation on the PD-L1 surface. CONCLUSION This is the first application of MDS coupled to dimensionality reduction and density functional theory (DFT), demonstrating new structural and energetic features that might be useful in discovering/designing more potent PD-1/PD-L1 inhibitors.
Affiliation(s)
- Victor L B França
- Department of Physiology and Pharmacology, Federal University of Ceará, 60430-270, Fortaleza, Ceará, Brazil; Department of Physics, Federal University of Ceará, Fortaleza, 60440-900, Brazil
- Jackson L Amaral
- Department of Biological Sciences, Federal University of Piauí, Bom Jesus, 64900-000, Brazil
- Cláudia do Ó Pessoa
- Department of Physiology and Pharmacology, Federal University of Ceará, Fortaleza, 60430-275, Brazil
- Hernandes F Carvalho
- Department of Structural and Functional Biology, Institute of Biology, State University of Campinas, 13083-864, Campinas, São Paulo, Brazil
- Valder N Freire
- Department of Physics, Federal University of Ceará, Fortaleza, 60440-900, Brazil
14
Lara-Ramírez EE, Rivera G, Oliva-Hernández AA, Bocanegra-Garcia V, López JA, Guo X. Unsupervised learning analysis on the proteomes of Zika virus. PeerJ Comput Sci 2024; 10:e2443. [PMID: 39650519 PMCID: PMC11623125 DOI: 10.7717/peerj-cs.2443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2024] [Accepted: 10/01/2024] [Indexed: 12/11/2024]
Abstract
Background The Zika virus (ZIKV), which is transmitted by mosquito vectors to nonhuman primates and humans, causes devastating outbreaks in the poorest tropical regions of the world. Molecular epidemiology, supported by gold-standard phylogenetic clustering studies of sequence data, has provided valuable information for tracking and controlling the spread of ZIKV. Unsupervised learning (UL), a form of machine learning, can be applied to datasets without needing known labels for training. Methods In this work, unsupervised Random Forest (URF), followed by the application of dimensionality reduction algorithms such as principal component analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), and autoencoders, were used to uncover hidden patterns in polymorphic amino acid sites extracted from multiple alignments of the ZIKV proteome, without the need for an underlying evolutionary model. Results The four UL algorithms revealed specific host and geographical clustering patterns for ZIKV. Among the four dimensionality reduction (DR) algorithms, UMAP performed best. The four algorithms allowed the identification of imported viruses for specific geographical clusters. The UL dimension coordinates showed a significant correlation with phylogenetic tree branch lengths and significant phylogenetic dependence in Abouheif's Cmean and Pagel's Lambda tests (p value < 0.01), indicating performance comparable to that of the phylogenetic method. This analytical strategy was generalizable to a large external dengue virus type 2 dataset. Conclusion These UL algorithms could be practical evolutionary analytical techniques for tracking the dispersal of viral pathogens.
Affiliation(s)
- Edgar E. Lara-Ramírez
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Gildardo Rivera
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Amanda Alejandra Oliva-Hernández
- Laboratorio de Biotecnología Experimental, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Virgilio Bocanegra-Garcia
- Laboratorio de Interacción Ambiente Microorganismo, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
- Jesús Adrián López
- Laboratorio de microRNAs y Cáncer, Unidad Académica de Ciencias Biológicas, Universidad Autónoma de Zacatecas, Zacatecas, Zacatecas, México
- Xianwu Guo
- Laboratorio de Biotecnología Genómica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, Reynosa, Tamaulipas, México
15
Zhu K, Han Y, Jian Y, Jiang G, Lu D, Liu Z. Anionic cardiolipin stabilizes the transmembrane region of hyaluronan synthase and promotes catalysis-relevant dynamics. Arch Biochem Biophys 2024; 761:110165. [PMID: 39332577 DOI: 10.1016/j.abb.2024.110165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 09/21/2024] [Accepted: 09/24/2024] [Indexed: 09/29/2024]
Abstract
Hyaluronic acid (HA) is a glycosaminoglycan that is essential for cellular processes and is increasingly finding applications in medicine, pharmaceuticals, and cosmetics. While membrane-integrated Class I hyaluronan synthase (HAS) catalyzes HA synthesis in most organisms, the molecular mechanisms by which HAS-lipid interactions affect HAS catalysis remain unclear. This study employed coarse-grained molecular dynamics simulations combined with dimensionality reduction to uncover the interplay between lipids and Streptococcus equisimilis HAS (SeHAS). Simulations using membranes of graded lipid composition showed that a minimum of 67% cardiolipin is necessary for HA synthesis. The anionic cardiolipin stabilizes the cationic transmembrane regions of SeHAS and thereby maintains its conformation. Moreover, the highly dynamic cardiolipin is required to modulate the catalysis-relevant motions in HAS and thus facilitate HA synthesis. These findings provide molecular insights essential not only for understanding the physiological functions of HAS, but also for developing cell factories and enzyme catalysts for HA production.
Affiliation(s)
- Kaiyi Zhu
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Yilei Han
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Yupei Jian
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Guoqiang Jiang
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Diannan Lu
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
- Zheng Liu
- Key Lab of Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
16
Djikic-Stojsic T, Bret G, Blond G, Girard N, Le Guen C, Marsol C, Schmitt M, Schneider S, Bihel F, Bonnet D, Gulea M, Kellenberger E. The IMS Library: from IN-Stock to Virtual. ChemMedChem 2024; 19:e202400381. [PMID: 39031900 DOI: 10.1002/cmdc.202400381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Revised: 06/18/2024] [Accepted: 06/19/2024] [Indexed: 07/22/2024]
Abstract
A chemical library is a key element in the early stages of pharmaceutical research. Its design encompasses various factors, such as diversity, size, and ease of synthesis, aimed at increasing the likelihood of success in drug discovery. This article explores the collaborative efforts of computational and synthetic chemists in tailoring chemical libraries for cost-effective and resource-efficient use, particularly in the context of academic research projects. It proposes chemoinformatics methodologies that address two pivotal questions: first, crafting a diverse panel of under 1000 compounds from an existing pool through synthetic efforts, leveraging the expertise of organic chemists; and second, expanding pharmacophoric diversity within this panel by creating a highly accessible virtual chemical library. Chemoinformatics tools were developed to analyse an initial panel of about 10,000 compounds and derive two tailored libraries from it: eIMS and vIMS. The eIMS Library comprises 578 diverse in-stock compounds ready for screening. Its virtual counterpart, vIMS, features novel compounds whose design is guided by chemists, ensuring synthetic accessibility. vIMS offers a broader array of binding motifs and improved drug-like characteristics, achieved by adding diverse functional groups to eIMS scaffolds and then filtering out reactive or unusual structures. The uniqueness of vIMS is emphasized through a comparison with commercial suppliers' virtual chemical spaces.
Affiliation(s)
- Teodora Djikic-Stojsic
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Guillaume Bret
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Gaëlle Blond
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Nicolas Girard
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Clothilde Le Guen
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Inovarion, 251 rue St Jacques, Paris, 75005, France
- Claire Marsol
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Martine Schmitt
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Séverine Schneider
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Frederic Bihel
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Dominique Bonnet
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Mihaela Gulea
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
- Esther Kellenberger
- Laboratoire d'Innovation Thérapeutique, UMR7200 CNRS - Université de Strasbourg, Faculté de Pharmacie, 74 route du Rhin, Illkirch-Graffenstaden, 67400, France
17
França VLB, Bezerra EM, da Costa RF, Carvalho HF, Freire VN, Matos G. Alzheimer's Disease Immunotherapy and Mimetic Peptide Design for Drug Development: Mutation Screening, Molecular Dynamics, and a Quantum Biochemistry Approach Focusing on Aducanumab::Aβ2-7 Binding Affinity. ACS Chem Neurosci 2024; 15:3543-3562. [PMID: 39302203 PMCID: PMC11450751 DOI: 10.1021/acschemneuro.4c00453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
Seven treatments are approved for Alzheimer's disease, but five of them only relieve symptoms and do not alter the course of the disease. Aducanumab (Adu) and lecanemab are novel disease-modifying antiamyloid-β (Aβ) human monoclonal antibodies that specifically target the pathophysiology of Alzheimer's disease (AD) and were recently approved for its treatment. However, their administration is associated with serious side effects, and their use is limited to early stages of the disease. Therefore, drug discovery remains of great importance in AD research. To gain new insights for the development of novel AD drugs, a combination of techniques, including mutation screening, molecular dynamics, and quantum biochemistry, was employed to outline the interfacial interactions of the Aducanumab::Aβ2-7 complex. Our analysis identified critical stabilizing contacts, revealing up to 40% variation in the affinity of the Adu chains for Aβ2-7 depending on the conformation considered. Remarkably, two complementarity-determining regions (CDRs) of the Adu heavy chain (HCDR3 and HCDR2) and one CDR of the Adu light chain (LCDR3) accounted for approximately 77% of the affinity of Adu for Aβ2-7, confirming their critical role in epitope recognition. A single mutation, originally reported to have the potential to increase the affinity of Adu for Aβ2-7, was shown to decrease its structural stability without increasing the overall binding affinity. Mimetic peptides with the potential to inhibit Aβ aggregation were designed based on the computational results. Our results support these peptides as promising drug candidates with great potential as inhibitors of Aβ aggregation.
Affiliation(s)
- Victor L. B. França
- Department of Physiology and Pharmacology, Federal University of Ceará, 60430-270 Fortaleza, Ceará, Brazil
- Eveline M. Bezerra
- Department of Sciences, Mathematics and Statistics, Federal Rural University of Semi-Arid (UFERSA), 59625-900 Mossoró, RN, Brazil
- Roner F. da Costa
- Department of Sciences, Mathematics and Statistics, Federal Rural University of Semi-Arid (UFERSA), 59625-900 Mossoró, RN, Brazil
- Hernandes F. Carvalho
- Department of Structural and Functional Biology, Institute of Biology, State University of Campinas, 13083-864 Campinas, São Paulo, Brazil
- Valder N. Freire
- Department of Physics, Federal University of Ceará, 60430-270 Fortaleza, Ceará, Brazil
- Geanne Matos
- Department of Physiology and Pharmacology, Federal University of Ceará, 60430-270 Fortaleza, Ceará, Brazil
18
Omwansu W, Musembi R, Derese S. Graph-based analysis of H-bond networks and unsupervised learning reveal conformational coupling in prion peptide segments. Phys Chem Chem Phys 2024. [PMID: 39291469 DOI: 10.1039/d4cp02123a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024]
Abstract
In this study, we employed a comprehensive computational approach to investigate the physical chemistry of the water networks surrounding hydrated peptide segments, as derived from molecular dynamics simulations. Our analysis uncovers a complex interplay of direct and water-mediated hydrogen bonds that intricately weave through the peptides. We demonstrate that these hydrogen bond networks encode critical information about the peptides' conformational behavior, with the dimensionality of these networks showing sensitivity to the peptides' conformations. Additionally, we estimated the free-energy landscape of the peptides across various conformations, revealing that their structures are predominantly characterized by unfolded, partially folded, and folded configurations, resulting in broad and rugged free-energy surfaces due to the numerous degrees of freedom contributed by the surrounding solvent. Importantly, the structured nature of this free-energy landscape becomes obscured when conventional collective variables, such as the number of hydrogen bonds, are used. Our findings provide new insights into the molecular mechanisms that couple protein and solvent degrees of freedom, highlighting their significance in the functioning of biological systems.
Affiliation(s)
- Wycliffe Omwansu
- Department of Physics, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
- The Abdus Salam International Centre for Theoretical Physics, Strada Costiera 11, 34151 Trieste, Italy
- Robinson Musembi
- Department of Physics, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
- Solomon Derese
- Department of Chemistry, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
19
Jin Y, Perez-Lemus GR, Zubieta Rico PF, de Pablo JJ. Improving Machine Learned Force Fields for Complex Fluids through Enhanced Sampling: A Liquid Crystal Case Study. J Phys Chem A 2024; 128:7257-7268. [PMID: 39150905 DOI: 10.1021/acs.jpca.4c01546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/18/2024]
Abstract
Machine learned force fields offer the potential for faster execution times while retaining the accuracy of traditional DFT calculations, making them promising candidates for molecular simulations in cases where reliable classical force fields are not available. Some of the challenges associated with machine learned force fields include simulation stability over extended periods of time and ensuring that the statistical and dynamical properties of the underlying simulated systems are correctly captured. In this work, we propose a systematic training pipeline for such force fields that leads to improved model quality, compared to that achieved by traditional data generation and training approaches. That pipeline relies on the use of enhanced sampling techniques, and it is demonstrated here in the context of a liquid crystal, which exemplifies many of the challenges that are encountered in fluids and materials with complex free energy landscapes. Our results indicate that, whereas the majority of traditional machine learned force field training approaches lead to molecular dynamics simulations that are only stable over hundred-picosecond trajectories, our approach allows for stable simulations over tens of nanoseconds for organic molecular systems comprising thousands of atoms.
Affiliation(s)
- Yezhi Jin
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Gustavo R Perez-Lemus
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Pablo F Zubieta Rico
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
- Juan J de Pablo
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637-1476, United States
20
Bakker MJ, Gaffour A, Juhás M, Zapletal V, Stošek J, Bratholm LA, Pavlíková Přecechtělová J. Streamlining NMR Chemical Shift Predictions for Intrinsically Disordered Proteins: Design of Ensembles with Dimensionality Reduction and Clustering. J Chem Inf Model 2024; 64:6542-6556. [PMID: 39099394 PMCID: PMC11412307 DOI: 10.1021/acs.jcim.4c00809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024]
Abstract
By merging advanced dimensionality reduction (DR) and clustering algorithm (CA) techniques, our study advances the sampling procedure for predicting NMR chemical shifts (CS) in intrinsically disordered proteins (IDPs), making a significant leap forward in the field of protein analysis/modeling. We enhance NMR CS sampling by generating clustered ensembles that accurately reflect the different properties and phenomena encapsulated by the IDP trajectories. This investigation critically assessed different rapid CS predictors, both neural network-based (e.g., Sparta+ and ShiftX2) and database-driven (ProCS-15), and highlighted the need for more advanced quantum calculations and, consequently, for more tractable-sized conformational ensembles. Although the neural network CS predictors outperformed ProCS-15 for all atoms, all tools showed poor agreement with HN CSs, and the neural network CS predictors were unable to capture the influence of phosphorylated residues, which are highly relevant for IDPs. This study also addressed the limitations of direct clustering with collective variables, such as the widely implemented GROMOS algorithm. Clustered ensembles (CEs) produced by this algorithm performed poorly on chemical shifts compared with sequential ensembles (SEs) of similar size. Instead, we implement a multiscale DR and CA approach and explore the challenges and limitations of applying these algorithms to obtain more robust and tractable CEs. The novel feature of this investigation is the use of solvent-accessible surface area (SASA) as one of the fingerprints for DR, alongside the previously investigated α-carbon distances/angles or ϕ/ψ dihedral angles. SASA-based tSNE DR produced CEs better aligned with the experimental CSs, by between 0.17 and 0.36 in r² (0.18-0.26 ppm), depending on the system and replicate. Furthermore, this technique produced CEs with better agreement than traditional SEs in 85.7% of all ensemble sizes.
This study investigates the quality of ensembles produced based on different input features, comparing latent spaces produced by linear vs nonlinear DR techniques and a novel integrated silhouette score scanning protocol for tSNE DR.
Affiliation(s)
- Michael J Bakker
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Amina Gaffour
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Martin Juhás
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Department of Chemistry, Faculty of Science, University of Hradec Králové, Rokitanského 62, 500 03 Hradec Králové, Czech Republic
- Vojtěch Zapletal
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Jakub Stošek
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Department of Chemistry, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
- Lars A Bratholm
- School of Chemistry, University of Bristol, Cantock's Close, BS8 1TS Bristol, U.K.
- Jana Pavlíková Přecechtělová
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
21
Dong J, Wang S, Cui W, Sun X, Guo H, Yan H, Vogel H, Wang Z, Yuan S. Machine Learning Deciphered Molecular Mechanistics with Accurate Kinetic and Thermodynamic Prediction. J Chem Theory Comput 2024; 20:4499-4513. [PMID: 38394691 DOI: 10.1021/acs.jctc.3c01412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
Time-lagged independent component analysis (tICA) and the Markov state model (MSM) have been extensively employed for extracting conformational dynamics and kinetic community networks from unbiased trajectory ensembles. However, these techniques may not be the optimal choice for elucidating transition mechanisms within low-dimensional representations, especially for intricate biosystems. Unraveling the association mechanism in such complex systems always necessitates permutations of several essential independent components or collective variables, a process that is inherently obscure and may require empirical knowledge for selection. To address these challenges, we have implemented an integrated unsupervised dimension reduction model: uniform manifold approximation and projection (UMAP) with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). This approach effectively generates low-dimensional configurational embeddings. The hierarchical application of this architecture, in conjunction with MSM, reveals global kinetic connectivity while identifying local conformational states. Consequently, our methodology establishes a multiscale framework for mechanistic elucidation. Leveraging the benefits of a uniform sample distribution and a denoising approach, our model is more robust in preserving global and local data structures than traditional dimension reduction methods used in MD analysis. The interpretability of hyperparameter selection and compatibility with downstream tasks are cross-validated across various simulation data sets, using both computational evaluation metrics and experimental kinetic observables. Furthermore, the predicted Mcl1-BH3 association kinetics (0.76 s⁻¹) are in close agreement with surface plasmon resonance experiments (0.12 s⁻¹), affirming the plausibility of the identified pathway composed of representative conformations.
We anticipate that the devised workflow will serve as a foundational framework for studying recognition patterns in complex biological systems. Its contributions extend to the exploration of protein functional dynamics and rational drug design, offering a potent avenue for advancing research in these domains.
Affiliation(s)
- Junlin Dong
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shiyu Wang
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- AlphaMol Science Ltd, Shenzhen 518055, China
- Wenqiang Cui
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaolin Sun
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Haojie Guo
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hailu Yan
- School of Biological Sciences, College of Science and Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K.
- Horst Vogel
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhi Wang
- Artificial Intelligence Department, Zhejiang Financial College, Hangzhou 310018, China
- Shuguang Yuan
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- AlphaMol Science Ltd, Shenzhen 518055, China
22
Schilter O, Gutierrez DP, Folkmann LM, Castrogiovanni A, García-Durán A, Zipoli F, Roch LM, Laino T. Combining Bayesian optimization and automation to simultaneously optimize reaction conditions and routes. Chem Sci 2024; 15:7732-7741. [PMID: 38784737 PMCID: PMC11110165 DOI: 10.1039/d3sc05607d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 04/05/2024] [Indexed: 05/25/2024]
Abstract
Reaching optimal reaction conditions is crucial to achieve high yields, minimal by-products, and environmentally sustainable chemical reactions. With the recent rise of artificial intelligence, there has been a shift from traditional Edisonian trial-and-error optimization to data-driven and automated approaches, which offer significant advantages. Here, we showcase the capabilities of an integrated platform; we conducted simultaneous optimizations of four different terminal alkynes and two reaction routes using an automation platform combined with a Bayesian optimization platform. Remarkably, we achieved a conversion rate of over 80% for all four substrates in 23 experiments, covering ca. 0.2% of the combinatorial space. Further analysis allowed us to identify the influence of different reaction parameters on the reaction outcomes, demonstrating the potential for expedited reaction condition optimization and the prospect of more efficient chemical processes in the future.
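The abstract does not describe the optimizer's internals. As a hedged illustration of Bayesian optimization over a discrete grid of reaction conditions in general, the sketch below fits a zero-mean Gaussian process with a squared-exponential kernel to the experiments run so far and chooses the next experiment by expected improvement. The kernel length scale, the jitter term, the toy "conversion" surface, and all function names are assumptions; the platform used in the paper may rely on a different surrogate model and acquisition function.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf(a: Sequence[float], b: Sequence[float], length: float = 0.3) -> float:
    """Squared-exponential kernel with unit signal variance."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # Clamp the diagonal to guard against tiny negative round-off.
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward_sub(L: List[List[float]], b: List[float]) -> List[float]:
    """Solve L^T x = b given lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X: List[Point], y: List[float], x: Point,
                 noise: float = 1e-4) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean GP at x."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y))  # alpha = K^-1 y
    k_star = [rbf(a, x) for a in X]
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(rbf(x, x) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for maximization under a Gaussian predictive distribution."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayes_opt(objective: Callable[[Point], float], candidates: List[Point],
              n_init: int = 3, n_iter: int = 10, seed: int = 0) -> Tuple[Point, float]:
    rng = random.Random(seed)
    X = rng.sample(candidates, n_init)  # initial random experiments
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in X]
        if not remaining:
            break
        best = max(y)
        scores = []
        for c in remaining:
            mu, sd = gp_posterior(X, y, c)
            scores.append(expected_improvement(mu, sd, best))
        nxt = remaining[scores.index(max(scores))]  # most promising experiment next
        X.append(nxt)
        y.append(objective(nxt))
    i = y.index(max(y))
    return X[i], y[i]
```

For example, `bayes_opt(f, grid, n_init=3, n_iter=5)` on a 5 x 5 grid of scaled conditions evaluates only 8 of the 25 points. For scale, 23 experiments covering ca. 0.2% of a space implies roughly 23 / 0.002 ≈ 11,500 condition combinations, which is why a model-guided choice of the next experiment matters.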
Affiliation(s)
- Oliver Schilter
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Linnea M Folkmann
- Atinary Technologies, Route de la Corniche 4, 1066 Epalinges, Switzerland
- Federico Zipoli
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Loïc M Roch
- Atinary Technologies, Route de la Corniche 4, 1066 Epalinges, Switzerland
- Teodoro Laino
- IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
23
Klyshko E, Kim JSH, McGough L, Valeeva V, Lee E, Ranganathan R, Rauscher S. Functional protein dynamics in a crystal. Nat Commun 2024; 15:3244. [PMID: 38622111 PMCID: PMC11018856 DOI: 10.1038/s41467-024-47473-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 04/02/2024] [Indexed: 04/17/2024]
Abstract
Proteins are molecular machines, and to understand how they work, we need to understand how they move. New pump-probe time-resolved X-ray diffraction methods open up ways to initiate and observe protein motions with atomistic detail in crystals on biologically relevant timescales. However, practical limitations of these experiments demand the parallel development of effective molecular dynamics approaches to accelerate progress and extract meaning. Here, we establish robust and accurate methods for simulating dynamics in protein crystals, a nontrivial process requiring careful attention to equilibration, environmental composition, and choice of force fields. With more than seven milliseconds of sampling of a single chain, we identify critical factors controlling agreement between simulation and experiments and show that simulated motions recapitulate ligand-induced conformational changes. This work enables a virtuous cycle between simulation and experiments for visualizing and understanding the basic functional motions of proteins.
Affiliation(s)
- Eugene Klyshko
- Department of Physics, University of Toronto, Toronto, ON, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, ON, Canada
- Justin Sung-Ho Kim
- Department of Physics, University of Toronto, Toronto, ON, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, ON, Canada
- Lauren McGough
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Victoria Valeeva
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, ON, Canada
- Ethan Lee
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, ON, Canada
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Rama Ranganathan
- Center for Physics of Evolving Systems and Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL, USA
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, USA
- Sarah Rauscher
- Department of Physics, University of Toronto, Toronto, ON, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, ON, Canada
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
24
Chen Z, Zhang L, Zhang P, Guo H, Zhang R, Li L, Li X. Prediction of Cytochrome P450 Inhibition Using a Deep Learning Approach and Substructure Pattern Recognition. J Chem Inf Model 2024; 64:2528-2538. [PMID: 37864562 DOI: 10.1021/acs.jcim.3c01396] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 10/23/2023]
Abstract
Cytochrome P450 (CYP) is a family of enzymes responsible for about 75% of all metabolic reactions. Among them, CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4 participate in the metabolism of most drugs and mediate many adverse drug reactions. Therefore, it is necessary to estimate the chemical inhibition of cytochrome P450 enzymes in drug discovery and the food industry. In the past few decades, many computational models have been reported, and some have provided good performance. However, several issues with these models remain to be resolved, such as coverage of only a single isoform, unbalanced performance, lack of structural characteristics analysis, and poor availability. In the present study, deep learning models were developed in Python using the Keras framework and TensorFlow to predict the chemical inhibition of each CYP isoform. These models were built on a large data set of 85,715 compounds extracted from the PubChem BioAssay database. On external validation, the models achieved AUC values of 0.97, 0.94, 0.94, 0.96, and 0.94 for CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A4, respectively. The models are freely accessible on the web server CYPi-DNNpredictor (cypi.sapredictor.cn), and the model code was made open source in the Supporting Information. In addition, we analyzed the structural characteristics of chemicals with CYP450 inhibition and detected the structural alerts (SAs) presumably responsible for the inhibition. The SAs are also available online as CYPi-SAdetector (cypisa.sapredictor.cn). The models can serve as a powerful tool for predicting CYP450 inhibitors, and the SAs should provide useful information on the mechanisms of cytochrome P450 inhibition.
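The per-isoform AUC values above summarize how well predicted inhibition probabilities rank inhibitors above non-inhibitors. A minimal way to compute ROC AUC from binary labels (1 = inhibitor) and model scores is the rank (Mann-Whitney) formulation sketched below; it is not the authors' evaluation code, and the labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC = P(score of a random positive > score of a random negative).

    Ties count one half (Mann-Whitney formulation). O(P * N) pairs, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one inhibitor (1) and one non-inhibitor (0)")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = inhibitor) and predicted probabilities for one isoform.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

Because AUC depends only on the ranking, it is insensitive to class imbalance in the sense that rescaling the scores does not change it; balanced sensitivity and specificity still have to be checked separately at a chosen threshold.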
Affiliation(s)
- Zhaoyang Chen
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Le Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Pei Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Huizhu Guo
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ruiqiu Zhang
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Ling Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
- Xiao Li
- Department of Clinical Pharmacy, The First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Shandong Engineering and Technology Research Center for Pediatric Drug Development, Shandong Medicine and Health Key Laboratory of Clinical Pharmacy, Jinan 250014, China
25
Hafiz R, Saeed S. Hybrid whale algorithm with evolutionary strategies and filtering for high-dimensional optimization: Application to microarray cancer data. PLoS One 2024; 19:e0295643. [PMID: 38466740 PMCID: PMC10927076 DOI: 10.1371/journal.pone.0295643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 11/28/2023] [Indexed: 03/13/2024]
Abstract
The standard whale optimization algorithm (WOA) is prone to suboptimal results and inefficiencies in high-dimensional search spaces, so examining its components is critical. Computer-generated initial populations often exhibit an uneven distribution in the solution space, leading to low diversity. We propose fusing this algorithm with a discrete recombinant evolutionary strategy to enhance initialization diversity. We conduct simulation experiments and compare the proposed algorithm (RESHWOA) with the original WOA on thirteen benchmark test functions. Simulation experiments on unimodal and multimodal benchmarks verified the better performance of RESHWOA in terms of accuracy, minimum mean, and low standard deviation. Furthermore, we applied two data reduction techniques: Bhattacharyya distance and signal-to-noise ratio. Support Vector Machines (SVMs) excel at handling high-dimensional datasets and numerical features; although an SVM already works well with its default settings, optimizing its parameters can significantly improve its performance. We applied RESHWOA and WOA to six microarray cancer datasets to optimize the SVM parameters. The exhaustive examination and detailed results demonstrate that the new structure addresses WOA's main shortcomings. We conclude that the proposed RESHWOA performed significantly better than WOA.
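The abstract names the algorithms without giving their update rules. The sketch below implements the standard WOA position updates (shrinking encirclement of the best whale, exploration around a random whale, and the logarithmic spiral with b = 1, following Mirjalili and Lewis) for minimization inside a box. The `discrete_recombination` initializer is a hypothetical stand-in for the paper's discrete recombinant evolutionary strategy: it builds each initial whale by copying every coordinate from a randomly chosen member of a larger random pool, which may differ from the authors' scheme. Bounds, population size, iteration count, and the sphere benchmark are illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def discrete_recombination(parents: List[Vector], n_children: int,
                           rng: random.Random) -> List[Vector]:
    """Hypothetical initializer: each child copies every coordinate from a random parent."""
    dim = len(parents[0])
    return [[rng.choice(parents)[d] for d in range(dim)] for _ in range(n_children)]


def woa(f: Callable[[Vector], float], dim: int, lb: float, ub: float,
        n_whales: int = 20, n_iter: int = 200, seed: int = 0,
        recombine: bool = False) -> Tuple[Vector, float, List[float]]:
    """Minimize f with the standard WOA update rules (spiral constant b = 1)."""
    rng = random.Random(seed)

    def clip(v: float) -> float:
        return min(max(v, lb), ub)

    pool = [[rng.uniform(lb, ub) for _ in range(dim)]
            for _ in range(2 * n_whales if recombine else n_whales)]
    pop = discrete_recombination(pool, n_whales, rng) if recombine else pool
    best = list(min(pop, key=f))
    best_fit = f(best)
    history = [best_fit]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i, x in enumerate(pop):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            ell = rng.uniform(-1.0, 1.0)
            if rng.random() < 0.5:
                # |A| < 1: move toward the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1.0 else pop[rng.randrange(len(pop))]
                new = [clip(r - A * abs(C * r - xd)) for r, xd in zip(ref, x)]
            else:
                # Logarithmic spiral around the best whale.
                new = [clip(abs(bd - xd) * math.exp(ell) * math.cos(2.0 * math.pi * ell) + bd)
                       for bd, xd in zip(best, x)]
            pop[i] = new
            fit = f(new)
            if fit < best_fit:
                best, best_fit = list(new), fit
        history.append(best_fit)
    return best, best_fit, history


def sphere(v: Vector) -> float:
    return sum(x * x for x in v)


best, best_fit, history = woa(sphere, dim=5, lb=-5.0, ub=5.0, recombine=True)
print(best_fit)
```

In the SVM application described above, the objective would instead be a cross-validated error as a function of the SVM hyperparameters (for example, C and the kernel width), evaluated on the filtered microarray features.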
Affiliation(s)
- Rahila Hafiz
- College of Statistical Sciences, University of the Punjab, Lahore, Pakistan
- Sana Saeed
- College of Statistical Sciences, University of the Punjab, Lahore, Pakistan
26
Hadad A, França VLB, Crisostomo MW, Brunaldi K, Carvalho HF, Freire VN. Unveiling fructose and glucose binding to human serum albumin: fluorescence measurements and docking, molecular dynamics and quantum biochemistry computations. J Biomol Struct Dyn 2024:1-21. [PMID: 38288929 DOI: 10.1080/07391102.2024.2310211] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/29/2023] [Accepted: 01/19/2024] [Indexed: 02/28/2025]
Abstract
This research examines the interaction between human serum albumin (HSA) and various sugar forms (β-D-fructofuranose (FRC), α-D-glucopyranose (GLC), keto-D-fructose (FRO), aldehydo-D-glucose (GLO), and modified aldehydo-D-glucose (GLOm)) using fluorescence spectroscopy, molecular docking simulations, molecular dynamics, protein conformational clusters (EnGens), molecular fractionation with conjugate caps (MFCC), and quantum biochemistry analysis. We analyze molecular and quantum aspects, uncovering interaction energies between sugar atoms and amino acids. The total interaction energy is built from the bottom up through protein fragmentation and energetic decomposition. Molecular dynamics reveals that unmodified aldehydo-D-glucose (GLO) escapes the HSA binding sites, explaining gradual glycation. We pioneer the study of HSA's binding mechanism with glucose and fructose in a 1:1 ratio using long molecular dynamics simulations. The results suggest that the transitional GLOm form has a higher propensity for Sudlow site I than unmodified glucose, which is crucial for K195 glycation. FRO and GLOm tend to move deeper into the FA7 cavity, near its center. This approach effectively elucidates small-molecule binding mechanisms, consistent with previous experimental results.
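The MFCC scheme mentioned here splits the protein into capped single-residue fragments so that each ligand-residue interaction energy can be computed separately and then summed. A minimal bookkeeping sketch of the usual subtraction formula, E(L-R_i) = E(L + C_{i-1} R_i C_{i+1}*) - E(C_{i-1} R_i C_{i+1}*) - E(L + C_{i-1} C_{i+1}*) + E(C_{i-1} C_{i+1}*), is shown below. The four total energies would come from quantum-chemical calculations that are not shown; the residue names and numbers are hypothetical, in arbitrary units.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MFCCEnergies:
    """Total energies of the four capped systems for ligand L and residue R_i."""
    ligand_capped_residue: float  # E(L + C_{i-1} R_i C_{i+1}*)
    capped_residue: float         # E(C_{i-1} R_i C_{i+1}*)
    ligand_caps: float            # E(L + C_{i-1} C_{i+1}*)
    caps: float                   # E(C_{i-1} C_{i+1}*)


def residue_interaction(e: MFCCEnergies) -> float:
    """Ligand-residue interaction energy; the cap terms cancel the caps' own contribution."""
    return e.ligand_capped_residue - e.capped_residue - e.ligand_caps + e.caps


def rank_residues(table: Dict[str, MFCCEnergies]) -> List[Tuple[str, float]]:
    """Residues sorted from most attractive (most negative) to least attractive."""
    return sorted(((name, residue_interaction(e)) for name, e in table.items()),
                  key=lambda item: item[1])


# Hypothetical energies (arbitrary units) for two residues.
table = {
    "RES_A": MFCCEnergies(-100.0, -90.0, -12.0, -3.0),
    "RES_B": MFCCEnergies(-50.0, -45.0, -8.0, -2.5),
}
ranking = rank_residues(table)
print(ranking)
print(sum(v for _, v in ranking))  # total over the listed residues
```

Summing over all residues within a chosen radius of the ligand gives the total interaction energy, and the ranked list identifies the residues that dominate binding.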
Affiliation(s)
- André Hadad
- Department of Physics, Federal University of Ceará, Fortaleza, Ceará, Brazil
- Victor L B França
- Department of Physics, Federal University of Ceará, Fortaleza, Ceará, Brazil
- Department of Physiology and Pharmacology, Faculty of Medicine, Federal University of Ceará, Fortaleza, Ceará, Brazil
- Kellen Brunaldi
- Department of Physiological Sciences, State University of Maringá, Maringá, Paraná, Brazil
- Hernandes F Carvalho
- Department of Structural and Functional Biology, Institute of Biology, State University of Campinas, Campinas, São Paulo, Brazil
- Valder N Freire
- Department of Physics, Federal University of Ceará, Fortaleza, Ceará, Brazil
27
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations of biochemical processes produce trajectories containing vast numbers of structures. Extracting valuable information from these trajectories, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a priori assumptions. DIKI dynamically plans the distribution of the latent space and groups similar conformations into the same cluster. Moreover, two user-defined parameters, a coarse focus knob and a fine focus knob, help identify conformations with low free energy and differentiate the subtle distinctions among them, enabling resolution-tunable clustering. Furthermore, integrating DIKI with a path-finding algorithm helps identify crucial intermediates along the lowest free-energy pathway. We propose that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
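The abstract does not specify which path-finding algorithm is paired with DIKI. As one hedged possibility, the sketch below runs Dijkstra's algorithm on a free-energy surface gridded over two collective variables, charging each move the free energy of the cell entered (shifted so that the global minimum is zero) plus a small step penalty. This cost function, the 4-neighbour moves, and the toy surface are assumptions; other formulations, such as minimizing the highest barrier along the path, are also common.

```python
import heapq
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def lowest_free_energy_path(G: List[List[float]], start: Cell, goal: Cell,
                            step_cost: float = 1e-3) -> List[Cell]:
    """Dijkstra on a 2D free-energy grid with 4-neighbour moves.

    Entering a cell costs its free energy above the global minimum plus a small
    step penalty, so all edge weights are non-negative as Dijkstra requires.
    """
    rows, cols = len(G), len(G[0])
    g_min = min(min(row) for row in G)
    dist: Dict[Cell, float] = {start: 0.0}
    prev: Dict[Cell, Cell] = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (G[nr][nc] - g_min) + step_cost
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        raise ValueError("goal is not reachable")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy surface (arbitrary units): a high ridge in the middle column separates two minima.
G = [[0.0, 9.0, 0.0],
     [1.0, 9.0, 1.0],
     [1.0, 1.0, 1.0]]
print(lowest_free_energy_path(G, (0, 0), (0, 2)))  # detours around the ridge
```

Cells visited along the returned path, mapped back to the frames that populate them, are natural candidates for the intermediate conformations.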
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
28
Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. Comparative Analysis of Conformational Dynamics and Systematic Characterization of Cryptic Pockets in the SARS-CoV-2 Omicron BA.2, BA.2.75 and XBB.1 Spike Complexes with the ACE2 Host Receptor: Confluence of Binding and Structural Plasticity in Mediating Networks of Conserved Allosteric Sites. Viruses 2023; 15:2073. [PMID: 37896850 PMCID: PMC10612107 DOI: 10.3390/v15102073] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/11/2023] [Revised: 10/04/2023] [Accepted: 10/06/2023] [Indexed: 10/29/2023]
Abstract
In the current study, we explore coarse-grained simulations and atomistic molecular dynamics together with binding energetics scanning and cryptic pocket detection in a comparative examination of conformational landscapes and systematic characterization of allosteric binding sites in the SARS-CoV-2 Omicron BA.2, BA.2.75 and XBB.1 spike full-length trimer complexes with the host receptor ACE2. Microsecond simulations, Markov state models and mutational scanning of binding energies of the SARS-CoV-2 BA.2 and BA.2.75 receptor binding domain complexes revealed the increased thermodynamic stabilization of the BA.2.75 variant and significant dynamic differences between these Omicron variants. Molecular simulations of the SARS-CoV-2 Omicron spike full-length trimer complexes with the ACE2 receptor complemented atomistic studies and enabled an in-depth analysis of mutational and binding effects on conformational dynamic and functional adaptability of the Omicron variants. Despite considerable structural similarities, Omicron variants BA.2, BA.2.75 and XBB.1 can induce unique conformational dynamic signatures and specific distributions of the conformational states. Using conformational ensembles of the SARS-CoV-2 Omicron spike trimer complexes with ACE2, we conducted a comprehensive cryptic pocket screening to examine the role of Omicron mutations and ACE2 binding on the distribution and functional mechanisms of the emerging allosteric binding sites. This analysis captured all experimentally known allosteric sites and discovered networks of inter-connected and functionally relevant allosteric sites that are governed by variant-sensitive conformational adaptability of the SARS-CoV-2 spike structures. The results detailed how ACE2 binding and Omicron mutations in the BA.2, BA.2.75 and XBB.1 spike complexes modulate the distribution of conserved and druggable allosteric pockets harboring functionally important regions. 
The results are significant for understanding the functional roles of druggable cryptic pockets that can be used for allostery-mediated therapeutic intervention targeting conformational states of the Omicron variants.
Affiliation(s)
- Mohammed Alshahrani
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Grace Gupta
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, TX 75275, USA
- Gennady Verkhivker
- Keck Center for Science and Engineering, Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
29
Alshahrani M, Gupta G, Xiao S, Tao P, Verkhivker G. Examining Functional Linkages Between Conformational Dynamics, Protein Stability and Evolution of Cryptic Binding Pockets in the SARS-CoV-2 Omicron Spike Complexes with the ACE2 Host Receptor: Recombinant Omicron Variants Mediate Variability of Conserved Allosteric Sites and Binding Epitopes. bioRxiv 2023:2023.09.11.557205. [PMID: 37745525 PMCID: PMC10515794 DOI: 10.1101/2023.09.11.557205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
In the current study, we explore coarse-grained simulations and atomistic molecular dynamics together with binding energetics scanning and cryptic pocket detection in a comparative examination of conformational landscapes and systematic characterization of allosteric binding sites in the SARS-CoV-2 Omicron BA.2, BA.2.75 and XBB.1 spike full-length trimer complexes with the host receptor ACE2. Microsecond simulations, Markov state models and mutational scanning of binding energies of the SARS-CoV-2 BA.2 and BA.2.75 receptor binding domain complexes revealed the increased thermodynamic stabilization of the BA.2.75 variant and significant dynamic differences between these Omicron variants. Molecular simulations of the SARS-CoV-2 Omicron spike full length trimer complexes with the ACE2 receptor complemented atomistic studies and enabled an in-depth analysis of mutational and binding effects on conformational dynamic and functional adaptability of the Omicron variants. Despite considerable structural similarities, Omicron variants BA.2, BA.2.75 and XBB.1 can induce unique conformational dynamic signatures and specific distributions of the conformational states. Using conformational ensembles of the SARS-CoV-2 Omicron spike trimer complexes with ACE2, we conducted a comprehensive cryptic pocket screening to examine the role of Omicron mutations and ACE2 binding on the distribution and functional mechanisms of the emerging allosteric binding sites. This analysis captured all experimentally known allosteric sites and discovered networks of inter-connected and functionally relevant allosteric sites that are governed by variant-sensitive conformational adaptability of the SARS-CoV-2 spike structures. The results detailed how ACE2 binding and Omicron mutations in the BA.2, BA.2.75 and XBB.1 spike complexes modulate the distribution of conserved and druggable allosteric pockets harboring functionally important regions. 
The results are significant for understanding the functional roles of druggable cryptic pockets that can be used for allostery-mediated therapeutic intervention targeting conformational states of the Omicron variants.
30
Xiao S, Alshahrani M, Gupta G, Tao P, Verkhivker G. Markov State Models and Perturbation-Based Approaches Reveal Distinct Dynamic Signatures and Hidden Allosteric Pockets in the Emerging SARS-Cov-2 Spike Omicron Variant Complexes with the Host Receptor: The Interplay of Dynamics and Convergent Evolution Modulates Allostery and Functional Mechanisms. J Chem Inf Model 2023; 63:5272-5296. [PMID: 37549201 PMCID: PMC11162552 DOI: 10.1021/acs.jcim.3c00778] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 08/09/2023]
Abstract
The new generation of SARS-CoV-2 Omicron variants displayed a significant growth advantage and increased viral fitness by acquiring convergent mutations, suggesting that immune pressure can promote convergent evolution leading to the sudden acceleration of SARS-CoV-2 evolution. In the current study, we combined structural modeling, microsecond molecular dynamics simulations, and Markov state models to characterize conformational landscapes and identify specific dynamic signatures of the SARS-CoV-2 spike complexes with the host receptor ACE2 for the recently emerged, highly transmissible XBB.1, XBB.1.5, BQ.1, and BQ.1.1 Omicron variants. Microsecond simulations and Markovian modeling provided a detailed characterization of the functional conformational states and revealed increased thermodynamic stabilization of the XBB.1.5 subvariant, in contrast with the more dynamic BQ.1 and BQ.1.1 subvariants. Despite considerable structural similarities, Omicron mutations can induce unique dynamic signatures and specific distributions of the conformational states. The results suggested that variant-specific changes in the conformational mobility of the functional interfacial loops of the receptor-binding domain in the SARS-CoV-2 spike protein can be fine-tuned through crosstalk between convergent mutations, which could provide an evolutionary path for modulating immune escape. By combining atomistic simulations and Markovian modeling analysis with perturbation-based approaches, we determined important complementary roles of convergent mutation sites as effectors and receivers of allosteric signaling involved in modulating conformational plasticity and regulating allosteric communication. This study also revealed hidden allosteric pockets and suggested that convergent mutation sites could control the evolution and distribution of allosteric pockets by modulating conformational plasticity in flexible, adaptable regions.
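Statements about thermodynamic stabilization from Markovian modeling are often expressed by converting MSM stationary populations into relative free energies, dG_i = -k_B T ln(pi_i / pi_max). A minimal sketch, assuming the stationary populations are already available and reporting kcal/mol at 300 K, is shown below; the three-state populations are hypothetical, not results from this study.

```python
import math
from typing import List

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def relative_free_energies(populations: List[float], temperature: float = 300.0) -> List[float]:
    """G_i - G_min = -k_B T ln(pi_i / pi_max) for stationary populations pi."""
    if any(p <= 0.0 for p in populations):
        raise ValueError("every state needs a positive stationary population")
    kT = K_B * temperature
    total = sum(populations)
    g = [-kT * math.log(p / total) for p in populations]
    g_min = min(g)
    return [gi - g_min for gi in g]


# Hypothetical stationary distribution of a three-state MSM.
print(relative_free_energies([0.6, 0.3, 0.1]))
```

A variant whose dominant state carries a larger share of the stationary population shows a larger free-energy gap to the other states, which is one way to quantify "more stabilized" versus "more dynamic".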
Affiliation(s)
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
- Mohammed Alshahrani
- Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Grace Gupta
- Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
- Gennady Verkhivker
- Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, Orange, California 92866, United States
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, California 92618, United States
31
Appadurai R, Koneru JK, Bonomi M, Robustelli P, Srivastava A. Clustering Heterogeneous Conformational Ensembles of Intrinsically Disordered Proteins with t-Distributed Stochastic Neighbor Embedding. J Chem Theory Comput 2023; 19:4711-4727. [PMID: 37338049 PMCID: PMC11108026 DOI: 10.1021/acs.jctc.3c00224] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/21/2023]
Abstract
Intrinsically disordered proteins (IDPs) populate a range of conformations that are best described by a heterogeneous ensemble. Grouping an IDP ensemble into "structurally similar" clusters for visualization, interpretation, and analysis is a much-desired but formidable task, as the conformational space of IDPs is inherently high-dimensional and dimensionality reduction techniques often result in ambiguous classifications. Here, we employ the t-distributed stochastic neighbor embedding (t-SNE) technique to generate homogeneous clusters of IDP conformations from the full heterogeneous ensemble. We illustrate the utility of t-SNE by clustering conformations of two disordered proteins, Aβ42 and α-synuclein, in their apo states and when bound to small-molecule ligands. Our results shed light on ordered substates within disordered ensembles and provide structural and mechanistic insights into binding modes that confer specificity and affinity in IDP ligand binding. t-SNE projections preserve local neighborhood information, provide interpretable visualizations of the conformational heterogeneity within each ensemble, and enable quantification of cluster populations and their relative shifts upon ligand binding. Our approach provides a new framework for detailed investigations of the thermodynamics and kinetics of IDP ligand binding and will aid rational drug design for IDPs.
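In t-SNE (as formulated by van der Maaten and Hinton), the high-dimensional similarities are conditional probabilities p_{j|i} ∝ exp(-beta_i d_ij^2), with each precision beta_i tuned by binary search so that the distribution's perplexity, the exponential of its entropy in nats, matches a user-chosen value. The sketch below covers only that calibration step for one conformation; the squared distances (which could, for example, be derived from pairwise RMSDs) are hypothetical inputs, and the symmetrization p_ij = (p_{j|i} + p_{i|j}) / 2N and the low-dimensional gradient descent are omitted.

```python
import math
from typing import List


def conditional_probabilities(sq_dists: List[float], perplexity: float,
                              tol: float = 1e-5, max_iter: int = 100) -> List[float]:
    """p_{j|i} for one point i, given squared distances to every other point.

    Binary-searches the precision beta = 1 / (2 sigma_i^2) until the Shannon
    entropy (in nats) of the distribution equals ln(perplexity).
    """
    target = math.log(perplexity)
    d_min = min(sq_dists)
    beta, lo, hi = 1.0, 0.0, math.inf
    probs: List[float] = []
    for _ in range(max_iter):
        # Subtracting d_min avoids underflow and cancels in the normalization.
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        z = sum(weights)
        probs = [w / z for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: sharpen the Gaussian
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:                 # too peaked: widen it
            hi = beta
            beta = (beta + lo) / 2.0
    return probs


# Hypothetical squared distances from one conformation to five others.
print(conditional_probabilities([0.0, 1.0, 4.0, 9.0, 16.0], perplexity=2.0))
```

Perplexity acts as an effective number of neighbours per conformation, so small values emphasize tight local substates while larger values merge them, which is why it is the main hyperparameter to scan when clustering an ensemble this way.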
Affiliation(s)
- Rajeswari Appadurai
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
- Massimiliano Bonomi
- Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, CNRS UMR 3528, C3BI, CNRS USR 3756, Institut Pasteur, Paris, France
- Paul Robustelli
- Dartmouth College, Department of Chemistry, Hanover, NH 03755, USA
- Anand Srivastava
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India
32
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki LE. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. Brief Bioinform 2023; 24:bbad242. [PMID: 37418278 PMCID: PMC10359083 DOI: 10.1093/bib/bbad242] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/11/2023] [Revised: 05/23/2023] [Accepted: 06/10/2023] [Indexed: 07/08/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein's structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
- Department of Computer Science, Rice University, Houston, TX 77005, USA
- Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hussain Kalavadwala
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA
- Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Geancarlo Zanatta
- Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
- Dinler Amaral Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA
- Lydia E Kavraki
- Department of Computer Science, Rice University, Houston, TX 77005, USA
33
Qu G, Liu H, Li J, Huang S, Zhao N, Zeng L, Deng J. GPX4 is a key ferroptosis biomarker and correlated with immune cell populations and immune checkpoints in childhood sepsis. Sci Rep 2023; 13:11358. [PMID: 37443372 PMCID: PMC10345139 DOI: 10.1038/s41598-023-32992-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/08/2022] [Accepted: 04/05/2023] [Indexed: 07/15/2023]
Abstract
Sepsis is the uncontrolled reaction of the body to infection-induced inflammation, which results in life-threatening multiple-organ dysfunction (MODS). Although research on sepsis has advanced significantly in recent years, its pathophysiology remains poorly understood. Ferroptosis is a recently described type of programmed cell death that may have an impact on sepsis development; however, the precise mechanism still needs to be explored. In this paper, four pediatric sepsis datasets [training datasets (GSE26378 and GSE26440) and validation datasets (GSE11755 and GSE11281)] were chosen from the GEO (Gene Expression Omnibus) database, and 63 differentially expressed ferroptosis-related genes (DE-FRGs) were identified through bioinformatic analysis. Functional annotation was performed using GO and KEGG pathway enrichment analysis. Then, four core FRGs (FTH1, GPX4, ACSL1, and ACSL6) were extracted after constructing a protein-protein interaction (PPI) network and performing MCODE module analysis. The hub FRG (GPX4) was subsequently identified using the validation datasets, and its correlations with immune cell populations (neutrophils, r = -0.52; CD8 T-cells, r = 0.43) and immune checkpoints (CD274, r = -0.42) were examined. The usefulness of GPX4 as a marker in sepsis was assessed in a mouse model of sepsis. The findings demonstrate that GPX4 is a crucial biomarker and a potential new immunotherapy target for the prediction and therapy of pediatric sepsis.
Affiliation(s)
- Guoxin Qu
- The First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 570100, People's Republic of China
- The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, 550001, People's Republic of China
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Army Medical University, Chongqing, 400042, People's Republic of China
- Hui Liu
- The First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 570100, People's Republic of China
- Jin Li
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Army Medical University, Chongqing, 400042, People's Republic of China
- Siyuan Huang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Army Medical University, Chongqing, 400042, People's Republic of China
- Nannan Zhao
- The First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 570100, People's Republic of China
- Ling Zeng
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, Army Medical University, Chongqing, 400042, People's Republic of China
- Jin Deng
- The Affiliated Hospital of Guizhou Medical University, Guizhou Medical University, Guiyang, 550001, People's Republic of China
34
Schreiner W, Karch R, Cibena M, Tomasiak L, Kenn M, Pfeiler G. Clustering molecular dynamics conformations of the CC'-loop of the PD-1 immuno-checkpoint receptor. Comput Struct Biotechnol J 2023; 21:3920-3932. [PMID: 37602229 PMCID: PMC10432919 DOI: 10.1016/j.csbj.2023.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 06/16/2023] [Accepted: 07/03/2023] [Indexed: 08/22/2023]
Abstract
Molecular mechanisms within the checkpoint receptor PD-1 are essential for its activation by PD-L1 as well as for blocking such activation via checkpoint inhibitors. We use molecular dynamics to scrutinize patterns of atomic motion in PD-1 without a ligand. Molecular dynamics simulations are performed for the whole extracellular domain of PD-1, and the analysis focuses on its CC'-loop and some adjacent Cα atoms. We extend previous work by applying common nearest neighbor clustering (Cnn) and compare the performance of this method with Daura clustering as well as with UMAP dimension reduction followed by agglomerative linkage clustering. Compared with Daura clustering, we found Cnn less sensitive to cutoff selection and better able to return representative clusters for sets of different 3D atomic conformations. Interestingly, Cnn yields results quite similar to those of UMAP plus linkage clustering.
Affiliation(s)
- Wolfgang Schreiner
- Medical University of Vienna, Center for Medical Data Science, Spitalgasse 23, A-1090, Vienna, Austria
- Rudolf Karch
- Medical University of Vienna, Center for Medical Data Science, Spitalgasse 23, A-1090, Vienna, Austria
- Michael Cibena
- Medical University of Vienna, Center for Medical Data Science, Spitalgasse 23, A-1090, Vienna, Austria
- Lisa Tomasiak
- Medical University of Vienna, Center for Medical Data Science, Spitalgasse 23, A-1090, Vienna, Austria
- Michael Kenn
- Medical University of Vienna, Center for Medical Data Science, Spitalgasse 23, A-1090, Vienna, Austria
- Georg Pfeiler
- Medical University of Vienna, Department of Obstetrics and Gynecology, Division of General Gynecology and Gynecologic Oncology, Währinger Gürtel 18-20, A-1090, Vienna, Austria
35
Vuillemot R, Rouiller I, Jonić S. MDTOMO method for continuous conformational variability analysis in cryo electron subtomograms based on molecular dynamics simulations. Sci Rep 2023; 13:10596. [PMID: 37391578 PMCID: PMC10313669 DOI: 10.1038/s41598-023-37037-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2023] [Accepted: 06/14/2023] [Indexed: 07/02/2023]
Abstract
Cryo-electron tomography (cryo-ET) makes it possible to observe macromolecular complexes in their native environment. The common routine of subtomogram averaging (STA) provides the three-dimensional (3D) structure of abundant macromolecular complexes and can be coupled with discrete classification to reveal conformational heterogeneity of the sample. However, the number of complexes extracted from cryo-ET data is usually small, which restricts discrete-classification results to a small number of sufficiently populated states and thus results in a largely incomplete conformational landscape. Alternative approaches are currently being investigated to explore the continuity of the conformational landscapes that in situ cryo-ET studies could provide. In this article, we present MDTOMO, a method for analyzing continuous conformational variability in cryo-ET subtomograms based on molecular dynamics (MD) simulations. From a given set of cryo-ET subtomograms, MDTOMO produces an atomic-scale model of conformational variability and the corresponding free-energy landscape. The article presents the performance of MDTOMO on a synthetic ABC exporter dataset and an in situ SARS-CoV-2 spike dataset. MDTOMO allows analysis of the dynamic properties of molecular complexes to understand their biological functions, which could also be useful for structure-based drug discovery.
Affiliation(s)
- Rémi Vuillemot
- IMPMC-UMR 7590 CNRS, Sorbonne Université, Muséum National d'Histoire Naturelle, CC 115, 4 Place Jussieu, 75005, Paris, France
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Isabelle Rouiller
- Department of Biochemistry and Pharmacology and Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Australian Research Council Centre for Cryo-Electron Microscopy of Membrane Proteins, Parkville, VIC, 3052, Australia
- Slavica Jonić
- IMPMC-UMR 7590 CNRS, Sorbonne Université, Muséum National d'Histoire Naturelle, CC 115, 4 Place Jussieu, 75005, Paris, France
36
Wu D, Salsbury FR. Unraveling the Role of Hydrogen Bonds in Thrombin via Two Machine Learning Methods. J Chem Inf Model 2023; 63:3705-3718. [PMID: 37285464 PMCID: PMC11164249 DOI: 10.1021/acs.jcim.3c00153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Hydrogen bonds play a critical role in the folding and stability of biomolecules, such as proteins and nucleic acids, by providing strong and directional interactions. They help maintain the secondary and 3D structure of proteins, and structural changes in these molecules often result from the formation or breaking of hydrogen bonds. To gain insights into these hydrogen-bonding networks, we applied two machine learning models, a logistic regression model and a decision tree model, to study four variants of thrombin: wild-type, ΔK9, E8K, and R4A. Our results showed that each model has its own advantages. The logistic regression model highlighted a potential key residue (GLU295) in thrombin's allosteric pathways, while the decision tree model identified important hydrogen-bonding motifs. This information can aid in understanding the mechanisms of protein folding and has potential applications in drug design and other therapies. The use of these two models highlights their usefulness in studying hydrogen-bonding networks in proteins.
Affiliation(s)
- Dizhou Wu
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106, United States
- Freddie R Salsbury
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106, United States
37
Xiao S, Song Z, Tian H, Tao P. Assessments of Variational Autoencoder in Protein Conformation Exploration. Journal of Computational Biophysics and Chemistry 2023; 22:489-501. [PMID: 38826699 PMCID: PMC11138204 DOI: 10.1142/s2737416523500217] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/04/2024]
Abstract
Molecular dynamics (MD) simulations have been extensively used to study protein dynamics and, by extension, protein function. However, MD simulations often cannot sample enough conformational space to capture protein function within accessible timescales. Accordingly, many enhanced sampling methods, including variational autoencoder (VAE) based methods, have been developed to address this issue. The purpose of this study is to evaluate the feasibility of using VAEs to assist in the exploration of protein conformational landscapes. Using three model systems, we showed that VAEs could capture high-level hidden information that distinguishes protein conformations. These models could also be used to generate new, physically plausible protein conformations for direct sampling in favorable conformational spaces. We also found that VAEs worked better at interpolation than extrapolation and that increasing the latent space dimension could lead to a trade-off between performance and complexity.
Affiliation(s)
- Sian Xiao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Zilin Song
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States
38
Xiao S, Alshahrani M, Gupta G, Tao P, Verkhivker G. Markov State Models and Perturbation-Based Approaches Reveal Distinct Dynamic Signatures and Hidden Allosteric Pockets in the Emerging SARS-Cov-2 Spike Omicron Variants Complexes with the Host Receptor: The Interplay of Dynamics and Convergent Evolution Modulates Allostery and Functional Mechanisms. bioRxiv 2023:2023.05.20.541592. [PMID: 37292827 PMCID: PMC10245745 DOI: 10.1101/2023.05.20.541592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The new generation of SARS-CoV-2 Omicron variants displayed a significant growth advantage and increased viral fitness by acquiring convergent mutations, suggesting that immune pressure can promote convergent evolution, leading to a sudden acceleration of SARS-CoV-2 evolution. In the current study, we combined structural modeling, extensive microsecond MD simulations and Markov state models to characterize conformational landscapes and identify specific dynamic signatures of the SARS-CoV-2 spike complexes with the host receptor ACE2 for the recently emerged, highly transmissible XBB.1, XBB.1.5, BQ.1, and BQ.1.1 Omicron variants. Microsecond simulations and Markovian modeling provided a detailed characterization of the conformational landscapes and revealed increased thermodynamic stabilization of the XBB.1.5 subvariant, in contrast to the more dynamic BQ.1 and BQ.1.1 subvariants. Despite considerable structural similarities, Omicron mutations can induce unique dynamic signatures and specific distributions of conformational states. The results suggested that variant-specific changes of conformational mobility in the functional interfacial loops of the spike receptor binding domain can be fine-tuned through cross-talk between convergent mutations, thereby providing an evolutionary path for modulation of immune escape. By combining atomistic simulations and Markovian modeling analysis with perturbation-based approaches, we determined important complementary roles of convergent mutation sites as effectors and receivers of allosteric signaling involved in modulating conformational plasticity at the binding interface and regulating allosteric responses.
This study also characterized the dynamics-induced evolution of allosteric pockets in the Omicron complexes that revealed hidden allosteric pockets and suggested that convergent mutation sites could control evolution and distribution of allosteric pockets through modulation of conformational plasticity in the flexible adaptable regions. Through integrative computational approaches, this investigation provides a systematic analysis and comparison of the effects of Omicron subvariants on conformational dynamics and allosteric signaling in the complexes with the ACE2 receptor.
39
Conev A, Rigo MM, Devaurs D, Fonseca AF, Kalavadwala H, de Freitas MV, Clementi C, Zanatta G, Antunes DA, Kavraki L. EnGens: a computational framework for generation and analysis of representative protein conformational ensembles. bioRxiv 2023:2023.04.24.538094. [PMID: 37163076 PMCID: PMC10168271 DOI: 10.1101/2023.04.24.538094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Proteins are dynamic macromolecules that perform vital functions in cells. A protein's structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein-ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
Affiliation(s)
- Anja Conev
- Department of Computer Science, Rice University, Houston, TX 77005, USA
- Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Hussain Kalavadwala
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA
- Cecilia Clementi
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Geancarlo Zanatta
- Department of Biophysics, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre 91501-970, Brazil
- Dinler Amaral Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA
- Lydia Kavraki
- Department of Computer Science, Rice University, Houston, TX 77005, USA
40
Bowler S, Papoutsoglou G, Karanikas A, Tsamardinos I, Corley MJ, Ndhlovu LC. A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity. Sci Rep 2022; 12:17480. [PMID: 36261477 PMCID: PMC9580434 DOI: 10.1038/s41598-022-22201-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2022] [Accepted: 10/11/2022] [Indexed: 01/12/2023]
Abstract
Since the onset of the COVID-19 pandemic, cases with variable outcomes have continued to increase globally, driven by emerging variants and despite vaccines and therapies. There is a need to identify, early on, at-risk individuals who would benefit from timely medical interventions. DNA methylation provides an opportunity to identify an epigenetic signature of individuals at increased risk. We utilized machine learning to identify DNA methylation signatures of COVID-19 disease from data available through the NCBI Gene Expression Omnibus. A training cohort of 460 individuals (164 COVID-19-infected and 296 non-infected) and an external validation dataset of 128 individuals (102 COVID-19-infected and 26 non-COVID-associated pneumonia) were reanalyzed. Data were processed using ChAMP, and beta values were logit transformed. The JADBio AutoML platform was leveraged to identify a methylation signature associated with severe COVID-19 disease. We identified a random forest classification model from 4 unique methylation sites with the power to discern individuals with severe COVID-19 disease. The average area under the receiver operating characteristic curve (AUC-ROC) of the model was 0.933, and the average area under the precision-recall curve (AUC-PRC) was 0.965. When applied to our external validation dataset, this model produced an AUC-ROC of 0.898 and an AUC-PRC of 0.864. These results further our understanding of the utility of DNA methylation in COVID-19 disease pathology and serve as a platform to inform future COVID-19 related studies.
Affiliation(s)
- Scott Bowler
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA
- Georgios Papoutsoglou
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Aristides Karanikas
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Ioannis Tsamardinos
- JADBio - Gnosis DA S.A, Science and Technology Park of Crete, 70013, Heraklion, Greece
- Department of Computer Science, University of Crete, 70013, Heraklion, Greece
- Michael J Corley
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA
- Lishomwa C Ndhlovu
- Division of Infectious Diseases, Department of Medicine, Weill Cornell Medicine, 413 E 69th St, New York, NY, 10021, USA
41
Wu W, Dong J, Lv Y, Chang D. Cuproptosis-Related genes in the prognosis of colorectal cancer and their correlation with the tumor microenvironment. Front Genet 2022; 13:984158. [PMID: 36246586 PMCID: PMC9554006 DOI: 10.3389/fgene.2022.984158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022]
Abstract
Colorectal cancer (CRC) is a common tumor disease of the digestive system with high incidence and mortality. Cuproptosis has recently been identified as a new form of cell death. The clinical significance of cuproptosis-related genes (CRGs) in CRC is not clear. In this study, The Cancer Genome Atlas Colon and Rectal Cancer dataset was used to analyze the relationship between CRGs and clinical characteristics of CRC by differential expression analysis and Kaplan–Meier (K-M) survival analysis. Based on CRGs, a prognostic model and risk score for CRC were constructed in the COADREAD dataset by multivariate Cox analysis. Receiver operating characteristic (ROC) curve analysis, K-M analysis, and calibration analysis in the GDC TCGA Colon Cancer dataset were applied to validate the model. Subsequently, the relationship between the CRC risk score and the immune microenvironment was analyzed using multiple immune scoring algorithms. We found that most CRGs were differentially expressed between tumors and normal tissues, and some CRGs were differentially expressed among different clinical characteristics. K-M analysis showed that the CRGs were related to overall survival (OS), disease-specific survival, and progression-free survival. Subsequently, DLAT and CDKN2A were identified as risk factors for OS in CRC by multivariate Cox analysis, and the risk score was established. K-M analysis showed a significant difference in OS between the high-risk and low-risk groups, which were defined by the median risk score. ROC analysis showed that the risk score performs well in predicting 1-year, 3-year, and 5-year OS. Enrichment analysis showed that the differentially expressed genes between the high- and low-risk groups were enriched in immune-related signaling pathways. Further analysis showed significant differences in the levels of immune cells and stromal cells between the high- and low-risk groups. The high-risk group had higher levels of immune cells and interstitial cells.
At the same time, the high-risk group had a higher immune escape ability, and its predicted response to immunotherapy was poor. In conclusion, CRGs can be used as prognostic factors in CRC and are closely related to the levels of immune cells and stromal cells in the tumor microenvironment.
Affiliation(s)
- Weiqiang Wu
- Department of Surgical Oncology, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Department of Ophthalmology, The 940th Hospital of Joint Logistics Support Force of Chinese PLA, Lanzhou, China
- Jingqing Dong
- Department of General Surgery, Guangzhou Red Cross Hospital, Medical College, Jinan University, Guangzhou, China
- Yang Lv
- Department of Ophthalmology, The 940th Hospital of Joint Logistics Support Force of Chinese PLA, Lanzhou, China
- Dongmin Chang
- Department of Surgical Oncology, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
42
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022]
Abstract
Understanding the kinetic and thermodynamic profiles of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of the high-dimensional spatiotemporal data generated by molecular dynamics simulation requires identifying a few low-dimensional variables that can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and to calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria, such as distances and dihedral angles, to abstract ones, such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms has led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables, ranging from geometric to abstract ones. Further, I will put forward some cases where machine-learning-based collective variables were used to describe simple systems that in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
43
Oide M, Sugita Y. Protein Folding Intermediates on the Dimensionality Reduced Landscape with UMAP and Native Contact Likelihood. J Chem Phys 2022; 157:075101. [DOI: 10.1063/5.0099094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
To understand protein folding mechanisms from molecular dynamics (MD) simulations, it is important to explore not only folded/unfolded states but also representative intermediate structures on the conformational landscape. Here, we propose a novel approach to construct the landscape using the uniform manifold approximation and projection (UMAP) method, which reduces dimensionality while preserving the proximity of data points. In this approach, native contact likelihood is used for the feature variables rather than the conventional Cartesian coordinates or dihedral angles of protein structures. We tested the performance of UMAP on coarse-grained MD simulation trajectories of the B1 domain of protein G and observed on-pathway transient structures and other metastable states on the UMAP conformational landscape. In contrast, these structures were not clearly distinguished on the dimensionality-reduced landscape using principal component analysis (PCA) or time-lagged independent component analysis (tICA). This approach is also useful for obtaining dynamical information through Markov state modeling and should be applicable to large-scale conformational changes in many other biomacromolecules.
Affiliation(s)
- Yuji Sugita
- Theoretical Molecular Science Laboratory, RIKEN, Japan
44
Song Z, Trozzi F, Tian H, Yin C, Tao P. Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways. ACS Phys Chem Au 2022; 2:316-330. [PMID: 35936506 PMCID: PMC9344433 DOI: 10.1021/acsphyschemau.2c00005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/20/2023]
Abstract
With the increasing popularity of machine learning (ML) applications, the demand for explainable artificial intelligence techniques to explain ML models developed for computational chemistry has also emerged. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for effective explanation of mechanistic insights into ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase and two antibiotics (ampicillin and cefalexin) as the model systems, we show that the BCIG approach could quantitatively attribute the energetic contribution in one system and the relative reactivity of individual steps across different systems to specific chemical processes such as the bond making/breaking and proton transfers. The proposed BCIG contribution attribution method quantifies chemistry-interpretable insights in terms of contributions from each elementary chemical process, which is in agreement with the validating QM/MM calculations and our intuitive mechanistic understandings of the model reactions.
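BCIG builds on integrated gradients (IG), which attributes a model output F(x) to each input feature by integrating gradients along a straight path from a baseline x' to x: IG_i = (x_i - x'_i) * integral over alpha from 0 to 1 of dF/dx_i(x' + alpha (x - x')). The sketch below implements plain IG with a midpoint Riemann sum and finite-difference gradients, then averages attributions over pathway frames with normalized Boltzmann weights exp(-E/kT). The weighting scheme, the `kT` default (about 0.596 kcal/mol at 300 K), and all function names are assumptions for illustration; this is not the BCIG formula from the paper, whose cumulative construction is not described in the abstract.

```python
import math
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def numerical_grad(f: Model, x: Sequence[float], h: float = 1e-5) -> list[float]:
    """Central finite-difference gradient of a scalar model f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def integrated_gradients(f: Model, x: Sequence[float], baseline: Sequence[float],
                         steps: int = 100) -> list[float]:
    """Integrated gradients approximated with a midpoint Riemann sum."""
    acc = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            acc[i] += g
    return [(xi - b) * a / steps for xi, b, a in zip(x, baseline, acc)]


def boltzmann_weighted_ig(f: Model, frames: Sequence[Sequence[float]],
                          energies: Sequence[float], baseline: Sequence[float],
                          kT: float = 0.596) -> list[float]:
    """Average IG attributions over frames with weights exp(-E/kT) / Z (assumed scheme)."""
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    out = [0.0] * len(baseline)
    for w, x in zip(weights, frames):
        for i, a in enumerate(integrated_gradients(f, x, baseline)):
            out[i] += (w / z) * a
    return out
```

A useful sanity check is the completeness property of IG: the attributions for one input should sum approximately to F(x) - F(x').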
45
Klem H, Hocky GM, McCullagh M. Size-and-Shape Space Gaussian Mixture Models for Structural Clustering of Molecular Dynamics Trajectories. J Chem Theory Comput 2022; 18:3218-3230. [PMID: 35483073 DOI: 10.1021/acs.jctc.1c01290] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 11/28/2022]
Abstract
Determining the optimal number and identity of structural clusters from an ensemble of molecular configurations continues to be a challenge. Recent structural clustering methods have focused on the use of internal coordinates due to the innate rotational and translational invariance of these features. The vast number of possible internal coordinates necessitates a feature space supervision step to make clustering tractable but yields a protocol that can be system type-specific. Particle positions offer an appealing alternative to internal coordinates but suffer from a lack of rotational and translational invariance, as well as a perceived insensitivity to regions of structural dissimilarity. Here, we present a method, denoted shape-GMM, that overcomes the shortcomings of particle positions using a weighted maximum likelihood alignment procedure. This alignment strategy is then built into an expectation maximization Gaussian mixture model (GMM) procedure to capture metastable states in the free-energy landscape. The resulting algorithm distinguishes between a variety of different structures, including those indistinguishable by root-mean-square displacement and pairwise distances, as demonstrated on several model systems. Shape-GMM results on an extensive simulation of the fast-folding HP35 Nle/Nle mutant protein support a four-state folding/unfolding mechanism, which is consistent with previous experimental results and provides kinetic details comparable to previous state-of-the-art clustering approaches, as measured by the VAMP-2 score. Currently, training of shape-GMMs is recommended for systems (or subsystems) that can be represented by ≲200 particles and ≲100k configurations to estimate high-dimensional covariance matrices and balance computational expense. Once a shape-GMM is trained, it can be used to predict the cluster identities of millions of configurations.
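Shape-GMM places its weighted alignment inside an expectation-maximization (EM) loop for a Gaussian mixture. As a much-simplified sketch of only the EM part, the code below fits a one-dimensional mixture (for example, to a single scalar coordinate). It omits the size-and-shape alignment and the high-dimensional covariance estimation that distinguish shape-GMM and is not the authors' implementation; the quantile-based initialization and the function names are arbitrary choices for this example.

```python
import math


def _log_normal(x: float, mean: float, var: float) -> float:
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def gmm_em_1d(data: list[float], k: int = 2, iters: int = 200,
              tol: float = 1e-10) -> tuple[list[float], list[float], list[float]]:
    """Fit a 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    xs = sorted(data)
    n = len(xs)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibilities via log-sum-exp for numerical stability.
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(w) + _log_normal(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(lp - top) for lp in logs))
            ll += lse
            resp.append([math.exp(lp - lse) for lp in logs])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-6)
        # EM never decreases the log-likelihood; stop once it stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances
```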
Affiliation(s)
- Heidi Klem
- Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, United States
- Glen M Hocky
- Department of Chemistry, New York University, New York, New York 10003, United States
- Martin McCullagh
- Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, United States
46
Su A, Cheng Y, Xue H, She Y, Rajan K. Artificial intelligence informed toxicity screening of amine chemistries used in the synthesis of hybrid organic–inorganic perovskites. AIChE J 2022. [DOI: 10.1002/aic.17699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- An Su
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA
- Yingying Cheng
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Haotian Xue
- Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, Hangzhou, China
- Yuanbin She
- College of Chemical Engineering, Zhejiang University of Technology, Hangzhou, China
- Krishna Rajan
- Department of Materials Design and Innovation, University at Buffalo, Buffalo, New York, USA
47
Ni D, Liu Y, Kong R, Yu Z, Lu S, Zhang J. Computational elucidation of allosteric communication in proteins for allosteric drug design. Drug Discov Today 2022; 27:2226-2234. [DOI: 10.1016/j.drudis.2022.03.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2021] [Revised: 01/22/2022] [Accepted: 03/17/2022] [Indexed: 02/07/2023]
48
Single-Cell Transcriptome and Network Analyses Unveil Key Transcription Factors Regulating Mesophyll Cell Development in Maize. Genes (Basel) 2022; 13:genes13020374. [PMID: 35205426 PMCID: PMC8872562 DOI: 10.3390/genes13020374] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/18/2022] [Revised: 02/14/2022] [Accepted: 02/17/2022] [Indexed: 12/17/2022]
Abstract
Background: Maize mesophyll (M) cells play important roles in various biological processes such as photosynthesis II and secondary metabolism. Functional differentiation occurs during M-cell development, but the mechanisms regulating M-cell development are largely unknown. Results: We conducted single-cell RNA sequencing (scRNA-seq) to profile transcripts in maize leaves. We then identified coregulated modules by analyzing the resulting pseudo-time-series data through gene regulatory network analyses. WRKY, ERF, NAC, MYB and heat stress transcription factor (HSF) families were highly expressed in the early stage, whereas CONSTANS (CO)-like (COL) and ERF families were highly expressed in the late stage of M-cell development. Construction of regulatory networks revealed that these transcription factor (TF) families, especially HSF and COL, were the major players in the early and late stages of M-cell development, respectively. Integration of the scRNA-seq expression matrix with TF ChIP-seq and Hi-C data further revealed regulatory interactions between these TFs and their targets. HSF1 and COL8 were primarily expressed in the leaf bases and tips, respectively, and their targets were validated with protoplast-based ChIP-qPCR, with the binding sites of HSF1 being experimentally confirmed. Conclusions: Our study provides evidence that several TF families, with the involvement of epigenetic regulation, play vital roles in the regulation of M-cell development in maize.
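The abstract reports coregulated modules found by gene regulatory network analysis of pseudo-time-ordered scRNA-seq data but does not name the specific method. A generic and much simpler stand-in is a correlation-threshold co-expression network: genes whose expression profiles along pseudotime have a Pearson correlation at or above a threshold are linked, and connected components are treated as modules. The sketch below assumes that approach; the threshold, gene names, and function names are hypothetical, and it is not the authors' pipeline.

```python
import math
from itertools import combinations


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def coexpression_modules(expr: dict[str, list[float]],
                         threshold: float = 0.8) -> list[list[str]]:
    """Modules = connected components of the positive-correlation graph.

    expr maps gene name -> expression values ordered by pseudotime.
    """
    genes = sorted(expr)
    adj: dict[str, set[str]] = {g: set() for g in genes}
    for g1, g2 in combinations(genes, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    seen: set[str] = set()
    modules = []
    for g in genes:
        if g in seen:
            continue
        # Depth-first search collects every gene reachable from g.
        stack, component = [g], []
        seen.add(g)
        while stack:
            cur = stack.pop()
            component.append(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules
```

Genes that peak early and decline along pseudotime fall into one module and genes that rise late into another, which mirrors, in toy form, the early-stage versus late-stage TF families described above.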
49
Li C, Zhu Y, Chen W, Li M, Yang M, Shen Z, Zhou Y, Wang L, Wang H, Li S, Ma J, Gong M, Xu R. Circulating NAD+ Metabolism-Derived Genes Unveils Prognostic and Peripheral Immune Infiltration in Amyotrophic Lateral Sclerosis. Front Cell Dev Biol 2022; 10:831273. [PMID: 35155438 PMCID: PMC8831892 DOI: 10.3389/fcell.2022.831273] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/08/2021] [Accepted: 01/13/2022] [Indexed: 12/12/2022]
Abstract
Background: Nicotinamide adenine dinucleotide (NAD+) metabolism has drawn increasing attention in neurodegeneration research; however, its role in Amyotrophic Lateral Sclerosis (ALS) remains to be fully elucidated. The purpose of this study was to investigate whether a circulating NAD+ metabolism-related gene signature could serve as a reliable biomarker for ALS survival. Methods: A retrospective analysis of whole-blood transcriptional profiles and clinical characteristics of 454 ALS patients was conducted. A series of bioinformatics and machine-learning methods were combined to establish an NAD+ metabolism-derived risk score (NPRS) to predict overall survival for ALS patients. The associations of clinical characteristics with NPRS were analyzed and compared. Receiver operating characteristic (ROC) and calibration curves were used to assess the efficacy of the prognostic model. In addition, peripheral immune cell infiltration was assessed in the different risk subgroups by applying the CIBERSORT algorithm. Results: Abnormal activation of the NAD+ metabolic pathway occurs in the peripheral blood of ALS patients. Four subtypes with distinct prognoses were identified from NAD+ metabolism-related gene expression patterns using the consensus clustering method. A comparison of the expression profiles of NAD+ metabolism-related genes across subtypes revealed that the synthase of NAD+ was closely associated with prognosis. Seventeen genes were selected by LASSO regression to construct a prognostic risk signature. The NPRS exhibited stronger prognostic capacity than traditional clinicopathological parameters. High NPRS was characterized by exuberant NAD+ metabolism and an unfavorable prognosis. The infiltration levels of several immune cells, such as naive CD4 T cells, CD8 T cells, neutrophils and macrophages, were significantly associated with NPRS. Further clinicopathological analysis revealed that NPRS is more appropriate for predicting the prognostic risk of patients with spinal onset. A prognostic nomogram provided more accurate survival prediction than other clinicopathological features. Conclusions: This study is the first to propose that a circulating NAD+ metabolism-derived gene signature is a promising biomarker for predicting clinical outcomes, ultimately facilitating the precise management of patients with ALS.
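The NPRS is described as a prognostic signature of seventeen LASSO-selected genes. Risk scores of this kind are typically a linear combination of each patient's expression values weighted by regression coefficients, with patients then split into high- and low-risk groups. The sketch below assumes that form and a median cut-off (a common choice, not one stated in the abstract); gene names, coefficients, and function names are hypothetical, and fitting the penalized regression itself is not shown.

```python
from statistics import median


def risk_scores(expression: dict[str, dict[str, float]],
                coefficients: dict[str, float]) -> dict[str, float]:
    """Linear risk score per patient: sum over signature genes of coef * expression."""
    return {patient: sum(coef * genes[gene] for gene, coef in coefficients.items())
            for patient, genes in expression.items()}


def split_by_median(scores: dict[str, float]) -> tuple[float, list[str], list[str]]:
    """Patients above the median score are 'high risk'; the rest are 'low risk'."""
    cutoff = median(scores.values())
    high = sorted(p for p, s in scores.items() if s > cutoff)
    low = sorted(p for p, s in scores.items() if s <= cutoff)
    return cutoff, high, low
```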
Affiliation(s)
- Cheng Li
- Department of Neurology, Jiangxi Provincial People’s Hospital, Affiliated People’s Hospital of Nanchang University, Nanchang, China
- Yu Zhu
- Department of Neurology, Jiangxi Provincial People’s Hospital, Affiliated People’s Hospital of Nanchang University, Nanchang, China
- *Correspondence: Yu Zhu; Renshi Xu
- Wenzhi Chen
- Department of Neurology, Jiangxi Provincial People’s Hospital, Affiliated People’s Hospital of Nanchang University, Nanchang, China
- Menghua Li
- Department of Neurology, First Affiliated Hospital of Nanchang University, Nanchang, China
- Mi Yang
- Department of Medical Service, The First Hospital of Nanchang, Affiliated Nanchang Hospital of Sun Yat-sen University, Nanchang, China
- Ziyang Shen
- School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Yiyi Zhou
- Department of Neurology, First Affiliated Hospital of Nanchang University, Nanchang, China
- Lulu Wang
- Department of Neurology, First Affiliated Hospital of Nanchang University, Nanchang, China
- Huan Wang
- Department of Neurology, First Affiliated Hospital of Nanchang University, Nanchang, China
- Shu Li
- Department of Neurology, Jiangxi Provincial People’s Hospital, Affiliated People’s Hospital of Nanchang University, Nanchang, China
- Jiacheng Ma
- School of Aircraft Engineering, Nanchang Hangkong University, Nanchang, China
- Mengni Gong
- Medical Examination Center, First Affiliated Hospital of Nanchang University, Nanchang, China
- Renshi Xu
- Department of Neurology, Jiangxi Provincial People’s Hospital, Affiliated People’s Hospital of Nanchang University, Nanchang, China
50
Lyu Y, Guo C, Zhang H. Fatty acid metabolism-related genes in bronchoalveolar lavage fluid unveil prognostic and immune infiltration in idiopathic pulmonary fibrosis. Front Endocrinol (Lausanne) 2022; 13:1001563. [PMID: 36267568 PMCID: PMC9576944 DOI: 10.3389/fendo.2022.1001563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/23/2022] [Accepted: 09/21/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Idiopathic pulmonary fibrosis (IPF) is a chronic and progressive condition with an unfavorable prognosis. A recent study has demonstrated that IPF patients exhibit characteristic alterations in fatty acid metabolism in their lungs, suggesting an association with IPF pathogenesis. Therefore, in this study, we explored whether a gene signature associated with fatty acid metabolism could be used as a reliable biological marker for predicting the survival of IPF patients. METHODS Fatty acid metabolism-related genes (FAMRGs) were extracted from databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), Hallmark, and Reactome. The GSE70866 dataset with information on IPF patients was retrieved from the Gene Expression Omnibus (GEO). Next, the consensus clustering method was used to identify novel molecular subgroups. Gene Set Enrichment Analysis (GSEA) was performed to understand the mechanisms involved. The Cell-type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) algorithm was used to evaluate the level of immune cell infiltration in the identified subgroups based on gene expression signatures of immune cells. Finally, Least Absolute Shrinkage and Selection Operator (LASSO) regression and multivariate Cox regression analysis were performed to develop a prognostic risk model. RESULTS The gene expression signature associated with fatty acid metabolism was used to define two subgroups with significantly different prognoses. GSEA revealed that immune-related pathways were significantly altered between the two subgroups and that the two subgroups had different metabolic characteristics. High infiltration of immune cells, mainly activated NK cells, monocytes, and activated mast cells, was observed in the subgroup with a poor prognosis. A risk model based on FAMRGs had an excellent ability to predict the prognosis of IPF. The nomogram constructed using the clinical features and the risk model could accurately predict the prognosis of IPF patients. CONCLUSION The fatty acid metabolism-related gene expression signature could be used as a potential biological marker for predicting clinical outcomes and the level of immune cell infiltration. This could eventually enhance the accuracy of treatment for IPF patients.
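Consensus clustering, used here (and in the ALS study above) to define molecular subgroups, repeatedly clusters random subsamples of the data and records how often each pair of samples ends up in the same cluster; tools such as the ConsensusClusterPlus R package implement this idea. The sketch below assumes that general scheme with a simple 1D k-means as the base clusterer and toy scalar values in place of patient expression profiles; the subsampling fraction, resample count, and function names are illustrative assumptions, not the study's settings.

```python
import random
from itertools import combinations


def kmeans_1d(values: list[float], k: int, rng: random.Random,
              iters: int = 50) -> list[int]:
    """Lloyd's k-means on scalars; returns one cluster label per value."""
    centers = rng.sample(values, k)  # random distinct data points as seeds
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def consensus_matrix(values: list[float], k: int = 2, n_resamples: int = 100,
                     frac: float = 0.8, seed: int = 0) -> list[list[float]]:
    """M[i][j] = (times i and j clustered together) / (times both were subsampled)."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(k, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        labels = kmeans_1d([values[i] for i in idx], k, rng)
        for a, b in combinations(range(m), 2):
            i, j = idx[a], idx[b]
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    return [[1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n)] for i in range(n)]
```

Entries near 1 or 0 indicate stable co-assignment; a choice of k that leaves many intermediate values suggests the subgroups are not robust.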
Affiliation(s)
- Yin Lyu
- Thoracic Surgery Laboratory, Xuzhou Medical University, Xuzhou, China
- Department of Thoracic Surgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Chen Guo
- Thoracic Surgery Laboratory, Xuzhou Medical University, Xuzhou, China
- Department of Thoracic Surgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Hao Zhang
- Thoracic Surgery Laboratory, Xuzhou Medical University, Xuzhou, China
- Department of Thoracic Surgery, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- *Correspondence: Hao Zhang