1
Yamamoto Y. Algorithm for Efficient Superposition and Clustering of Molecular Assemblies Using the Branch-and-Bound Method. J Chem Inf Model 2025; 65:4512-4530. [PMID: 40276894 DOI: 10.1021/acs.jcim.4c02217] [Indexed: 04/26/2025]
Abstract
The root-mean-square deviation (RMSD) is one of the most common metrics for comparing the similarity of three-dimensional chemical structures. Chemical structure similarity plays an important role in data chemistry because it is closely related to chemical reactivity, physical properties, and bioactivity. Despite the wide applicability of the RMSD, simultaneously determining the atom mapping and the spatial superposition that minimize it remains a challenging problem to solve in polynomial time. We introduce an algorithm called mobbRMSD, which is formulated in molecular-oriented coordinates and uses the branch-and-bound method to obtain an exact solution for the RMSD. mobbRMSD can efficiently handle a wide range of chemical systems, such as molecular liquids, solute solvations, and self-assembly of large molecules, using chemical knowledge such as atom types, chemical bonding, and chirality. In benchmarks involving small molecular aggregates, mobbRMSD nearly doubles the limiting system size of existing exact solution methods. Furthermore, mobbRMSD demonstrated the ability to analyze the structural similarity of large molecular micelles, which has been difficult with previous methods. We also propose a mobbRMSD-based structural clustering method designed for molecular dynamics trajectories, which reduces the computational cost of the branch-and-bound method so that it asymptotically approaches polynomial time on average as the number of data points increases. Our algorithm is freely available at https://github.com/yymmt742/mobbrmsd.
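To make the difficulty described in this abstract concrete, the following is a minimal sketch of the naive exhaustive search that a branch-and-bound method is meant to avoid. It is not mobbRMSD and does not use that package's API. It assumes a 2D toy system of identical atoms, where the optimal rotation for a fixed mapping has a closed form; the function names `centered`, `rmsd_fixed_mapping`, and `rmsd_exhaustive` are hypothetical.

```python
import math
from itertools import permutations

def centered(points):
    """Translate 2D points so their centroid sits at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]

def rmsd_fixed_mapping(ref, mob):
    """Minimum RMSD over rotations for ONE fixed atom mapping.

    Both inputs must already be centered. In 2D the overlap recovered by
    the best rotation is hypot(dot, cross), so no SVD is needed.
    """
    dot = sum(rx * mx + ry * my for (rx, ry), (mx, my) in zip(ref, mob))
    cross = sum(mx * ry - my * rx for (rx, ry), (mx, my) in zip(ref, mob))
    norm = sum(x * x + y * y for x, y in ref) + sum(x * x + y * y for x, y in mob)
    squared = max(norm - 2.0 * math.hypot(dot, cross), 0.0)  # clamp rounding noise
    return math.sqrt(squared / len(ref))

def rmsd_exhaustive(ref, mob):
    """Minimum RMSD over rotations AND all atom permutations (factorial cost)."""
    ref_c, mob_c = centered(ref), centered(mob)
    return min(
        rmsd_fixed_mapping(ref_c, [mob_c[i] for i in perm])
        for perm in permutations(range(len(mob_c)))
    )

if __name__ == "__main__":
    reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
    # Same triangle, rotated, translated, and with its atoms relabelled.
    c, s = math.cos(0.7), math.sin(0.7)
    moved = [(c * x - s * y + 3.0, s * x + c * y - 1.0) for x, y in reference]
    relabelled = [moved[2], moved[0], moved[1]]
    print(rmsd_exhaustive(reference, relabelled))
```

The loop over `permutations` grows factorially with the number of interchangeable atoms, which is why exact methods need pruning and why restricting the mapping with atom types, bonding, or chirality shrinks the search space.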
Affiliation(s)
- Yuki Yamamoto
- Department of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa Oiwake-cho, Sakyo-ku, Kyoto 606-8502, Japan
2
Bhattacharya S, Chakrabarty S. Mapping conformational landscape in protein folding: Benchmarking dimensionality reduction and clustering techniques on the Trp-Cage mini-protein. Biophys Chem 2025; 319:107389. [PMID: 39862593 DOI: 10.1016/j.bpc.2025.107389] [Received: 08/19/2024] [Revised: 12/16/2024] [Accepted: 01/08/2025] [Indexed: 01/27/2025]
Abstract
Quantitative characterization of protein conformational landscapes is a computationally challenging task due to their high dimensionality and inherent complexity. In this study, we systematically benchmark several widely used dimensionality reduction and clustering methods to analyze the conformational states of the Trp-Cage mini-protein, a model system with well-documented folding dynamics. Dimensionality reduction techniques, including Principal Component Analysis (PCA), Time-lagged Independent Component Analysis (TICA), and Variational Autoencoders (VAE), were employed to project the high-dimensional free energy landscape onto 2D spaces for visualization. Additionally, clustering methods such as K-means, hierarchical clustering, HDBSCAN, and Gaussian Mixture Models (GMM) were used to identify discrete conformational states directly in the high-dimensional space. Our findings reveal that density-based clustering approaches, particularly HDBSCAN, provide physically meaningful representations of free energy minima. While highlighting the strengths and limitations of each method, our study underscores that no single technique is universally optimal for capturing the complex folding pathways, emphasizing the necessity for careful selection and interpretation of computational methods in biomolecular simulations. These insights will contribute to refining the available tools for analyzing protein conformational landscapes, enabling a deeper understanding of folding mechanisms and intermediate states.
Affiliation(s)
- Sayari Bhattacharya
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700106, India
- Suman Chakrabarty
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Kolkata 700106, India
3
Saha R, Bhattacharje G, De S, Das AK. Deciphering the conformational stability of MazE7 antitoxin in Mycobacterium tuberculosis from molecular dynamics simulation study. J Biomol Struct Dyn 2025; 43:127-143. [PMID: 37965715 DOI: 10.1080/07391102.2023.2280675] [Received: 07/24/2023] [Accepted: 11/01/2023] [Indexed: 11/16/2023]
Abstract
MazEF toxin-antitoxin (TA) systems are associated with the persistent phenotype of the pathogen Mycobacterium tuberculosis (Mtb), aiding its survival. Though extensively studied, the mode of action between the antitoxin-toxin and DNA of this family remains largely unclear. Here, the important interactions between MazF7 toxin and MazE7 antitoxin, and how MazE7 binds its promoter/operator region, have been studied. To elucidate this, molecular dynamics (MD) simulation has been performed on MazE7, MazF7, MazEF7, MazEF7-DNA, and MazE7-DNA complexes to investigate how MazF7 and DNA affect the conformational change and dynamics of MazE7 antitoxin. This study demonstrated that the MazE7 dimer is disordered and one monomer (Chain C) attains stability after binding to the MazF7 toxin. Both monomers (Chain C and Chain D), however, are stabilized when MazE7 binds to DNA. MazE7 is also observed to sterically inhibit tRNA from binding to MazF7, thus suppressing its toxic activity. Comparative structural analysis performed on all the available antitoxin/antitoxin-toxin-DNA structures revealed that the MazEF7-DNA mechanism was similar to that of another TA system, AtaRT_E.coli. Simulation performed on the crystal structures of AtaR, AtaT, AtaRT, AtaRT-DNA, and AtaR-DNA showed that the disordered AtaR antitoxin attains stability by AtaT and DNA binding, similar to MazE7. Based on these analyses, it can thus be hypothesized that the disordered antitoxins enable tighter toxin and DNA binding, thus preventing accidental toxin activation. Overall, this study provides crucial structural and dynamic insights into the MazEF7 toxin-antitoxin system and should provide a basis for targeting this TA system in combating Mycobacterium tuberculosis. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rituparna Saha
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Gourab Bhattacharje
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Soumya De
- School of Bioscience, Indian Institute of Technology Kharagpur, Kharagpur, India
- Amit Kumar Das
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
4
Akgüller Ö, Balcı MA, Cioca G. Clustering Molecules at a Large Scale: Integrating Spectral Geometry with Deep Learning. Molecules 2024; 29:3902. [PMID: 39202980 PMCID: PMC11357287 DOI: 10.3390/molecules29163902] [Received: 07/20/2024] [Revised: 08/14/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024]
Abstract
This study conducts an in-depth analysis of clustering small molecules using spectral geometry and deep learning techniques. We applied a spectral geometric approach to convert molecular structures into triangulated meshes and used the Laplace-Beltrami operator to derive significant geometric features. By examining the eigenvectors of these operators, we captured the intrinsic geometric properties of the molecules, aiding their classification and clustering. The research utilized four deep learning methods: Deep Belief Network, Convolutional Autoencoder, Variational Autoencoder, and Adversarial Autoencoder, each paired with k-means clustering at different cluster sizes. Clustering quality was evaluated using the Calinski-Harabasz and Davies-Bouldin indices, Silhouette Score, and standard deviation. Nonparametric tests were used to assess the impact of topological descriptors on clustering outcomes. Our results show that the DBN + k-means combination is the most effective, particularly at lower cluster counts, demonstrating significant sensitivity to structural variations. This study highlights the potential of integrating spectral geometry with deep learning for precise and efficient molecular clustering.
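One of the clustering quality measures named in this abstract, the Silhouette Score, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it uses Euclidean distance, assumes at least two clusters and no coincident points, assigns 0 to singleton clusters, and the function name `silhouette_score` and the sample data are hypothetical.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient for points with integer cluster labels.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    per-point value is (b - a) / max(a, b).
    """
    n = len(points)
    scores = []
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            # Singleton cluster: the coefficient is taken to be 0 by convention.
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in sums if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

if __name__ == "__main__":
    features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(features, [0, 0, 1, 1]))  # compact, separated clusters
    print(silhouette_score(features, [0, 1, 0, 1]))  # clusters that cut across the gap
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to a neighbouring cluster than to their own, which is why the score is useful for comparing k-means runs at different cluster counts.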
Affiliation(s)
- Ömer Akgüller
- Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Mehmet Ali Balcı
- Faculty of Science, Department of Mathematics, Mugla Sitki Kocman University, Muğla 48000, Turkey
- Gabriela Cioca
- Faculty of Medicine, Preclinical Department, Lucian Blaga University of Sibiu, 550024 Sibiu, Romania
5
da Silva Arouche T, Lobato JCM, Dos Santos Borges R, de Oliveira MS, de Jesus Chaves Neto AM. Molecular interactions of the Omicron, Kappa, and Delta SARS-CoV-2 spike proteins with quantum dots of graphene oxide. J Mol Model 2024; 30:203. [PMID: 38858279 DOI: 10.1007/s00894-024-05996-z] [Received: 01/09/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
CONTEXT The Omicron, Kappa, and Delta variants are different strains of the SARS-CoV-2 virus. Graphene oxide quantum dots (GOQDs) represent a burgeoning class of oxygen-enriched, zero-dimensional materials characterized by their sub-20-nm dimensions. Exhibiting pronounced quantum confinement and edge effects, GOQDs manifest exceptional physical-chemical attributes. This study delves into the potential of graphene oxide quantum dots, elucidating their inherent properties pertinent to the surface structures of SARS-CoV-2, employing an integrated computational approach for the repositioning of inhibitory agents. METHODS Following rigorous adjustment tests, a spectrum of divergent bonding conformations emerged, with particular emphasis placed on identifying the conformation exhibiting optimal adjustment scores and interactions. The investigation employed molecular docking simulations integrating affinity energy evaluations, electrostatic potential clouds, molecular dynamics encompassing average square root calculations, and the computation of Gibbs free energy. These values quantify the strength of interaction between GOQDs and SARS-CoV-2 spike protein variants. The receptor structures were optimized using the CHARMM-GUI server employing the AMBER ff14SB force field. The algorithm embedded in CHARMM offers an efficient interpolation scheme and automatic step size selection, enhancing the efficiency of the optimization process. The 3D structures of the ligands were constructed and optimized with the density functional theory (DFT) method based on the most stable conformer of each binder. AutoDock Vina software (ADV) was utilized, where essential parameters were specified. Electrostatic potential maps (MEPs) provide a visual depiction of molecules' charge distributions and related properties.
After this, molecular dynamics simulations employing the CHARMM36 force field in GROMACS 2022.2 were conducted to investigate GOs' interactions with surface macromolecules of SARS-CoV-2 in an explicit aqueous environment. Furthermore, our investigation suggests that lower values indicate stronger binding. Notably, GO-E consistently showed the most negative values across interactions with different variants, suggesting a higher affinity compared to other GOQDs (GO-A to GO-D).
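The remark that lower (more negative) values indicate stronger binding follows from the standard thermodynamic relation between a binding free energy and the dissociation constant, K_d = exp(ΔG / RT), taken relative to a 1 M standard state. The sketch below only illustrates that relation. The energies are made-up inputs, not results from this study, the function name `dissociation_constant` is hypothetical, and docking scores are at best rough estimates of ΔG.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)

def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))

if __name__ == "__main__":
    # Hypothetical binding energies; a more negative value gives a smaller K_d.
    for delta_g in (-6.0, -8.0, -10.0):
        print(delta_g, dissociation_constant(delta_g))
```

Because the dependence is exponential, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in K_d.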
Affiliation(s)
- Tiago da Silva Arouche
- Laboratory of Preparation and Computing of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
- Julio Cesar Mendes Lobato
- Laboratory of Preparation and Computing of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil
- Graduate Program in Natural Resources Engineering of the Amazon, ITEC, Federal University of Pará, C. P. 2626, Belém, PA, 66050-540, Brazil
- Rosivaldo Dos Santos Borges
- Universidade Federal do Pará, Departamento de Farmácia/Laboratório de Química Farmacêutica, Belem, PA, 66075-110, Brazil
- Antonio Maia de Jesus Chaves Neto
- Laboratory of Preparation and Computing of Nanomaterials (LPCN), Federal University of Pará, C. P. 479, Belém, PA, 66075-110, Brazil.
- Graduate Program in Natural Resources Engineering of the Amazon, ITEC, Federal University of Pará, C. P. 2626, Belém, PA, 66050-540, Brazil.
- Graduate Program in Chemical Engineering, ITEC, Federal University of Pará, C. P. 479, Belém, PA, 66075-900, Brazil.
- Mestrado Nacional Profissional em Ensino de Física, Federal University of Pará, C. P.479, Belém, PA, 66075-110, Brazil.
6
Dong J, Wang S, Cui W, Sun X, Guo H, Yan H, Vogel H, Wang Z, Yuan S. Machine Learning Deciphered Molecular Mechanistics with Accurate Kinetic and Thermodynamic Prediction. J Chem Theory Comput 2024; 20:4499-4513. [PMID: 38394691 DOI: 10.1021/acs.jctc.3c01412] [Indexed: 02/25/2024]
Abstract
Time-lagged independent component analysis (tICA) and the Markov state model (MSM) have been extensively employed for extracting conformational dynamics and kinetic community networks from unbiased trajectory ensembles. However, these techniques may not be the optimal choice for elucidating transition mechanisms within low-dimensional representations, especially for intricate biosystems. Unraveling the association mechanism in such complex systems always necessitates permutations of several essential independent components or collective variables, a process that is inherently obscure and may require empirical knowledge for selection. To address these challenges, we have implemented an integrated unsupervised dimension reduction model: uniform manifold approximation and projection (UMAP) with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). This approach effectively generates low-dimensional configurational embeddings. The hierarchical application of this architecture, in conjunction with MSM, reveals global kinetic connectivity while identifying local conformational states. Consequently, our methodology establishes a multiscale mechanistic elucidation framework. Leveraging the benefits of the uniform sample distribution and a denoising approach, our model demonstrates robustness in preserving global and local data structures compared to traditional dimension reduction methods in the field of MD analysis. The interpretability of hyperparameter selection and compatibility with downstream tasks are cross-validated across various simulation data sets, utilizing both computational evaluation metrics and experimental kinetic observables. Furthermore, the predicted Mcl1-BH3 association kinetics (0.76 s⁻¹) is in close agreement with surface plasmon resonance experiments (0.12 s⁻¹), affirming the plausibility of the identified pathway composed of representative conformations.
We anticipate that the devised workflow will serve as a foundational framework for studying recognition patterns in complex biological systems. Its contributions extend to the exploration of protein functional dynamics and rational drug design, offering a potent avenue for advancing research in these domains.
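The Markov state model step mentioned in this abstract starts from a trajectory of discrete state labels, such as cluster assignments. As a minimal sketch of that step only, the code below counts transitions at a chosen lag time and row-normalizes them. It is a plain non-reversible estimator, not the authors' workflow. The function name `transition_matrix` and the example labels are hypothetical, and no UMAP or HDBSCAN call is shown.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition probabilities from a discrete trajectory.

    dtraj holds integer state labels in time order; lag is the number of
    frames between the start and end of each counted transition.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for start, end in zip(dtraj[:-lag], dtraj[lag:]):
        counts[start][end] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left within the data keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 0, 1]  # hypothetical cluster labels per frame
    for row in transition_matrix(labels, n_states=2):
        print(row)
```

Kinetic quantities such as implied timescales are then derived from the eigenvalues of this matrix, which is why the quality of the upstream state definition matters so much.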
Affiliation(s)
- Junlin Dong
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shiyu Wang
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- AlphaMol Science Ltd, Shenzhen 518055, China
- Wenqiang Cui
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaolin Sun
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Haojie Guo
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Hailu Yan
- School of Biological Sciences, College of Science and Engineering, University of Edinburgh, Edinburgh EH8 9YL, U.K
- Horst Vogel
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Zhi Wang
- Artificial Intelligence Department, Zhejiang Financial College, Hangzhou 310018, China
- Shuguang Yuan
- Research Center for Computer-Aided Drug Discovery, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- AlphaMol Science Ltd, Shenzhen 518055, China