1
Chen L, Smith M, Roe DR, Miranda-Quintana RA. Extended Quality (eQual): Radial Threshold Clustering Based on n-ary Similarity. J Chem Inf Model 2025; 65:5062-5070. [PMID: 40309753 DOI: 10.1021/acs.jcim.4c02341] [Indexed: 05/02/2025]
Abstract
We are transforming Radial Threshold Clustering (RTC), an O(N²) algorithm, into Extended Quality Clustering (eQual), an O(N) algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups similar frames together based on their similarity to the seed configuration. RTC has two main issues: it scales as O(N²), making it inefficient for large frame counts, and its clustering results depend on the order of input frames whenever there is a tie in the most populated cluster. To address the first issue, we have increased the speed of the seed selection by using k-means++ to select the seeds of the available frames. To address the second issue and make the results invariant with respect to frame order, the densest and most compact cluster is chosen using the extended similarity indices. The new algorithm is able to cluster in linear time and produce more compact and separate clusters.
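Note: the baseline RTC procedure referred to in this abstract can be sketched as below. This is a minimal illustration of the greedy, Daura-style scheme as it is commonly described, not the eQual algorithm and not the MDANCE implementation. The function name `radial_threshold_clustering`, the toy 2-D "frames", and the use of plain Euclidean distance in place of a structural metric such as RMSD are all assumptions made for the example.

```python
from math import dist  # Euclidean distance (Python 3.8+); stands in for RMSD here


def radial_threshold_clustering(frames, threshold):
    """Greedy Daura-style clustering (hypothetical helper).

    Returns a list of (seed_index, member_indices) tuples.
    """
    remaining = list(range(len(frames)))
    clusters = []
    while remaining:
        best_seed, best_members = None, []
        # Quadratic pairwise scan: every remaining frame is compared
        # against every other remaining frame.
        for i in remaining:
            members = [j for j in remaining
                       if dist(frames[i], frames[j]) <= threshold]
            # Strict '>' keeps the first frame encountered on a tie,
            # which is why the result can depend on the input order.
            if len(members) > len(best_members):
                best_seed, best_members = i, members
        clusters.append((best_seed, best_members))
        taken = set(best_members)
        remaining = [j for j in remaining if j not in taken]
    return clusters


# Toy data (assumed for illustration): two tight groups and one outlier.
frames = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
clusters = radial_threshold_clustering(frames, threshold=0.5)
for seed, members in clusters:
    print(f"seed {seed}: members {members}")
```

In the first round, frames 0, 1, and 2 each have the same three neighbors, so the seed is simply whichever of them comes first in the input. That tie is the order dependence the abstract mentions.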
Affiliation(s)
- Lexin Chen
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
- Micah Smith
  - Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, Maryland 20850, United States
- Daniel R Roe
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Ramón Alain Miranda-Quintana
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, United States
2
Chen L, Leung JMG, Zsigmond K, Chong LT, Miranda-Quintana RA. SHINE: Deterministic Many-to-Many Clustering of Molecular Pathways. J Chem Inf Model 2025; 65:4775-4782. [PMID: 40326720 PMCID: PMC12107702 DOI: 10.1021/acs.jcim.5c00240] [Indexed: 05/07/2025]
Abstract
State-of-the-art molecular dynamics (MD) simulation methods can generate diverse ensembles of pathways for complex biological processes. Analyzing these pathways using statistical mechanics tools demands identifying key states that contribute to both the dynamic and equilibrium properties of the system. This task becomes especially challenging when analyzing multiple MD simulations simultaneously, a common scenario in enhanced sampling techniques like the weighted ensemble strategy. Here, we present a new module of the MDANCE package designed to streamline the analysis of pathway ensembles. This module integrates n-ary similarity, cheminformatics-inspired tools, and hierarchical clustering to improve analysis efficiency. We present the theoretical foundation behind this approach, termed Sampling Hierarchical Intrinsic N-ary Ensembles (SHINE), and demonstrate its application to simulations of alanine dipeptide and adenylate kinase.
Affiliation(s)
- Lexin Chen
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States
- Jeremy M G Leung
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Krisztina Zsigmond
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States
- Lillian T Chong
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Ramón Alain Miranda-Quintana
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32603, United States
3
Kakoulidis P, Theotoki EI, Pantazopoulou VI, Vlachos IS, Emiris IZ, Stravopodis DJ, Anastasiadou E. Comparative structural insights and functional analysis for the distinct unbound states of Human AGO proteins. Sci Rep 2025; 15:9432. [PMID: 40108192 PMCID: PMC11923369 DOI: 10.1038/s41598-025-91849-5] [Received: 11/12/2024] [Accepted: 02/24/2025] [Indexed: 03/22/2025] Open
Abstract
The four human Argonaute (AGO) proteins, critical in RNA interference and gene regulation, exhibit high sequence and structural similarity but differ functionally. We investigated the underexplored structural relationships of these paralogs through microsecond-scale molecular dynamics simulations. Our findings reveal that AGO proteins adopt similar, yet unsynchronized, open-close states. We observed similar and unique local conformations, interdomain distances and intramolecular interactions. Conformational differences at GW182/ZSWIM8 interaction sites and in catalytic/pseudo-catalytic tetrads were minimal. Tetrads display conserved movements, interacting with distant miRNA binding residues. We pinpointed long common protein subsequences with consistent molecular movement but varying solvent accessibility per AGO. We observed diverse conformational patterns at the post-transcriptional sites of the AGOs, except for AGO4. By combining simulation data with large datasets of experimental structures and AlphaFold's predictions, we identified proteins with genomic and proteomic similarities. Some of the identified proteins operate in the mitosis pathway, sharing mitosis-related interactors and miRNA targets. Additionally, we suggest that AGOs interact with a mitosis initiator, zinc ion, by predicting potential binding sites and detecting structurally similar proteins with the same function. These findings further advance our understanding of the human AGO protein family and its role in central cellular processes.
Affiliation(s)
- Panos Kakoulidis
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 16122, Athens, Greece
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St, 11527, Athens, Greece
- Eleni I Theotoki
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St, 11527, Athens, Greece
  - Section of Cell Biology and Biophysics, Department of Biology, School of Science, National and Kapodistrian University of Athens, 15701, Athens, Greece
- Vasiliki I Pantazopoulou
  - Department of Pathology, Medical School, National and Kapodistrian University of Athens, 11527, Athens, Greece
- Ioannis S Vlachos
  - Broad Institute of MIT and Harvard, Merkin Building, 415 Main St, Cambridge, MA, 02142, USA
  - Cancer Research Institute, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Boston, MA, 02215, USA
  - Harvard Medical School, 25 Shattuck Street, Boston, MA, 02115, USA
  - Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Beth Israel Deaconess Medical Center, 330 Brookline Avenue, Dana Building, Boston, MA, 02215, USA
- Ioannis Z Emiris
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 16122, Athens, Greece
  - ATHENA Research Center, Aigialias & Chalepa, 15125, Marousi, Greece
- Dimitrios J Stravopodis
  - Section of Cell Biology and Biophysics, Department of Biology, School of Science, National and Kapodistrian University of Athens, 15701, Athens, Greece
- Ema Anastasiadou
  - Center of Basic Research, Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou St, 11527, Athens, Greece
  - Department of Health Science, Higher Colleges of Technology (HCT), Academic City Campus, 17155, Dubai, United Arab Emirates
4
Pinheiro M, de Oliveira Bispo M, Mattos RS, Telles do Casal M, Chandra Garain B, Toldo JM, Mukherjee S, Barbatti M. ULaMDyn: enhancing excited-state dynamics analysis through streamlined unsupervised learning. Digital Discovery 2025; 4:666-682. [PMID: 39885946 PMCID: PMC11774233 DOI: 10.1039/d4dd00374h] [Received: 11/19/2024] [Accepted: 01/07/2025] [Indexed: 02/01/2025]
Abstract
The analysis of nonadiabatic molecular dynamics (NAMD) data presents significant challenges due to its high dimensionality and complexity. To address these issues, we introduce ULaMDyn, a Python-based, open-source package designed to automate the unsupervised analysis of large datasets generated by NAMD simulations. ULaMDyn integrates seamlessly with the Newton-X platform and employs advanced dimensionality reduction and clustering techniques to uncover hidden patterns in molecular trajectories, enabling a more intuitive understanding of excited-state processes. Using the photochemical dynamics of fulvene as a test case, we demonstrate how ULaMDyn efficiently identifies critical molecular geometries and critical nonadiabatic transitions. The package offers a streamlined, scalable solution for interpreting large NAMD datasets. It is poised to facilitate advances in the study of excited-state dynamics across a wide range of molecular systems.
Affiliation(s)
- Max Pinheiro
  - Aix Marseille University, CNRS, ICR 13397 Marseille France
- Mariana Telles do Casal
  - Aix Marseille University, CNRS, ICR 13397 Marseille France
  - Department of Chemistry, Physical Chemistry and Quantum Chemistry Division, KU Leuven 3001 Leuven Belgium
- Josene M Toldo
  - Aix Marseille University, CNRS, ICR 13397 Marseille France
  - UCBL, ENS de Lyon, CNRS, LCH UMR 5182 69342 Lyon Cedex 07 France
- Saikat Mukherjee
  - Aix Marseille University, CNRS, ICR 13397 Marseille France
  - Faculty of Chemistry, Nicolaus Copernicus University in Toruń Gagarina 7 87-100 Toruń Poland
- Mario Barbatti
  - Aix Marseille University, CNRS, ICR 13397 Marseille France
  - Institut Universitaire de France 75231 Paris France https://barbatti.org/
5
Chen L, Santos JBW, Gaza J, Perez A, Miranda-Quintana RA. Hierarchical Extended Linkage Method (HELM)'s Deep Dive into Hybrid Clustering Strategies. bioRxiv: the preprint server for biology 2025:2025.03.05.641742. [PMID: 40161705 PMCID: PMC11952300 DOI: 10.1101/2025.03.05.641742] [Indexed: 04/02/2025]
Abstract
Clustering remains a key tool in the analysis of molecular dynamics (MD) simulations, from the preparation of kinetic models to the study of mechanistic pathways and structural determination. It is no surprise then that multiple algorithms are currently used in the MD community, with k-means and hierarchical approaches being arguably the two most popular. The former is very attractive from a purely computational point of view, demanding minimal memory and time resources, but at the price of being able to partition the data only in very restrictive ways. Hierarchical strategies, on the other hand, can generate arbitrary partitions, but with steep memory and time requirements due to their need to build a pairwise distance matrix for all the considered conformations/frames. Here we propose a new hybrid paradigm, the Hierarchical Extended Linkage Method (HELM), that retains the efficiency of k-means while incorporating the flexibility of hierarchical methods. The key ingredient is the use of n-ary difference functions as a way to stabilize the k-means results and efficiently build the hierarchy of subsets. We showcase the applicability of this strategy over protein-DNA and protein folding studies, including the complete analysis of simulations with over 1.5 million frames. HELM is freely available in our MDANCE clustering package.
Affiliation(s)
- Lexin Chen
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida, 32611, USA
- Jokent Gaza
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida, 32611, USA
- Alberto Perez
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida, 32611, USA
6
Chen L, Roe DR, Miranda-Quintana RA. CADENCE: Clustering Algorithm - Density-based Exploration and Novelty Clustering with Efficiency. bioRxiv: the preprint server for biology 2025:2025.02.24.639863. [PMID: 40060588 PMCID: PMC11888282 DOI: 10.1101/2025.02.24.639863] [Indexed: 03/18/2025]
Abstract
Unsupervised learning techniques play a pivotal role in unraveling protein folding landscapes, constructing Markov State Models, expediting replica exchange simulations, and discerning drug binding patterns, among other applications. A fundamental challenge in current clustering methods lies in how similarities among objects are assessed. Traditional similarity operations are typically only defined over pairs of objects, and this limitation is at the core of many performance issues. The crux of the problem in this field is that efficient algorithms like k-means struggle to distinguish between metastable states effectively. However, more robust methods like density-based clustering demand substantial computational resources. Extended similarity techniques have been proven to swiftly pinpoint high and low-density regions within the data in linear O(N) time. This offers a highly convenient means to explore complex conformational landscapes, enabling focused exploration of rare events or identification of the most representative conformations, such as the medoid of the dataset. In this contribution, we aim to bridge this gap by introducing a novel density clustering algorithm to the Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) software package based on the n-ary similarity framework.
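Note: one way to see how a medoid can be located in linear time, without a pairwise distance matrix, is sketched below. This is not the extended similarity index used in the paper or in MDANCE. It is a simpler stand-in that assumes squared Euclidean distance, for which the summed distance from a point p to all N points follows from aggregate sums alone: sum_j ||p - x_j||^2 = N ||p||^2 - 2 p·S + Q, where S is the per-coordinate sum and Q the total squared norm. The name `medoid_from_sums` and the toy data are hypothetical.

```python
def medoid_from_sums(points):
    """Index of the point with the smallest summed squared Euclidean
    distance to all points, using O(N * D) work (hypothetical helper)."""
    n = len(points)
    dims = len(points[0])
    # Aggregate statistics computed in a single pass over the data:
    # per-coordinate sums S and the total squared norm Q.
    col_sum = [sum(p[d] for p in points) for d in range(dims)]
    sq_total = sum(x * x for p in points for x in p)

    def summed_sq_dist(p):
        # sum_j ||p - x_j||^2 = n * ||p||^2 - 2 * p.S + Q
        norm_sq = sum(x * x for x in p)
        dot = sum(x * s for x, s in zip(p, col_sum))
        return n * norm_sq - 2.0 * dot + sq_total

    return min(range(n), key=lambda i: summed_sq_dist(points[i]))


# Toy data (assumed for illustration): three nearby points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
medoid = medoid_from_sums(points)
print(medoid)  # prints 2
```

Each point is scored against the two aggregates rather than against every other point, so the cost grows linearly with the number of frames instead of quadratically.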
Affiliation(s)
- Lexin Chen
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA
- Daniel R Roe
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Ramón Alain Miranda-Quintana
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA
7
Li X, Xiao X, Liu T. Key points for analyzing longitudinal twin growth discordance patterns and adverse perinatal outcomes. Am J Obstet Gynecol 2025:S0002-9378(25)00115-2. [PMID: 40015586 DOI: 10.1016/j.ajog.2025.02.034] [Received: 01/14/2025] [Accepted: 02/21/2025] [Indexed: 03/01/2025]
Affiliation(s)
- Xin Li
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, West China Second Hospital, Sichuan University, Chengdu, China
- Xue Xiao
  - Key Laboratory of Birth Defects and Related Diseases of Women and Children (Sichuan University), Ministry of Education, West China Second Hospital, Sichuan University, Chengdu, China
- Tianjiao Liu
  - Chengdu Women's and Children's Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China
8
Mondal K, Klauda JB. Publisher's Note: "Physically interpretable performance metrics for clustering" [J. Chem. Phys. 161, 244106 (2024)]. J Chem Phys 2025; 162:069902. [PMID: 39927549 DOI: 10.1063/5.0260506] [Received: 01/20/2025] [Indexed: 02/11/2025] Open
Affiliation(s)
- Kinjal Mondal
  - Institute for Physical Science and Technology, Biophysics Program, University of Maryland, College Park, Maryland 20742, USA
- Jeffery B Klauda
  - Institute for Physical Science and Technology, Biophysics Program, University of Maryland, College Park, Maryland 20742, USA
  - Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, USA
9
Chen L, Leung JMG, Zsigmond K, Chong LT, Miranda-Quintana RA. SHINE: Deterministic Many-to-Many clustering of Molecular Pathways. bioRxiv: the preprint server for biology 2025:2025.02.07.636541. [PMID: 39975301 PMCID: PMC11839051 DOI: 10.1101/2025.02.07.636541] [Indexed: 02/21/2025]
Abstract
State-of-the-art molecular dynamics (MD) simulation methods can generate diverse ensembles of pathways for complex biological processes. Analyzing these pathways using statistical mechanics tools demands identifying key states that contribute to both the dynamic and equilibrium properties of the system. This task becomes especially challenging when analyzing multiple MD simulations simultaneously, a common scenario in enhanced sampling techniques like the weighted ensemble strategy. Here, we present a new module of the MDANCE package designed to streamline the analysis of pathway ensembles. This module integrates n-ary similarity, cheminformatics-inspired tools, and hierarchical clustering to improve analysis efficiency. We present the theoretical foundation behind this approach, termed Sampling Hierarchical Intrinsic N-ary Ensembles (SHINE), and demonstrate its application to simulations of alanine dipeptide and adenylate kinase.
Affiliation(s)
- Lexin Chen
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32603, USA
- Jeremy M G Leung
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Krisztina Zsigmond
  - Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32603, USA
- Lillian T Chong
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
10
Chen L, Smith M, Roe DR, Miranda-Quintana RA. Extended Quality (eQual): Radial threshold clustering based on n-ary similarity. bioRxiv: the preprint server for biology 2024:2024.12.05.627001. [PMID: 39677679 PMCID: PMC11643124 DOI: 10.1101/2024.12.05.627001] [Indexed: 12/17/2024]
Abstract
We are transforming Radial Threshold Clustering (RTC), an O(N²) algorithm, into Extended Quality Clustering, an O(N) algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups similar frames together based on their similarity to the seed configuration. Two current issues with RTC are that it scales as O(N²), making it inefficient at high frame counts, and that the clustering results depend on the order of the input frames. To address the first issue, we have increased the speed of the seed selection by using k-means++ to select the seeds of the available frames. To address the second issue and make the results invariant with respect to frame ordering, whenever there is a tie in the most populated cluster, the densest and most compact cluster is chosen using the extended similarity indices. The new algorithm is able to cluster in linear time and produce more compact and separate clusters.
Affiliation(s)
- Lexin Chen
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA
- Micah Smith
  - Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology and the University of Maryland, Rockville, MD 20850, USA
- Daniel R Roe
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Ramón Alain Miranda-Quintana
  - Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA
  - Quantum Theory Project, University of Florida, Gainesville, Florida 32611, USA