1
Ru X, Zou Q, Lin C. Optimization of drug-target affinity prediction methods through feature processing schemes. Bioinformatics 2023; 39:btad615. [PMID: 37812388 PMCID: PMC10636279 DOI: 10.1093/bioinformatics/btad615] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/15/2023] [Revised: 09/19/2023] [Accepted: 10/07/2023] [Indexed: 10/10/2023] Open
Abstract
MOTIVATION: Numerous high-accuracy drug-target affinity (DTA) prediction models, whose performance is heavily reliant on the drug and target feature information, are developed at the expense of complexity and interpretability. Feature extraction and optimization constitute a critical step that significantly influences the enhancement of model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) the feature sets often suffer from high dimensionality.
RESULTS: In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into learning to rank to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by utilizing Shapley Additive Explanations values and the incremental feature selection approach, we obtain that the high-quality feature subset consists of the top 150D features and the top 20D features have a breakthrough impact on the DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance and high-interpretable models.
AVAILABILITY AND IMPLEMENTATION: https://github.com/RUXIAOQING964914140/FS_DTA.
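The incremental feature selection step named in this abstract can be illustrated with a minimal sketch. It assumes a precomputed importance score per feature (for instance, mean absolute SHAP values) and a user-supplied scoring callable. The names `incremental_feature_selection`, `importance`, and `evaluate` are hypothetical and are not taken from the FS_DTA repository, and the toy data are invented for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_feature_selection(
    importance: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Grow a feature subset in order of decreasing importance.

    importance: hypothetical mapping of feature name -> importance score.
    evaluate:   hypothetical callable returning a score (higher is better)
                for a candidate subset, e.g. a cross-validated metric.
    Returns the best subset, its score, and the (subset size, score) curve.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    ranked = sorted(importance, key=importance.get, reverse=True)
    sizes = list(range(step, len(ranked) + 1, step))
    if ranked and (not sizes or sizes[-1] != len(ranked)):
        sizes.append(len(ranked))  # always evaluate the full feature set too

    best_subset: List[str] = []
    best_score = float("-inf")
    curve: List[Tuple[int, float]] = []
    for k in sizes:
        subset = ranked[:k]
        score = evaluate(subset)
        curve.append((k, score))
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = list(subset), score
    return best_subset, best_score, curve


# Toy usage: two informative features, two noisy ones (illustrative values).
toy_importance = {"f1": 0.9, "f2": 0.5, "f3": 0.1, "f4": 0.05}
informative = {"f1", "f2"}


def toy_evaluate(subset: Sequence[str]) -> float:
    # Reward informative features and penalise each noisy one.
    hits = sum(1 for f in subset if f in informative)
    return hits - 0.25 * (len(subset) - hits)


best, score, curve = incremental_feature_selection(toy_importance, toy_evaluate)
```

In practice `evaluate` would retrain and validate the predictor on each candidate subset, which is where most of the cost lies; a larger `step` trades resolution of the curve for fewer retraining runs.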
Affiliation(s)
- Xiaoqing Ru
  - Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Chen Lin
  - Department of Computer Science and Technology, School of Informatics, Xiamen University, Xiamen, Fujian, 361005, China
2
Lin H, Wu H, Li H, Song A, Yin W. The essential role of GSTP1 I105V polymorphism in the prediction of CDNB metabolism and toxicity: In silico and in vitro insights. Toxicol In Vitro 2023; 90:105601. [PMID: 37031912 DOI: 10.1016/j.tiv.2023.105601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2023] [Revised: 04/02/2023] [Accepted: 04/04/2023] [Indexed: 04/11/2023]
Abstract
Humans are continuously exposed to toxic chemicals such as nitro-chlorobenzene (CDNB) through occupation, water, and even the air we breathe. Due to the severe toxicity caused by the high electrophilicity of CDNB, occupational and environmental exposure to CDNB can produce toxic effects that ultimately lead to cell damage. CDNB can be eliminated from organisms by binding to GSH, the catalytic product of glutathione S-transferase P1 (GSTP1). Therefore, GSTP1 plays an important role in the detoxification of CDNB. However, subtle variations in GSTP1 can result in single nucleotide polymorphisms (SNPs). Indeed, the correlation between the clinical outcome of the disease and certain genotypes of GSTP1 has been extensively studied; however, their impact on the metabolic detoxification of toxicants such as CDNB remains to be elucidated. Among the various SNPs of GSTP1, I105V has a significant effect on the catalytic activity of GSTP1. In this paper, a GSTP1 I105V polymorphism model was successfully established, and its effect on CDNB metabolism and toxicity was studied by computer analysis including molecular docking and molecular dynamics simulation. The results demonstrated that the binding capacity of CDNB decreases with the I105V mutation of GSTP1 (p < 0.001), indicating changes in its detoxification efficacy in CDNB-induced cell damage. Organisms expressing GSTP1 V105 are more susceptible to cell damage caused by CDNB than individuals expressing GSTP1 I105 (p < 0.001). In sum, the data in this study provide prospective insights into the mechanism and capacity of CDNB detoxification in the GSTP1 allele, extending the CDNB-mediated toxicological profile. In addition, the heterogeneity of the GSTP1 allele should be included in toxicological studies of individuals exposed to CDNB.
Affiliation(s)
- Hao Lin
  - The State Key Lab of Pharmaceutical Biotechnology, College of Life Sciences, Nanjing University, Nanjing, China
- Han Wu
  - The State Key Lab of Pharmaceutical Biotechnology, College of Life Sciences, Nanjing University, Nanjing, China
- Hengda Li
  - The State Key Lab of Pharmaceutical Biotechnology, College of Life Sciences, Nanjing University, Nanjing, China
- Aoqi Song
  - Department of Pharmacy, Shanghai Fourth People's Hospital, School of Medicine, Tongji University, Shanghai, China
- Wu Yin
  - The State Key Lab of Pharmaceutical Biotechnology, College of Life Sciences, Nanjing University, Nanjing, China
3
Krishnan K, Tian H, Tao P, Verkhivker GM. Probing conformational landscapes and mechanisms of allosteric communication in the functional states of the ABL kinase domain using multiscale simulations and network-based mutational profiling of allosteric residue potentials. J Chem Phys 2022; 157:245101. [PMID: 36586979 PMCID: PMC11184971 DOI: 10.1063/5.0133826] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/06/2022] [Accepted: 12/05/2022] [Indexed: 12/12/2022] Open
Abstract
In the current study, multiscale simulation approaches and dynamic network methods are employed to examine the dynamic and energetic details of conformational landscapes and allosteric interactions in the ABL kinase domain that determine the kinase functions. Using a plethora of synergistic computational approaches, we elucidate how conformational transitions between the active and inactive ABL states can employ allosteric regulatory switches to modulate intramolecular communication networks between the ATP site, the substrate binding region, and the allosteric binding pocket. A perturbation-based network approach that implements mutational profiling of allosteric residue propensities and communications in the ABL states is proposed. Consistent with biophysical experiments, the results reveal functionally significant shifts of the allosteric interaction networks in which preferential communication paths between the ATP binding site and substrate regions in the active ABL state become suppressed in the closed inactive ABL form, which in turn features favorable allosteric coupling between the ATP site and the allosteric binding pocket. By integrating the results of atomistic simulations with dimensionality reduction methods and Markov state models, we analyze the mechanistic role of macrostates and characterize kinetic transitions between the ABL conformational states. Using network-based mutational scanning of allosteric residue propensities, this study provides a comprehensive computational analysis of long-range communications in the ABL kinase domain and identifies conserved regulatory hotspots that modulate kinase activity and allosteric crosstalk between the allosteric pocket, ATP binding site, and substrate binding regions.
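As a rough illustration of the Markov state model idea mentioned in this abstract, the sketch below estimates a transition matrix from a discretized trajectory by counting transitions at a fixed lag and normalizing each row. This is a plain count-based estimator under stated assumptions, not the authors' workflow: the function name `estimate_transition_matrix` and the toy trajectory are hypothetical, and production analyses typically add reversible estimation and lag-time validation.

```python
from typing import List, Sequence


def estimate_transition_matrix(
    dtraj: Sequence[int], n_states: int, lag: int = 1
) -> List[List[float]]:
    """Row-normalised transition counts of a discrete trajectory at lag `lag`.

    dtraj:    hypothetical sequence of state indices in [0, n_states).
    n_states: number of discrete (micro- or macro-) states.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair every frame with the frame `lag` steps later and count the jump.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    matrix: List[List[float]] = []
    for row in counts:
        total = sum(row)
        # States never observed as a starting point get an all-zero row;
        # how to treat them is a modelling choice left open in this sketch.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy two-state trajectory (illustrative only).
toy_dtraj = [0, 0, 1, 1, 1, 0, 0, 1]
T = estimate_transition_matrix(toy_dtraj, n_states=2)
```

Each row of `T` gives the estimated probabilities of moving from one state to every other state after `lag` steps, which is the quantity used to characterize kinetic transitions between conformational states.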
Affiliation(s)
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, USA
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, USA
- Gennady M. Verkhivker
  - Author to whom correspondence should be addressed. Telephone: 714-516-4586. Fax: 714-532-6048
4
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetics and thermodynamics profile of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Affiliation(s)
- Soumendranath Bhakat
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA, +1 30549 32620
5
Tian H, Jiang X, Trozzi F, Xiao S, Larson EC, Tao P. Explore Protein Conformational Space With Variational Autoencoder. Front Mol Biosci 2021; 8:781635. [PMID: 34869602 PMCID: PMC8633506 DOI: 10.3389/fmolb.2021.781635] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/23/2021] [Accepted: 10/28/2021] [Indexed: 12/02/2022] Open
Abstract
Molecular dynamics (MD) simulations have been actively used in the study of protein structure and function. However, extensive sampling in the protein conformational space requires large computational resources and takes a prohibitive amount of time. In this study, we demonstrated that variational autoencoders (VAEs), a type of deep learning model, can be employed to explore the conformational space of a protein through MD simulations. VAEs are shown to be superior to autoencoders (AEs) through a benchmark study, with low deviation between the training and decoded conformations. Moreover, we show that the learned latent space in the VAE can be used to generate unsampled protein conformations. Additional simulations starting from these generated conformations accelerated the sampling process and explored hidden spaces in the conformational landscape.
Affiliation(s)
- Hao Tian
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Xi Jiang
  - Department of Statistical Science, Southern Methodist University, Dallas, TX, United States
- Francesco Trozzi
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Sian Xiao
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
- Eric C. Larson
  - Department of Computer Science, Southern Methodist University, Dallas, TX, United States
- Peng Tao
  - Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Department of Chemistry, Southern Methodist University, Dallas, TX, United States
6
Rheological mechanism of polymer nanocomposites filled with spherical nanoparticles: Insight from molecular dynamics simulation. Polymer 2021. [DOI: 10.1016/j.polymer.2021.124129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
7
Trozzi F, Wang X, Tao P. UMAP as a Dimensionality Reduction Tool for Molecular Dynamics Simulations of Biomacromolecules: A Comparison Study. J Phys Chem B 2021; 125:5022-5034. [PMID: 33973773 PMCID: PMC8356557 DOI: 10.1021/acs.jpcb.1c02081] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Indexed: 01/01/2023]
Abstract
Proteins are the molecular machines of life. The multitude of possible conformations that proteins can adopt determines their free-energy landscapes. However, the inherently high dimensionality of a protein free-energy landscape poses a challenge to deciphering how proteins perform their functions. For this reason, dimensionality reduction is an active field of research for molecular biologists. The uniform manifold approximation and projection (UMAP) is a dimensionality reduction method based on a fuzzy topological analysis of data. In the present study, the performance of UMAP is compared with that of other popular dimensionality reduction methods such as t-distributed stochastic neighbor embedding (t-SNE), principal component analysis (PCA), and time-structure independent components analysis (tICA) in the context of analyzing molecular dynamics simulations of the circadian clock protein VIVID. A good dimensionality reduction method should accurately represent the data structure on the projected components. The comparison of the raw high-dimensional data with the projections obtained using different dimensionality reduction methods based on various metrics showed that UMAP has superior performance when compared with linear reduction methods (PCA and tICA) and has competitive performance and scalable computational cost.
Affiliation(s)
- Francesco Trozzi
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275, United States of America
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75275, United States of America
8
Tian H, Trozzi F, Zoltowski BD, Tao P. Deciphering the Allosteric Process of the Phaeodactylum tricornutum Aureochrome 1a LOV Domain. J Phys Chem B 2020; 124:8960-8972. [PMID: 32970438 DOI: 10.1021/acs.jpcb.0c05842] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 02/07/2023]
Abstract
The conformational-driven allosteric protein diatom Phaeodactylum tricornutum aureochrome 1a (PtAu1a) differs from other light-oxygen-voltage (LOV) proteins for its uncommon structural topology. The mechanism of signaling transduction in the PtAu1a LOV domain (AuLOV) including flanking helices remains unclear because of this dissimilarity, which hinders the study of PtAu1a as an optogenetic tool. To clarify this mechanism, we employed a combination of tree-based machine learning models, Markov state models, machine-learning-based community analysis, and transition path theory to quantitatively analyze the allosteric process. Our results are in good agreement with the reported experimental findings and reveal a previously overlooked Cα helix and protein linkers as important in promoting the protein conformational changes. This integrated approach can be considered as a general workflow and applied on other allosteric proteins to provide detailed information about their allosteric mechanisms.
Affiliation(s)
- Hao Tian
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
- Francesco Trozzi
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
- Brian D Zoltowski
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States
- Peng Tao
  - Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75275, United States