1. Bastami Z, Sheikhpour R, Razzaghi P, Ramazani A, Gharaghani S. Proteochemometrics modeling for prediction of the interactions between caspase isoforms and their inhibitors. Mol Divers 2023; 27:249-261. [PMID: 35438428 DOI: 10.1007/s11030-022-10425-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Accepted: 03/28/2022] [Indexed: 11/29/2022]
Abstract
Caspases (cysteine-aspartic proteases) play critical roles in inflammation and in programmed cell death, including necroptosis, apoptosis, and pyroptosis. Their name reflects their activity as cysteine proteases: they use a nucleophilic cysteine in the active site to attack and cleave target proteins at the C-terminal side of aspartic acid residues. Caspases can be classified by substrate structure and specificity and by physiological activity. In apoptosis, the essential distinction is between initiator caspases (caspases 2, 8, 9, and 10) and executioner caspases (caspases 3, 6, and 7). The present study applied proteochemometric (PCM) modeling to caspase data to build models that generalize across caspase isoforms and predict protein-ligand interactions. Both protein and ligand descriptors were used: protein descriptors were computed with the protr R package and ligand descriptors with PaDEL-Descriptor. Neighborhood Component Analysis (NCA) was used for descriptor selection, and support vector regression (SVR), decision tree, and ensemble methods were used to build the PCM models. The ensemble model outperformed the other models in terms of R2, Q2, and RMSE.
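The core proteochemometric (PCM) idea in this abstract, describing each protein-ligand pair by joining a protein descriptor vector with a ligand descriptor vector and then scoring regression models with R2, Q2, and RMSE, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the descriptor values are hypothetical placeholders rather than protr or PaDEL-Descriptor output, and the NCA selection and SVR/ensemble training steps are omitted. Q2 is taken here as the same 1 - SS_res/SS_tot formula applied to cross-validated (out-of-fold) predictions, which is one common convention.

```python
from math import sqrt
from typing import Sequence


def pcm_features(protein_desc: Sequence[float], ligand_desc: Sequence[float]) -> list[float]:
    """Join one protein descriptor vector and one ligand descriptor vector
    into a single feature vector describing a protein-ligand pair."""
    return list(protein_desc) + list(ligand_desc)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Fed with out-of-fold (cross-validated) predictions instead of fitted
    values, the same formula gives the Q2 statistic."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted activities."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical descriptors for one caspase isoform and one inhibitor.
caspase_desc = [0.12, 0.40, 0.33]
inhibitor_desc = [1.0, 0.0]
pair_features = pcm_features(caspase_desc, inhibitor_desc)
print(pair_features)  # [0.12, 0.4, 0.33, 1.0, 0.0]

# Hypothetical observed vs. predicted activities for three pairs.
observed = [5.0, 6.0, 7.0]
predicted = [5.5, 6.0, 6.5]
print(r2_score(observed, predicted))  # 0.75
print(rmse(observed, predicted))
```

Because every pair lives in one shared feature space, a single model can be trained across all caspase isoforms at once; that shared space is what, in principle, lets a PCM model transfer information between related proteins.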
Affiliation(s)
- Zahra Bastami
- Department of Bioinformatics, Kish International Campus, University of Tehran, Kish, Iran; Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Razieh Sheikhpour
- Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran
- Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
- Ali Ramazani
- Cancer Gene Therapy Research Center, Zanjan University of Medical Sciences, Zanjan, Iran
- Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
2. Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
3. Lee I, Nam H. Sequence-based prediction of protein binding regions and drug-target interactions. J Cheminform 2022; 14:5. [PMID: 35135622 PMCID: PMC8822694 DOI: 10.1186/s13321-022-00584-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/04/2021] [Accepted: 01/20/2022] [Indexed: 12/19/2022]
Abstract
Identifying drug-target interactions (DTIs) is important for drug discovery. However, searching the entire drug-target space poses a major bottleneck, and many deep learning models have recently been proposed to address this problem. The developers of these models, however, have neglected interpretability in model construction, even though it is closely related to a model's performance. We hypothesized that training a model to predict important regions on a protein sequence would increase DTI prediction performance and yield a more interpretable model. We therefore constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as the DTIs between them. To train the model, we collected protein-ligand complexes and the protein sequences of their binding sites, and pretrained the model to predict BRs for a given protein sequence-ligand pair via object detection employing transformers. After pretraining for BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training for BR prediction indeed improved DTI prediction performance. The proposed HoTS model performed well in BR prediction on independent test datasets even though it does not use 3D structure information. Furthermore, the HoTS model achieved the best performance in DTI prediction on the test datasets. Additional analysis confirmed that attention was appropriately assigned to BRs and that transformers are important for both BR and DTI prediction. The source code is available on GitHub (https://github.com/GIST-CSBL/HoTS).
Affiliation(s)
- Ingoo Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-ku, Gwangju, 61005 Republic of Korea
4. Mervin LH, Johansson S, Semenova E, Giblin KA, Engkvist O. Uncertainty quantification in drug design. Drug Discov Today 2020; 26:474-489. [PMID: 33253918 DOI: 10.1016/j.drudis.2020.11.027] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/21/2020] [Revised: 07/13/2020] [Accepted: 11/23/2020] [Indexed: 01/03/2023]
Abstract
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions. Uncertainty quantification will become a key component in autonomous decision making and will be crucial for integrating machine learning and chemistry automation to create an autonomous design-make-test-analyse cycle. This review covers the empirical, frequentist and Bayesian approaches to uncertainty quantification, and outlines how they can be used for drug design. We also outline the impact of uncertainty quantification on decision making.
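As a concrete, deliberately simple illustration of attaching an uncertainty estimate to predictions and using it for a decision, the sketch below uses the spread of an ensemble: the mean of several models' predictions is the point estimate, their sample standard deviation is the uncertainty, and compounds whose spread exceeds a threshold are flagged (for example, for further testing). This is our choice of example, not a method taken from the review; the per-model predictions, compound IDs, and threshold are hypothetical.

```python
from statistics import mean, stdev


def ensemble_estimate(predictions: list[float]) -> tuple[float, float]:
    """Point estimate (mean) and uncertainty (sample standard deviation)
    from the predictions of several independently trained models."""
    return mean(predictions), stdev(predictions)


def flag_uncertain(preds_by_compound: dict[str, list[float]], max_std: float) -> list[str]:
    """Return the IDs of compounds whose ensemble spread exceeds max_std,
    i.e. where the models disagree too much to trust the point estimate."""
    return [cid for cid, preds in preds_by_compound.items()
            if ensemble_estimate(preds)[1] > max_std]


# Hypothetical pIC50 predictions from three models for two compounds.
predictions = {
    "cmpd_A": [6.0, 6.2, 5.8],  # models agree
    "cmpd_B": [5.0, 7.0, 6.0],  # models disagree
}
print(ensemble_estimate(predictions["cmpd_A"]))
print(flag_uncertain(predictions, max_std=0.5))  # ['cmpd_B']
```

The threshold turns a continuous uncertainty into a yes/no decision, which is the kind of step an automated design-make-test-analyse loop would need; in practice it would be tuned to the cost of a wrong prediction.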
Affiliation(s)
- Lewis H Mervin
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Simon Johansson
- Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden; Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Elizaveta Semenova
- Data Sciences and Quantitative Biology, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Kathryn A Giblin
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
5. Prieto-Martínez FD, Medina-Franco JL. Current advances on the development of BET inhibitors: insights from computational methods. Adv Protein Chem Struct Biol 2020; 122:127-180. [PMID: 32951810 DOI: 10.1016/bs.apcsb.2020.06.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/06/2023]
Abstract
The term epigenetics was coined almost 70 years ago to describe heritable phenotypes that arise without alterations in the DNA sequence. Research in the field has uncovered significant roles for such mechanisms in the onset of several diseases. Further studies have paved the way for drugs that target epigenetic enzymes (epi-enzymes), mainly for cancer treatment. Of the numerous epi-targets involved in histone acetylation, bromodomains have captured the spotlight of drug discovery focused on novel therapies. However, because bromodomains share high sequence identity, developing potent and selective inhibitors poses a significant challenge. Herein, we discuss recent computational developments on BET inhibitors and other methods that may be applied to drug discovery in general. As a proof of concept, we discuss a virtual screen to identify novel BET inhibitors based on coumarin derivatives. From public data, we identified putative structure-activity relationships of the coumarin scaffold and proposed R-group modifications for BET selectivity. The results suggest that the optimization and design of novel coumarins merit further exploration.
Affiliation(s)
- Fernando D Prieto-Martínez
- Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico
- José L Medina-Franco
- Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City, Mexico
6. Parks C, Gaieb Z, Amaro RE. An analysis of proteochemometric and conformal prediction machine learning protein-ligand binding affinity models. Front Mol Biosci 2020; 7:93. [PMID: 32671093 PMCID: PMC7328444 DOI: 10.3389/fmolb.2020.00093] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/28/2020] [Accepted: 04/22/2020] [Indexed: 11/13/2022]
Abstract
Protein-ligand binding affinity is a key pharmacodynamic endpoint in drug discovery. Relying solely on experimental design-make-test cycles is costly and time-consuming, which creates an opportunity for computational methods to assist. Herein, we compare random forest and feed-forward neural network proteochemometric models for their ability to predict pIC50 values for held-out generic Bemis-Murcko scaffolds. In addition, we assess the ability of conformal prediction to provide calibrated prediction intervals in both a retrospective and a semi-prospective test, using the recently released Grand Challenge 4 data set as an external test set. Overall, the random forest and deep neural network proteochemometric models perform well retrospectively but less well in the semi-prospective setting. However, the conformal prediction intervals prove to be well calibrated both retrospectively and semi-prospectively, showing that they can be used to guide hit discovery and lead optimization campaigns.
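The calibrated prediction intervals discussed in this abstract can be illustrated with split (inductive) conformal regression, sketched below in plain Python. It is a minimal sketch under assumptions, not the authors' implementation: it uses the absolute residual |y - y_pred| on a held-out calibration set as the nonconformity score (the paper may use a different or normalized score), and the calibration values are hypothetical. Under the usual exchangeability assumption, intervals built this way contain the true value with probability at least 1 - alpha on average.

```python
import math
from typing import Sequence


def conformal_quantile(cal_true: Sequence[float], cal_pred: Sequence[float],
                       alpha: float) -> float:
    """Half-width of a split-conformal interval at miscoverage level alpha.

    Nonconformity score: absolute residual on a calibration set that was
    not used to train the model."""
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the score to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_interval(y_pred: float, half_width: float) -> tuple[float, float]:
    """Symmetric interval around a new point prediction."""
    return y_pred - half_width, y_pred + half_width


# Hypothetical calibration set: observed pIC50 vs. model predictions.
cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.5, 2.25, 3.0, 4.75, 5.5, 6.0, 7.25, 8.5, 9.125]
q = conformal_quantile(cal_true, cal_pred, alpha=0.25)
print(q)                            # 0.5
print(prediction_interval(6.0, q))  # (5.5, 6.5)
```

The point predictor itself is untouched; conformal prediction only wraps it, which is why the same recipe applies equally to random forest and neural network models.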
Affiliation(s)
- Rommie E. Amaro
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, United States
7. Bongers BJ, IJzerman AP, Van Westen GJP. Proteochemometrics - recent developments in bioactivity and selectivity modeling. Drug Discov Today Technol 2019; 32-33:89-98. [PMID: 33386099 DOI: 10.1016/j.ddtec.2020.08.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/28/2020] [Revised: 08/18/2020] [Accepted: 08/28/2020] [Indexed: 06/12/2023]
Abstract
Proteochemometrics is a machine learning-based modeling approach that relies on a combination of ligand and protein descriptors. With ongoing developments in machine learning and the growth of public data, the technique is increasingly applied in early drug discovery, typically for ligand-target binding prediction. Common applications include improving single-target quantitative structure-activity relationship (QSAR) models, modeling protein selectivity and promiscuity, and large-scale deep learning approaches. The gain in predictive power from proteochemometrics is observed in multi-target bioactivity modeling, opening the door to more extensive studies covering whole protein families. In addition, with deep learning fueling larger and more complex models, proteochemometrics enables faster, higher-quality computational models that support the design-make-test cycle.
Affiliation(s)
- Brandon J Bongers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Adriaan P IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
- Gerard J P Van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands