1
Chung E, Wen X, Jia X, Ciallella HL, Aleksunes LM, Zhu H. Hybrid non-animal modeling: A mechanistic approach to predict chemical hepatotoxicity. J Hazard Mater 2024; 471:134297. [PMID: 38677119 DOI: 10.1016/j.jhazmat.2024.134297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024]
Abstract
Developing mechanistic non-animal testing methods based on the adverse outcome pathway (AOP) framework must incorporate molecular and cellular key events associated with target toxicity. Using data from an in vitro assay and chemical structures, we aimed to create a hybrid model to predict hepatotoxicants. We first curated a reference dataset of 869 compounds for hepatotoxicity modeling. Then, we profiled them against PubChem for existing in vitro toxicity data. Of the 2560 resulting assays, we selected the mitochondrial membrane potential (MMP) assay, a high-throughput screening (HTS) tool that can test chemical disruptors for mitochondrial function. Machine learning was applied to develop quantitative structure-activity relationship (QSAR) models with 2536 compounds tested in the MMP assay for screening new compounds. The MMP assay results, including QSAR model outputs, yielded hepatotoxicity predictions for reference set compounds with a Correct Classification Ratio (CCR) of 0.59. The predictivity improved by including 37 structural alerts (CCR = 0.8). We validated our model by testing 37 reference set compounds in human HepG2 hepatoma cells, and reliably predicting them for hepatotoxicity (CCR = 0.79). This study introduces a novel AOP modeling strategy that combines public HTS data, computational modeling, and experimental testing to predict chemical hepatotoxicity.
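The abstract reports performance as a Correct Classification Ratio (CCR). As a minimal sketch, assuming CCR is the mean of sensitivity and specificity (the usual balanced-accuracy definition; the abstract does not spell out the formula), it could be computed as follows. The function name and the toy labels are illustrative and do not come from the paper.

```python
def correct_classification_ratio(y_true, y_pred):
    """Return the mean of sensitivity and specificity for binary labels (1 = toxic, 0 = non-toxic).

    Assumption: CCR = 0.5 * (sensitivity + specificity).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)  # fraction of true toxicants recovered
    specificity = tn / (tn + fp)  # fraction of true non-toxicants recovered
    return 0.5 * (sensitivity + specificity)


# Toy example (hypothetical labels): 3 toxic and 2 non-toxic compounds.
example_ccr = correct_classification_ratio([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Unlike plain accuracy, this average weights both classes equally, which matters when a reference set contains unequal numbers of toxic and non-toxic compounds.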
Affiliation(s)
- Elena Chung
- Department of Chemistry and Biochemistry, Rowan University, NJ, USA; Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, USA
- Xia Wen
- Department of Pharmacology and Toxicology, Rutgers University, Piscataway, NJ, USA
- Xuelian Jia
- Department of Chemistry and Biochemistry, Rowan University, NJ, USA; Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, USA
- Heather L Ciallella
- Department of Toxicology, Cuyahoga County Medical Examiner's Office, Cleveland, OH, USA
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Rutgers University, Piscataway, NJ, USA
- Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, NJ, USA; Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, USA
2
Satalkar V, Degaga GD, Li W, Pang YT, McShan AC, Gumbart JC, Mitchell JC, Torres MP. Generative β-hairpin design using a residue-based physicochemical property landscape. Biophys J 2024:S0006-3495(24)00070-5. [PMID: 38297834 DOI: 10.1016/j.bpj.2024.01.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 02/02/2024] Open
Abstract
De novo peptide design is a new frontier that has broad application potential in the biological and biomedical fields. Most existing models for de novo peptide design are largely based on sequence homology that can be restricted based on evolutionarily derived protein sequences and lack the physicochemical context essential in protein folding. Generative machine learning for de novo peptide design is a promising way to synthesize theoretical data that are based on, but unique from, the observable universe. In this study, we created and tested a custom peptide generative adversarial network intended to design peptide sequences that can fold into the β-hairpin secondary structure. This deep neural network model is designed to establish a preliminary foundation of the generative approach based on physicochemical and conformational properties of 20 canonical amino acids, for example, hydrophobicity and residue volume, using extant structure-specific sequence data from the PDB. The beta generative adversarial network model robustly distinguishes secondary structures of β hairpin from α helix and intrinsically disordered peptides with an accuracy of up to 96% and generates artificial β-hairpin peptide sequences with minimum sequence identities around 31% and 50% when compared against the current NCBI PDB and nonredundant databases, respectively. These results highlight the potential of generative models specifically anchored by physicochemical and conformational property features of amino acids to expand the sequence-to-structure landscape of proteins beyond evolutionary limits.
Affiliation(s)
- Vardhan Satalkar
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Gemechis D Degaga
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Wei Li
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia
- Yui Tik Pang
- School of Physics, Georgia Institute of Technology, Atlanta, Georgia
- Andrew C McShan
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- James C Gumbart
- School of Physics, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
- Julie C Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
- Matthew P Torres
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia
3
Khairullina V, Martynova Y. Quantitative Structure-Activity Relationship in the Series of 5-Ethyluridine, N2-Guanine, and 6-Oxopurine Derivatives with Pronounced Anti-Herpetic Activity. Molecules 2023; 28:7715. [PMID: 38067446 PMCID: PMC10708366 DOI: 10.3390/molecules28237715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 11/10/2023] [Accepted: 11/13/2023] [Indexed: 12/18/2023] Open
Abstract
A quantitative analysis of the relationship between structure and inhibitory activity against the herpes simplex virus thymidine kinase (HSV-TK) was performed for the series of 5-ethyluridine, N2-guanine, and 6-oxopurine derivatives with pronounced anti-herpetic activity (IC50 = 0.09–160,000 μmol/L) using the GUSAR 2019 software. On the basis of the MNA and QNA descriptors and whole-molecule descriptors, 12 statistically significant consensus models for predicting numerical pIC50 values were constructed using self-consistent regression. These models demonstrated high predictive accuracy for the training and test sets. Molecular fragments of HSV-1 and HSV-2 TK inhibitors that enhance or diminish the anti-herpetic activity are considered. Virtual screening of the ChEMBL database using the developed QSAR models revealed 42 new effective HSV-1 and HSV-2 TK inhibitors. These compounds are promising for further research. The obtained data open up new opportunities for developing novel effective inhibitors of TK.
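The models above predict pIC50 rather than raw IC50. As a small sketch, assuming the standard definition pIC50 = -log10(IC50 in mol/L), the reported activity range in μmol/L can be converted as follows. The function name is illustrative and is not part of GUSAR.

```python
import math


def pic50_from_micromolar(ic50_umol_per_l):
    """Convert an IC50 given in umol/L to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_umol_per_l <= 0:
        raise ValueError("IC50 must be positive")
    # 1 umol/L = 1e-6 mol/L, so -log10(x * 1e-6) = 6 - log10(x)
    return 6.0 - math.log10(ic50_umol_per_l)


# Endpoints of the IC50 range quoted in the abstract (0.09 and 160,000 umol/L).
most_potent = pic50_from_micromolar(0.09)        # about 7.05
least_potent = pic50_from_micromolar(160_000.0)  # about 0.80
```

On this scale the data set spans roughly six log units, which is why a logarithmic target is the usual choice for regression models of this kind.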
Affiliation(s)
- Veronika Khairullina
- Institute of Chemistry and Defence in Emergency Situations, Ufa University of Science and Technology, 50076 Ufa, Russia
4
Mostofian B, Martin HJ, Razavi A, Patel S, Allen B, Sherman W, Izaguirre JA. Targeted Protein Degradation: Advances, Challenges, and Prospects for Computational Methods. J Chem Inf Model 2023; 63:5408-5432. [PMID: 37602861 PMCID: PMC10498452 DOI: 10.1021/acs.jcim.3c00603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Indexed: 08/22/2023]
Abstract
The therapeutic approach of targeted protein degradation (TPD) is gaining momentum due to its potentially superior effects compared with protein inhibition. Recent advancements in the biotech and pharmaceutical sectors have led to the development of compounds that are currently in human trials, with some showing promising clinical results. However, the use of computational tools in TPD is still limited, as it has distinct characteristics compared with traditional computational drug design methods. TPD involves creating a ternary structure (protein-degrader-ligase) responsible for the biological function, such as ubiquitination and subsequent proteasomal degradation, which depends on the spatial orientation of the protein of interest (POI) relative to E2-loaded ubiquitin. Modeling this structure necessitates a unique blend of tools initially developed for small molecules (e.g., docking) and biologics (e.g., protein-protein interaction modeling). Additionally, degrader molecules, particularly heterobifunctional degraders, are generally larger than conventional small molecule drugs, leading to challenges in determining drug-like properties like solubility and permeability. Furthermore, the catalytic nature of TPD makes occupancy-based modeling insufficient. TPD consists of multiple interconnected yet distinct steps, such as POI binding, E3 ligase binding, ternary structure interactions, ubiquitination, and degradation, along with traditional small molecule properties. A comprehensive set of tools is needed to address the dynamic nature of the induced proximity ternary complex and its implications for ubiquitination. In this Perspective, we discuss the current state of computational tools for TPD. We start by describing the series of steps involved in the degradation process and the experimental methods used to characterize them. Then, we delve into a detailed analysis of the computational tools employed in TPD. We also present an integrative approach that has proven successful for degrader design and its impact on project decisions. Finally, we examine the future prospects of computational methods in TPD and the areas with the greatest potential for impact.
Affiliation(s)
- Barmak Mostofian
- OpenEye, Cadence Molecular Sciences, Boston, Massachusetts 02114 United States
- Holli-Joi Martin
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599 United States
- Asghar Razavi
- ENKO Chem, Inc, Mystic, Connecticut 06355 United States
- Shivam Patel
- Psivant Therapeutics, Boston, Massachusetts 02210 United States
- Bryce Allen
- Differentiated Therapeutics, San Diego, California 92056 United States
- Woody Sherman
- Psivant Therapeutics, Boston, Massachusetts 02210 United States
- Jesus A Izaguirre
- Differentiated Therapeutics, San Diego, California 92056 United States
- Atommap Corporation, New York, New York 10013 United States
5
Liu W, Wang Z, Chen J, Tang W, Wang H. Machine Learning Model for Screening Thyroid Stimulating Hormone Receptor Agonists Based on Updated Datasets and Improved Applicability Domain Metrics. Chem Res Toxicol 2023. [PMID: 37209109 DOI: 10.1021/acs.chemrestox.3c00074] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/22/2023]
Abstract
Machine learning (ML) models for screening endocrine-disrupting chemicals (EDCs), such as thyroid stimulating hormone receptor (TSHR) agonists, are essential for sound management of chemicals. Previous models for screening TSHR agonists were built on imbalanced datasets and lacked applicability domain (AD) characterization essential for regulatory application. Herein, an updated TSHR agonist dataset was built, for which the ratio of active to inactive compounds greatly increased to 1:2.6, and chemical spaces of structure-activity landscapes (SALs) were enhanced. Resulting models based on 7 molecular representations and 4 ML algorithms were proven to outperform previous ones. Weighted similarity density (ρs) and weighted inconsistency of activities (IA) were proposed to characterize the SALs, and a state-of-the-art AD characterization methodology ADSAL{ρs, IA} was established. An optimal classifier developed with PubChem fingerprints and the random forest algorithm, coupled with ADSAL{ρs ≥ 0.15, IA ≤ 0.65}, exhibited good performance on the validation set with the area under the receiver operating characteristic curve being 0.984 and balanced accuracy being 0.941 and identified 90 TSHR agonist classes that could not be found previously. The classifier together with the ADSAL{ρs, IA} may serve as efficient tools for screening EDCs, and the AD characterization methodology may be applied to other ML models.
Affiliation(s)
- Wenjia Liu
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Zhongyu Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Weihao Tang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Haobo Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
6
Poongavanam V, Kölling F, Giese A, Göller AH, Lehmann L, Meibom D, Kihlberg J. Predictive Modeling of PROTAC Cell Permeability with Machine Learning. ACS Omega 2023; 8:5901-5916. [PMID: 36816707 PMCID: PMC9933238 DOI: 10.1021/acsomega.2c07717] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/03/2022] [Accepted: 01/19/2023] [Indexed: 06/18/2023]
Abstract
Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce resource-demanding synthesis and testing of low-permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel-Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (k ≥ 0.57). Models retrained by combining the original training and the blinded test set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
Affiliation(s)
- Florian Kölling
- Computational Molecular Design, Bayer AG, 42096 Wuppertal, Germany
- Anja Giese
- Drug Discovery Sciences, Bayer AG, 13342 Berlin, Germany
- Lutz Lehmann
- Drug Discovery Sciences, Bayer AG, 42113 Wuppertal, Germany
- Daniel Meibom
- Drug Discovery Sciences, Bayer AG, 42113 Wuppertal, Germany
- Jan Kihlberg
- Department of Chemistry-BMC, Box 576, Uppsala University, 75123 Uppsala, Sweden
7
Xu M, Lu Z, Wu Z, Gui M, Liu G, Tang Y, Li W. Development of In Silico Models for Predicting Potential Time-Dependent Inhibitors of Cytochrome P450 3A4. Mol Pharm 2023; 20:194-205. [PMID: 36458739 DOI: 10.1021/acs.molpharmaceut.2c00571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Cytochrome P450 3A4 (CYP3A4) is one of the major drug metabolizing enzymes in the human body and metabolizes ∼30-50% of clinically used drugs. Inhibition of CYP3A4 must always be considered in the development of new drugs. Time-dependent inhibition (TDI) is an important P450 inhibition type that could cause undesired drug-drug interactions. Therefore, identification of CYP3A4 TDI by a rapid convenient way is of great importance to any new drug discovery effort. Here, we report the development of in silico classification models for prediction of potential CYP3A4 time-dependent inhibitors. On the basis of the CYP3A4 TDI data set that we manually collected from literature and databases, both conventional machine learning and deep learning models were constructed. The comparisons of different sampling strategies, molecular representations, and machine-learning algorithms showed the benefits of a balanced data set and the deep-learning model featured by GraphConv. The generalization ability of the best model was tested by screening an external data set, and the prediction results were validated by biological experiments. In addition, several structural alerts that are relevant to CYP3A4 time-dependent inhibitors were identified via information gain and frequency analysis. We anticipate that our effort would be useful for identification of potential CYP3A4 time-dependent inhibitors in drug discovery and design.
Affiliation(s)
- Minjie Xu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhou Lu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zengrui Wu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Minyan Gui
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
8
Stolbov LA, Filimonov DA, Poroikov VV. SAR based on self consistent classifier. SAR QSAR Environ Res 2022; 33:793-804. [PMID: 36369710 DOI: 10.1080/1062936x.2022.2139751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
The accuracy and performance of (Q)SAR models depend significantly on the data used for training. Datasets prepared on the basis of publicly available databases contain structures belonging to different chemical classes and have a highly imbalanced actives/inactives ratio. Currently, hundreds of structural descriptors are used in (Q)SAR studies. The abundance of structural descriptors raises the problem of the stability of the constructed (Q)SAR models. The methods frequently used for the selection of a small fraction of the 'best' descriptors usually do not have sufficient mathematical justification. We propose a new approach to a self-consistent classifier for SAR analysis in order to overcome these problems. Logistic (SCLC) and extreme (SCEC) extensions of self-consistent regression (SCR) were implemented to enhance the classification capabilities of SCR. The approach was applied to the development of classification models for inhibitory activity endpoints in HIV-1-related data and for toxicity endpoints, with subsequent fivefold cross-validation to estimate the models' performance. Comparison of the proposed SCLC and SCEC models with those developed using the original SCR and support vector machine demonstrated comparable accuracy. Advantages in feature selection using our approach provide more generalizable (Q)SAR models. In particular, the crucial factors responsible for the observed value are determined unambiguously.
Affiliation(s)
- L A Stolbov
- Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- D A Filimonov
- Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- V V Poroikov
- Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
9
Rudik A, Dmitriev A, Lagunin A, Filimonov D, Poroikov V. Computational Prediction of Inhibitors and Inducers of the Major Isoforms of Cytochrome P450. Molecules 2022; 27:5875. [PMID: 36144612 DOI: 10.3390/molecules27185875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 09/06/2022] [Accepted: 09/08/2022] [Indexed: 11/29/2022]
Abstract
Human cytochrome P450 enzymes (CYPs) are heme-containing monooxygenases. This superfamily of drug-metabolizing enzymes is responsible for the metabolism of most drugs and other xenobiotics. The inhibition of CYPs may lead to drug–drug interactions and impair the biotransformation of drugs. CYP inducers may decrease the bioavailability and increase the clearance of drugs. Based on the freely available databases ChEMBL and PubChem, we have collected over 70,000 records containing the structures of inhibitors and inducers together with the IC50 values for the inhibitors of the five major human CYPs: 1A2, 3A4, 2D6, 2C9, and 2C19. Based on the collected data, we developed (Q)SAR models for predicting inhibitors and inducers of these CYPs using GUSAR and PASS software. The developed (Q)SAR models could be applied for assessment of the interaction of novel drug-like substances with the major human CYPs. The created (Q)SAR models demonstrated reasonable accuracy of prediction. They have been implemented in the web application P450-Analyzer that is freely available via the Internet.
10
Guttman Y, Kerem Z. Computer-Aided (In Silico) Modeling of Cytochrome P450-Mediated Food–Drug Interactions (FDI). Int J Mol Sci 2022; 23:8498. [PMID: 35955630 PMCID: PMC9369352 DOI: 10.3390/ijms23158498] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/10/2022] [Revised: 07/26/2022] [Accepted: 07/28/2022] [Indexed: 02/01/2023] Open
Abstract
Modifications of the activity of cytochrome P450 (CYP) enzymes by compounds in food might impair medical treatments. These CYP-mediated food–drug interactions (FDI) play a major role in drug clearance in the intestine and liver. Inter-individual variation in both CYP expression and structure is an important determinant of FDI. Traditional targeted approaches have highlighted a limited number of dietary inhibitors and single-nucleotide variations (SNVs), each determining personal CYP activity and inhibition. These approaches are costly in time, money and labor. Here, we review computational tools and databases that are already available and are relevant to predicting CYP-mediated FDIs. Computer-aided approaches such as protein–ligand interaction modeling and the virtual screening of big data narrow down hundreds of thousands of items in databanks to a few putative targets, to which the research resources could be further directed. Structure-based methods are used to explore the structural nature of the interaction between compounds and CYP enzymes. However, while collections of chemical, biochemical and genetic data are available today and call for the implementation of big-data approaches, ligand-based machine-learning approaches for virtual screening are still scarcely used for FDI studies. This review of CYP-mediated FDIs promises to attract scientists and the general public.
11
Hochuli J, Jain S, Melo-Filho C, Sessions ZL, Bobrowski T, Choe J, Zheng J, Eastman R, Talley DC, Rai G, Simeonov A, Tropsha A, Muratov EN, Baljinnyam B, Zakharov AV. Allosteric Binders of ACE2 Are Promising Anti-SARS-CoV-2 Agents. ACS Pharmacol Transl Sci 2022; 5:468-478. [PMID: 35821746 PMCID: PMC9236207 DOI: 10.1021/acsptsci.2c00049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for acute treatment of the disease. We investigate whether compounds that bind the human angiotensin-converting enzyme 2 (ACE2) protein can decrease SARS-CoV-2 replication without impacting ACE2's natural enzymatic function. Initial screening of a diversity library resulted in hit compounds active in an ACE2-binding assay, which showed little inhibition of ACE2 enzymatic activity (116 actives, success rate ∼4%), suggesting they were allosteric binders. Subsequent application of in silico techniques boosted success rates to ∼14% and resulted in 73 novel confirmed ACE2 binders with Kd values as low as 6 nM. A subsequent SARS-CoV-2 assay revealed that five of these compounds inhibit the viral life cycle in human cells. Further effort is required to completely elucidate the antiviral mechanism of these ACE2-binders, but they present a valuable starting point for both the development of acute treatments for COVID-19 and research into the host-directed therapy.
Affiliation(s)
- Joshua E. Hochuli
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Sankalp Jain
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Cleber Melo-Filho
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Zoe L. Sessions
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Tesia Bobrowski
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Jun Choe
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Johnny Zheng
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Richard Eastman
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Daniel C. Talley
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Ganesha Rai
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Anton Simeonov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Alexander Tropsha
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Eugene N. Muratov
- Molecular Modeling Laboratory, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, United States
- Bolormaa Baljinnyam
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
- Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland 20850, United States
12
Priya S, Tripathi G, Singh DB, Jain P, Kumar A. Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des 2022; 100:136-153. [PMID: 35426249 DOI: 10.1111/cbdd.14057] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/30/2022] [Revised: 03/30/2022] [Accepted: 04/10/2022] [Indexed: 01/04/2023]
Abstract
This review is focused on several machine learning approaches used in chemoinformatics. Machine learning approaches provide tools and algorithms to improve drug discovery. Many physicochemical properties of drugs like toxicity, absorption, drug-drug interaction, carcinogenesis, and distribution have been effectively modeled by QSAR techniques. Machine learning is a subset of artificial intelligence, and this technique has shown tremendous potential in the field of drug discovery. Techniques discussed in this review are capable of modeling non-linear datasets, as well as big data of increasing depth and complexity. Various machine learning-based approaches are being used for drug target prediction, modeling the structure of drug target, binding site prediction, ligand-based similarity searching, de novo designing of ligands with desired properties, developing scoring functions for molecular docking, building QSAR model for biological activity prediction, and prediction of pharmacokinetic and pharmacodynamic properties of ligands. In recent years, these predictive tools and models have achieved good accuracy. By the use of more related input data, relevant parameters, and appropriate algorithms, the accuracy of these predictions can be further improved.
Affiliation(s)
- Sonal Priya
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Garima Tripathi
- Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
| | - Dev Bukhsh Singh
- Department of Biotechnology, Siddharth University, Siddharth Nagar, India
| | - Priyanka Jain
- National Institute of Plant Genome Research, New Delhi, India
| | - Abhijeet Kumar
- Department of Chemistry, Mahatma Gandhi Central University, Motihari, India
13
Hochuli JE, Jain S, Melo-filho C, Sessions ZL, Bobrowski T, Choe J, Zheng J, Eastman R, Talley DC, Rai G, Simeonov A, Tropsha A, Muratov EN, Baljinnyam B, Zakharov AV. Allosteric binders of ACE2 are promising anti-SARS-CoV-2 agents. [PMID: 35313579 PMCID: PMC8936107 DOI: 10.1101/2022.03.15.484484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for an acute treatment for the disease. We investigate whether compounds that bind the human ACE2 protein can interrupt SARS-CoV-2 replication without damaging ACE2’s natural enzymatic function. Initial compounds were screened for binding to ACE2 but little interruption of ACE2 enzymatic activity. This set of compounds was extended by application of quantitative structure-activity analysis, which resulted in 512 virtual hits for further confirmatory screening. A subsequent SARS-CoV-2 replication assay revealed that five of these compounds inhibit SARS-CoV-2 replication in human cells. Further effort is required to completely determine the antiviral mechanism of these compounds, but they serve as a strong starting point for both development of acute treatments for COVID-19 and research into the mechanism of infection.
[Abstract figure (TOC graphic): Overall study design.]
14
Manggara AB, Ohkawa K, Sugimoto M. Classifying Modes of Toxic Action of Molecules with Electronic-structure Informatics. Application to Imbalanced Toxicity Data of Phenol Derivatives to Tetrahymena pyriformis. Chem Lett 2021. [DOI: 10.1246/cl.210453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Algafari Bakti Manggara
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
| | - Kazufumi Ohkawa
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
| | - Manabu Sugimoto
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
- Faculty of Advanced Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
- Institute of Industrial Nanomaterials, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto 860-8555, Japan
15
Wang L, Zhao L, Liu X, Fu J, Zhang A. SepPCNET: Deeping Learning on a 3D Surface Electrostatic Potential Point Cloud for Enhanced Toxicity Classification and Its Application to Suspected Environmental Estrogens. Environ Sci Technol 2021; 55:9958-9967. [PMID: 34240848 DOI: 10.1021/acs.est.1c01228] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Deep learning (DL) offers an unprecedented opportunity to revolutionize the landscape of toxicity prediction based on quantitative structure-activity relationship (QSAR) studies in the big data era. However, the structural description in the reported DL-QSAR models is still restricted to the two-dimensional level. Inspired by point clouds, a type of geometric data structure, a novel three-dimensional (3D) molecular surface point cloud with electrostatic potential (SepPC) was proposed to describe chemical structures. Each surface point of a chemical is assigned its 3D coordinate and molecular electrostatic potential. A novel DL architecture SepPCNET was then introduced to directly consume unordered SepPC data for toxicity classification. The SepPCNET model was trained on 1317 chemicals tested in a battery of 18 estrogen receptor-related assays of the ToxCast program. The obtained model recognized the active and inactive chemicals at accuracies of 82.8 and 88.9%, respectively, with a total accuracy of 88.3% on the internal test set and 92.5% on the external test set, which outperformed other up-to-date machine learning models and succeeded in recognizing the difference in the activity of isomers. Additional insights into the toxicity mechanism were also gained by visualizing critical points and extracting data-driven point features of active chemicals.
Affiliation(s)
- Liguo Wang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Lu Zhao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
| | - Jianjie Fu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, P. R. China
| | - Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, P. R. China
16
Casanova-Alvarez O, Morales-Helguera A, Cabrera-Pérez MÁ, Molina-Ruiz R, Molina C. A Novel Automated Framework for QSAR Modeling of Highly Imbalanced Leishmania High-Throughput Screening Data. J Chem Inf Model 2021; 61:3213-3231. [PMID: 34191520 DOI: 10.1021/acs.jcim.0c01439] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
In silico prediction of antileishmanial activity using quantitative structure-activity relationship (QSAR) models has been developed on limited and small datasets. Nowadays, the availability of large and diverse high-throughput screening data provides an opportunity to the scientific community to model this activity from the chemical structure. In this study, we present the first KNIME automated workflow for modeling a large, diverse, and highly imbalanced dataset of compounds with antileishmanial activity. Because the data is strongly biased toward inactive compounds, a novel strategy was implemented based on the selection of different balanced training sets and a further consensus model using single decision trees as the base model and three criteria for output combinations. The decision tree consensus was adopted after comparing its classification performance to consensuses built upon Gaussian-Naïve-Bayes, Support-Vector-Machine, Random-Forest, Gradient-Boost, and Multi-Layer-Perceptron base models. All these consensuses were rigorously validated using internal and external test validation sets and were compared against each other using Friedman and Bonferroni-Dunn statistics. For the retained decision tree-based consensus model, which covers 100% of the chemical space of the dataset and with the lowest consensus level, the overall accuracy statistics for test and external sets were between 71 and 74% and 71 and 76%, respectively, while for a reduced chemical space (21%) and with an incremental consensus level, the accuracy statistics were substantially improved with values for the test and external sets between 86 and 92% and 88 and 92%, respectively. These results highlight the relevance of the consensus model to prioritize a relatively small set of active compounds with high prediction sensitivity using the Incremental Consensus at high level values or to predict as many compounds as possible, lowering the level of Incremental Consensus. Finally, the workflow developed eliminates human bias, improves the procedure reproducibility, and allows other researchers to reproduce our design and use it in their own QSAR problems.
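The two ideas named in this abstract, balanced training sets drawn from an imbalanced screen and a consensus whose required agreement level trades coverage for accuracy, can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the function names `balanced_training_sets` and `consensus_vote`, the random undersampling of inactives, and the `min_agreement` cutoff are all assumptions made for illustration, and the abstract does not define its three output-combination criteria.

```python
import random


def balanced_training_sets(actives, inactives, n_sets, seed=0):
    """Build n_sets training sets, each pairing all actives with an
    equally sized random sample of inactives (assumed undersampling scheme)."""
    rng = random.Random(seed)
    k = len(actives)  # assumes len(inactives) >= len(actives)
    return [list(actives) + rng.sample(list(inactives), k) for _ in range(n_sets)]


def consensus_vote(votes, min_agreement):
    """Combine 0/1 votes from base models (one per balanced set).

    Returns 1 or 0 when the majority share reaches min_agreement,
    otherwise None, i.e. the compound is left unpredicted.
    """
    n = len(votes)
    n_active = sum(votes)
    majority = max(n_active, n - n_active)
    if majority / n < min_agreement:
        return None  # outside the covered chemical space at this level
    return 1 if 2 * n_active >= n else 0


if __name__ == "__main__":
    sets = balanced_training_sets(["a1", "a2"], [f"i{j}" for j in range(10)], n_sets=3)
    print([len(s) for s in sets])
    print(consensus_vote([1, 1, 1, 0, 1], min_agreement=0.6))
    print(consensus_vote([1, 1, 0, 0, 1], min_agreement=0.8))
```

Raising `min_agreement` in this sketch leaves more compounds unpredicted, which mirrors the abstract's report of higher accuracy over a reduced share of the chemical space.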
Affiliation(s)
- Omar Casanova-Alvarez
- Departamento de Química, Facultad de Química-Farmacia, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Aliuska Morales-Helguera
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Miguel Ángel Cabrera-Pérez
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara 54830, Cuba
| | - Christophe Molina
- PIKAÏROS S.A., B03 - 2 Allée de la Clairière, 31650 Saint Orens de Gameville, France
17
Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021; 61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 12/21/2022]
Abstract
Machine learning classifiers trained on class imbalanced data are prone to overpredict the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set by default to 0.5 which, however, is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy to deal with the class imbalance problem. In this work, we present two different automated procedures for the selection of the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific for random forest (RF), while the second approach, named GHOST, can be potentially applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods improve significantly the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
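The core idea of decision-threshold adjustment can be shown with a short Python sketch: keep the trained model fixed and choose the probability cutoff that maximizes an agreement metric on labeled predictions. This is a simplified illustration, not the GHOST procedure itself. The use of Cohen's kappa as the selection metric, the candidate grid from 0.05 to 0.95, and the names `cohen_kappa` and `pick_threshold` are assumptions.

```python
from typing import Optional, Sequence, Tuple


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa for binary (0/1) labels."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = n - tp - tn - fp
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present everywhere
    return (observed - expected) / (1.0 - expected)


def pick_threshold(
    y_true: Sequence[int],
    probs: Sequence[float],
    candidates: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (threshold, kappa) for the candidate cutoff with the highest kappa."""
    if candidates is None:
        candidates = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 ... 0.95
    best_t, best_k = 0.5, float("-inf")
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        k = cohen_kappa(y_true, preds)
        if k > best_k:  # strict comparison keeps the lowest cutoff on ties
            best_t, best_k = t, k
    return best_t, best_k


if __name__ == "__main__":
    # Imbalanced toy data: 8 inactives, 2 actives, all probabilities below 0.5.
    y = [0] * 8 + [1] * 2
    p = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.45]
    print(pick_threshold(y, p))
```

In the toy data every predicted probability is below 0.5, so the default cutoff labels all compounds inactive, while a lower cutoff recovers both actives without retraining or resampling.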
Affiliation(s)
- Carmen Esposito
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
| | - Gregory A Landrum
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland.,T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
| | - Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
| | - Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
| | - Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
18
Ring C, Sipes NS, Hsieh JH, Carberry C, Koval LE, Klaren WD, Harris MA, Auerbach SS, Rager JE. Predictive modeling of biological responses in the rat liver using in vitro Tox21 bioactivity: Benefits from high-throughput toxicokinetics. Comput Toxicol 2021; 18:100166. [PMID: 34013136 PMCID: PMC8130852 DOI: 10.1016/j.comtox.2021.100166] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/16/2022]
Abstract
Computational methods are needed to more efficiently leverage data from in vitro cell-based models to predict what occurs within whole body systems after chemical insults. This study set out to test the hypothesis that in vitro high-throughput screening (HTS) data can more effectively predict in vivo biological responses when chemical disposition and toxicokinetic (TK) modeling are employed. In vitro HTS data from the Tox21 consortium were analyzed in concert with chemical disposition modeling to derive nominal, aqueous, and intracellular estimates of concentrations eliciting 50% maximal activity. In vivo biological responses were captured using rat liver transcriptomic data from the DrugMatrix and TG-Gates databases and evaluated for pathway enrichment. In vivo dosing data were translated to equivalent body concentrations using HTTK modeling. Random forest models were then trained and tested to predict in vivo pathway-level activity across 221 chemicals using in vitro bioactivity data and physicochemical properties as predictor variables, incorporating methods to address imbalanced training data resulting from high instances of inactivity. Model performance was quantified using the area under the receiver operator characteristic curve (AUC-ROC) and compared across pathways for different combinations of predictor variables. All models that included toxicokinetics were found to outperform those that excluded toxicokinetics. Biological interpretation of the model features revealed that rather than a direct mapping of in vitro assays to in vivo pathways, unexpected combinations of multiple in vitro assays predicted in vivo pathway-level activities. To demonstrate the utility of these findings, the highest-performing model was leveraged to make new predictions of in vivo biological responses across all biological pathways for remaining chemicals tested in Tox21 with adequate data coverage (n = 6617). 
These results demonstrate that, when chemical disposition and toxicokinetics are carefully considered, in vitro HT screening data can be used to effectively predict in vivo biological responses to chemicals.
Affiliation(s)
- Caroline Ring
- ToxStrategies, Inc., Austin, TX 78751, United States
| | - Nisha S. Sipes
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, United States
| | - Jui-Hua Hsieh
- Kelly Government Solutions, Durham, NC 27709, United States
| | - Celeste Carberry
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Lauren E. Koval
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - William D. Klaren
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77840, United States
| | | | - Scott S. Auerbach
- Division of the National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, United States
| | - Julia E. Rager
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- The Institute for Environmental Health Solutions, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Curriculum in Toxicology and Environmental Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
19
Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. Predictive modeling of estrogen receptor agonism, antagonism, and binding activities using machine- and deep-learning approaches. J Transl Med 2021; 101:490-502. [PMID: 32778734 PMCID: PMC7873171 DOI: 10.1038/s41374-020-00477-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/03/2020] [Revised: 07/19/2020] [Accepted: 07/21/2020] [Indexed: 11/23/2022] Open
Abstract
As defined by the World Health Organization, an endocrine disruptor is an exogenous substance or mixture that alters function(s) of the endocrine system and consequently causes adverse health effects in an intact organism, its progeny, or (sub)populations. Traditional experimental testing regimens to identify toxicants that induce endocrine disruption can be expensive and time-consuming. Computational modeling has emerged as a promising and cost-effective alternative method for screening and prioritizing potentially endocrine-active compounds. The efficient identification of suitable chemical descriptors and machine-learning algorithms, including deep learning, is a considerable challenge for computational toxicology studies. Here, we sought to apply classic machine-learning algorithms and deep-learning approaches to a panel of over 7500 compounds tested against 18 Toxicity Forecaster assays related to nuclear estrogen receptor (ERα and ERβ) activity. Three binary fingerprints (Extended Connectivity FingerPrints, Functional Connectivity FingerPrints, and Molecular ACCess System) were used as chemical descriptors in this study. Each descriptor was combined with four machine-learning and two deep-learning (normal and multitask neural networks) approaches to construct models for all 18 ER assays. The resulting model performance was evaluated using the area under the receiver-operating curve (AUC) values obtained from a fivefold cross-validation procedure. The results showed that individual models have AUC values that range from 0.56 to 0.86. External validation was conducted using two additional sets of compounds (n = 592 and n = 966) with established interactions with nuclear ER demonstrated through experimentation. An agonist, antagonist, or binding score was determined for each compound by averaging its predicted probabilities in relevant assay models as an external validation, yielding AUC values ranging from 0.63 to 0.91.
The results suggest that multitask neural networks offer advantages when modeling mechanistically related endpoints. Consensus predictions based on the average values of individual models remain the best modeling strategy for computational toxicity evaluations.
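The consensus scoring described in this abstract, averaging a compound's predicted probabilities across the relevant assay models and then evaluating the averaged score by AUC, can be sketched in a few lines of Python. The function names `consensus_score` and `roc_auc` and the toy numbers are hypothetical. The AUC here uses the pairwise-ranking definition (ties count one half), which is an assumption about evaluation detail that the abstract does not specify.

```python
def consensus_score(per_model_probs):
    """Average each compound's predicted probability across models.

    per_model_probs: one list per model, each holding one probability per compound.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def roc_auc(y_true, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three hypothetical assay models scoring four compounds.
    per_model = [
        [0.9, 0.2, 0.6, 0.4],
        [0.7, 0.4, 0.2, 0.1],
        [0.8, 0.3, 0.7, 0.1],
    ]
    labels = [1, 0, 1, 0]
    scores = consensus_score(per_model)
    print(scores, roc_auc(labels, scores))
```

In this toy case the second model alone misranks one active below an inactive, while the averaged score separates the classes. This only illustrates the mechanism and is not evidence about the published models.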
Affiliation(s)
- Heather L Ciallella
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Daniel P Russo
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
| | - Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
| | - Fabian A Grimm
- ExxonMobil Biomedical Sciences, Inc., Annandale, NJ, USA
| | - Hao Zhu
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA.
- Department of Chemistry, Rutgers University, Camden, NJ, USA.
20
Abstract
In silico analysis of biological activity data has become an essential technique in pharmaceutical development. Specifically, the so-called proteochemometric models aim to share information between targets in machine learning ligand–target activity prediction models. However, bioactivity data sets used in proteochemometric modeling are usually imbalanced, which could potentially affect the performance of the models. In this work, we explored the effect of different balancing strategies in deep learning proteochemometric target–compound activity classification models while controlling for the compound series bias through clustering. These strategies were (1) no_resampling, (2) resampling_after_clustering, (3) resampling_before_clustering, and (4) semi_resampling. These schemas were evaluated in kinases, GPCRs, nuclear receptors, and proteases from BindingDB. We observed that the predicted proportion of positives was driven by the actual data balance in the test set. Additionally, it was confirmed that data balance had an impact on the performance estimates of the proteochemometric model. We recommend a combination of data augmentation and clustering in the training set (semi_resampling) to mitigate the data imbalance effect in a realistic scenario. The code of this analysis is publicly available at https://github.com/b2slab/imbalance_pcm_benchmark.
Affiliation(s)
- Angela Lopez-Del Rio
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
| | - Sergio Picart-Armada
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
| | - Alexandre Perera-Lluna
- B2SLab, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028 Barcelona, Spain.,Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, 08950 Esplugues de Llobregat, Spain
21
Kawai K, Tomonou M, Machida Y, Karuo Y, Tarui A, Sato K, Ikeda Y, Kinashi T, Omote M. Effect of Learning Dataset for Identification of Active Molecules: A Case Study of Integrin αIIbβ3 Inhibitors. Mol Inform 2021; 40:e2060040. [PMID: 33738924 DOI: 10.1002/minf.202060040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 01/30/2021] [Indexed: 01/13/2023]
Abstract
Efficient in silico approaches are needed to identify strong integrin αIIbβ3 inhibitors through a small number of measurements. To address the challenge, we investigated the effect of learning dataset on the classification performance of machine learning models focusing on weak and inactive compounds. The structure and activity information of the compounds were obtained from ChEMBL, and pCHEMBL values were used to classify them as active, inactive, or weak. Datasets with various imbalance levels from active:inactive=1 : 1 to 1 : 1000 were used for the machine learning. The prediction scores of the weak samples were found to lie between the predictive values of active and inactive compounds. In addition, another dataset that consists of 149 actives and 6.9 million inactives was screened; the results indicated that the number of positive predictions decreased for models trained with a higher number of inactives. Although there is a trade-off between false positives and false negatives, for determination of compounds with strong activity using a reduced number of measurements, it is better to use a large number of inactives for learning and identifying compounds that score higher than the weak samples.
Affiliation(s)
- Kentaro Kawai
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Mami Tomonou
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yume Machida
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yukiko Karuo
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Atsushi Tarui
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Kazuyuki Sato
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
| | - Yoshiki Ikeda
- Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
| | - Tatsuo Kinashi
- Department of Molecular Genetics, Institute of Biomedical Science, Kansai Medical University, 2-5-1 Shin-machi, Hirakata, Osaka, 573-1010, Japan
| | - Masaaki Omote
- Faculty of Pharmaceutical Sciences, Setsunan University, 45-1, Nagaotoge-cho, Hirakata, Osaka, 573-0101, Japan
22
Rácz A, Bajusz D, Héberger K. Effect of Dataset Size and Train/Test Split Ratios in QSAR/QSPR Multiclass Classification. Molecules 2021; 26:1111. [PMID: 33669834 PMCID: PMC7922354 DOI: 10.3390/molecules26041111] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/23/2020] [Revised: 02/04/2021] [Accepted: 02/16/2021] [Indexed: 01/04/2023] Open
Abstract
Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure-activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification. It is also known that the models are ranked differently according to the performance merit(s) used. Here, 25 performance parameters were calculated for each model, then factorial ANOVA was applied to compare the results. The results clearly show the differences not just between the applied machine learning algorithms but also between the dataset sizes and to a lesser extent the train/test split ratios. The XGBoost algorithm could outperform the others, even in multiclass modeling. The performance parameters reacted differently to the change of the sample set size; some of them were much more sensitive to this factor than the others. Moreover, significant differences could be detected between train/test split ratios as well, exerting a great effect on the test validation of our models.
Affiliation(s)
- Anita Rácz
- Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary;
| | - Dávid Bajusz
- Medicinal Chemistry Research Group, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary;
| | - Károly Héberger
- Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary;
23
Jain S, Siramshetty VB, Alves VM, Muratov EN, Kleinstreuer N, Tropsha A, Nicklaus MC, Simeonov A, Zakharov AV. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods. J Chem Inf Model 2021; 61:653-663. [PMID: 33533614 DOI: 10.1021/acs.jcim.0c01164] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 01/01/2023]
Abstract
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
Affiliation(s)
- Sankalp Jain
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Vinicius M Alves
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Eugene N Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Nicole Kleinstreuer
- Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Durham, North Carolina 27709, United States
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Marc C Nicklaus
- Computer-Aided Drug Design (CADD) Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles Street, Frederick, Maryland 21702, United States
| | - Anton Simeonov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
24
|
Hussin SK, Abdelmageid SM, Alkhalil A, Omar YM, Marie MI, Ramadan RA. Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms. Complexity 2021; 2021:1-15. [DOI: 10.1155/2021/6675279] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 09/02/2023]
Abstract
Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous data and suffers from drawbacks such as high dimensions and imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for a minority dataset. For a dataset identified without considering the data’s imbalanced nature, most classification methods tend to have high predictive accuracy for the majority category. However, the accuracy is significantly poorer for the minority category. The paper proposes a K-means algorithm coupled with the Synthetic Minority Oversampling Technique (SMOTE) to overcome the problem of imbalanced datasets. The proposed algorithm is named KSMOTE. Using KSMOTE, minority data can be identified with high accuracy and detected with high precision. A large set of experiments was implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared with both a no-sampling method and SMOTE on the same datasets. Experimental results showed that the proposed solution outperformed the other methods.
Affiliation(s)
- Sahar K. Hussin
- Communication and Computers Engineering Department Alshrouck Academy, Cairo, Egypt
| | - Salah M. Abdelmageid
- Computer Engineering Department, College of Comp. Science and Engineering, Taibah University, Medina, Saudi Arabia
| | - Adel Alkhalil
- College of Computer Science and Engineering, University of Hai’l, Hai’l, Saudi Arabia
| | - Yasser M. Omar
- Arab Academy for Science Technology and Maritime Transport, Cairo, Egypt
| | - Mahmoud I. Marie
- Computer and System Engineering Department, Al-Azhar University, Cairo, Egypt
| | - Rabie A. Ramadan
- College of Computer Science and Engineering, University of Hai’l, Hai’l, Saudi Arabia
- Computer Engineering Department, Cairo University, Cairo, Egypt
|
25
|
Shen C, Weng G, Zhang X, Leung ELH, Yao X, Pang J, Chai X, Li D, Wang E, Cao D, Hou T. Accuracy or novelty: what can we gain from target-specific machine-learning-based scoring functions in virtual screening? Brief Bioinform 2021; 22:6070382. [PMID: 33418562 DOI: 10.1093/bib/bbaa410] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/12/2020] [Indexed: 12/13/2022] Open
Abstract
Machine-learning (ML)-based scoring functions (MLSFs) have gradually emerged as a promising alternative for protein-ligand binding affinity prediction and structure-based virtual screening. However, clouds of doubts have still been raised against the benefits of this novel type of scoring functions (SFs). In this study, to benchmark the performance of target-specific MLSFs on a relatively unbiased dataset, the MLSFs trained from three representative protein-ligand interaction representations were assessed on the LIT-PCBA dataset, and the classical Glide SP SF and three types of ligand-based quantitative structure-activity relationship (QSAR) models were also utilized for comparison. Two major aspects in virtual screening campaigns, including prediction accuracy and hit novelty, were systematically explored. The calculation results illustrate that the tested target-specific MLSFs yielded generally superior performance over the classical Glide SP SF, but they could hardly outperform the 2D fingerprint-based QSAR models. Although substantial improvements could be achieved by integrating multiple types of protein-ligand interaction features, the MLSFs were still not sufficient to exceed MACCS-based QSAR models. In terms of the correlations between the hit ranks or the structures of the top-ranked hits, the MLSFs developed by different featurization strategies would have the ability to identify quite different hits. Nevertheless, it seems that target-specific MLSFs do not have the intrinsic attributes of a traditional SF and may not be a substitute for classical SFs. In contrast, MLSFs can be regarded as a new derivative of ligand-based QSAR models. It is expected that our study may provide valuable guidance for the assessment and further development of target-specific MLSFs.
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Gaoqi Weng
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xujun Zhang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Elaine Lai-Han Leung
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
| | - Xiaojun Yao
- State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, SAR, China
| | - Jinping Pang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Xin Chai
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dan Li
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Ercheng Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
26
|
Antelo-Collado A, Carrasco-Velar R, García-Pedrajas N, Cerruela-García G. Effective Feature Selection Method for Class-Imbalance Datasets Applied to Chemical Toxicity Prediction. J Chem Inf Model 2020; 61:76-94. [PMID: 33350301 DOI: 10.1021/acs.jcim.0c00908] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
During the drug development process, it is common to carry out toxicity tests and adverse effect studies, which are essential to guarantee patient safety and the success of the research. The use of in silico quantitative structure-activity relationship (QSAR) approaches for this task involves processing a huge amount of data that, in many cases, have an imbalanced distribution of active and inactive samples. This is usually termed the class-imbalance problem and may have a significant negative effect on the performance of the learned models. The performance of feature selection (FS) for QSAR models is usually damaged by the class-imbalance nature of the involved datasets. This paper proposes the use of an FS method focused on dealing with the class-imbalance problems. The method is based on the use of FS ensembles constructed by boosting and using two well-known FS methods, fast clustering-based FS and the fast correlation-based filter. The experimental results demonstrate the efficiency of the proposal in terms of the classification performance compared to standard methods. The proposal can be extended to other FS methods and applied to other problems in cheminformatics.
Affiliation(s)
- Nicolás García-Pedrajas
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
| | - Gonzalo Cerruela-García
- Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein Building, E-14071 Córdoba, Spain
|
27
|
Cáceres EL, Mew NC, Keiser MJ. Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction. J Chem Inf Model 2020; 60:5957-5970. [PMID: 33245237 DOI: 10.1021/acs.jcim.0c00565] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/30/2023]
Abstract
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R2 = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). SNA increases in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty into training improves predictions of drug-target relationships.
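The augmentation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `ratio` parameter, and the dictionary layout for the sparse label matrix are assumptions made here for illustration. The only behaviour taken from the abstract is that untested molecule-target pairs are randomly assigned as negatives, and that the assignment is transient, so it is redrawn each time the function is called.

```python
import random


def stochastic_negative_addition(labels, molecules, targets,
                                 ratio=1.0, negative_value=0.0, rng=None):
    """Return a copy of `labels` with random untested pairs marked negative.

    labels: dict mapping (molecule, target) -> measured activity.
    ratio: transient negatives to add per measured pair (hypothetical knob,
           not a value taken from the paper).
    """
    rng = rng or random.Random()
    # Every molecule-target pair without a measurement is a candidate negative.
    untested = [(m, t) for m in molecules for t in targets
                if (m, t) not in labels]
    n_add = min(len(untested), int(round(ratio * len(labels))))
    augmented = dict(labels)  # the measured data itself is never modified
    for pair in rng.sample(untested, n_add):
        augmented[pair] = negative_value
    return augmented
```

Calling the function once per training epoch gives each epoch a different random set of assumed negatives, while the measured labels stay fixed.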
Affiliation(s)
- Elena L Cáceres
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
| | - Nicholas C Mew
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
| | - Michael J Keiser
- Department of Pharmaceutical Chemistry, Department of Bioengineering and Therapeutic Sciences, Bakar Computational Health Sciences Institute, Kavli Institute for Fundamental Neuroscience, Institute for Neurodegenerative Diseases, University of California, San Francisco, 675 Nelson Rising Ln NS 416A, San Francisco, California 94143, United States
|
28
|
Tang W, Chen J, Hong H. Development of classification models for predicting inhibition of mitochondrial fusion and fission using machine learning methods. Chemosphere 2020; 273:128567. [PMID: 34756375 DOI: 10.1016/j.chemosphere.2020.128567] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Revised: 10/03/2020] [Accepted: 10/06/2020] [Indexed: 06/13/2023]
Abstract
Mitochondrial fusion and fission are processes to maintain mitochondrial function when cells respond to environmental stresses. Disruption of mitochondrial fusion and fission influences cell health and can cause adverse events such as neurodegenerative disorders. It is critical to identify environmental chemicals that can disrupt mitochondrial fusion and fission. However, experimentally testing all the chemicals is not practical because experimental methods are time-consuming and costly. Quantitative structure-activity relationship (QSAR) modeling is an attractive approach for evaluating the potential of chemicals to disrupt mitochondrial fusion and fission. In this study, QSAR models were developed for differentiating chemicals capable of inhibition of mitochondrial fusion and fission using machine learning algorithms (i.e. random forest, logistic regression, Bernoulli naive Bayes, and deep neural network). One hundred iterations of five-fold cross validations and external validations showed that the best model on mitochondrial fusion had area under the receiver operating characteristic curve (AUC) of 82.8% and 78.1%, respectively; and the best model for mitochondrial fission yielded AUC of 84.3% and 97.5%, respectively. Furthermore, 45 and 56 structural alerts were identified for inhibition of mitochondrial fusion and fission, respectively. The results demonstrated that the models and the structural alerts could be useful for screening chemicals that inhibit mitochondrial fusion and fission.
Affiliation(s)
- Weihao Tang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
| | - Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA
|
29
|
Stolbov L, Druzhilovskiy D, Rudik A, Filimonov D, Poroikov V, Nicklaus M. AntiHIV-Pred: web-resource for in silico prediction of anti-HIV/AIDS activity. Bioinformatics 2020; 36:978-979. [PMID: 31418763 DOI: 10.1093/bioinformatics/btz638] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/18/2019] [Revised: 07/15/2019] [Accepted: 08/15/2019] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION: Identification of new molecules promising for treatment of HIV-infection and HIV-associated disorders remains an important task in order to provide safer and more effective therapies. Utilization of prior knowledge by application of computer-aided drug discovery approaches reduces time and financial expenses and increases the chances of positive results in anti-HIV R&D. To provide the scientific community with a tool that allows estimating of potential agents for treatment of HIV-infection and its comorbidities, we have created a freely-available web-resource for prediction of relevant biological activities based on the structural formulae of drug-like molecules.
RESULTS: Over 50 000 experimental records for anti-retroviral agents from ChEMBL database were extracted for creating the training sets. After careful examination, about seven thousand molecules inhibiting five HIV-1 proteins were used to develop regression and classification models with the GUSAR software. The average values of R2 = 0.95 and Q2 = 0.72 in validation procedure demonstrated the reasonable accuracy and predictivity of the obtained (Q)SAR models. Prediction of 81 biological activities associated with the treatment of HIV-associated comorbidities with 92% mean accuracy was realized using the PASS program.
AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://www.way2drug.com/hiv/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonid Stolbov
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow 119121, Russia
| | - Dmitry Druzhilovskiy
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow 119121, Russia
| | - Anastasia Rudik
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow 119121, Russia
| | - Dmitry Filimonov
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow 119121, Russia
| | - Vladimir Poroikov
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow 119121, Russia
| | - Marc Nicklaus
- CADD Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, NCI-Frederick, Frederick, MD 21702, USA
|
30
|
Tinkov OV, Grigorev VY, Razdolsky AN, Grigoryeva LD, Dearden JC. Effect of the structural factors of organic compounds on the acute toxicity toward Daphnia magna. SAR QSAR Environ Res 2020; 31:615-641. [PMID: 32713201 DOI: 10.1080/1062936x.2020.1791250] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/14/2020] [Accepted: 06/30/2020] [Indexed: 06/11/2023]
Abstract
The acute toxicity of organic compounds towards Daphnia magna was subjected to QSAR analysis. The two-dimensional simplex representation of molecular structure (2D SiRMS) and the support vector machine (SVM) and gradient boosting (GBM) methods were used to develop QSAR models. Adequate regression QSAR models were developed for incubation of 24 h. Their interpretation allowed us to quantitatively describe and rank the well-known toxicophores, to refine their molecular surroundings, and to distinguish the structural derivatives of the fragments that significantly contribute to the acute toxicity (LC50) of organic compounds towards D. magna. Based on the results of the interpretation of the regression models, a molecular design (modification) of highly toxic compounds was performed in order to reduce their hazard. In addition, acceptable classification QSAR models were developed to reliably predict the following mode of action (MOA): specific and non-specific toxicity of organic compounds towards D. magna. When interpreting these models, we were able to determine the structural fragments and the physicochemical characteristics of molecules that are responsible for the manifestation of one of the modes of action. The on-line version of the OCHEM expert system (https://ochem.eu), HYBOT descriptors, and the random forest and SVM methods were used for a comparative QSAR investigation.
Affiliation(s)
- O V Tinkov
- Department of Computer Science, Military Institute of the Ministry of Defense, Tiraspol, Moldova
| | - V Y Grigorev
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
| | - A N Razdolsky
- Department of Computer-aided Molecular Design, Institute of Physiologically Active Compounds of the Russian Academy of Science, Chernogolovka, Russia
| | - L D Grigoryeva
- Department of Fundamental Physicochemical Engineering, Moscow State University, Moscow, Russia
| | - J C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
|
31
|
Tang W, Chen J, Hong H. Discriminant models on mitochondrial toxicity improved by consensus modeling and resolving imbalance in training. Chemosphere 2020; 253:126768. [PMID: 32464767 DOI: 10.1016/j.chemosphere.2020.126768] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/12/2020] [Revised: 04/08/2020] [Accepted: 04/08/2020] [Indexed: 06/11/2023]
Abstract
Humans and animals may be exposed to tens of thousands of natural and synthetic chemicals during their lifespan. It is difficult to assess risk for all the chemicals with experimental toxicity tests. An alternative approach is to use computational toxicology methods such as quantitative structure-activity relationship (QSAR) modeling. Mitochondrial toxicity is involved in many diseases such as cancer, neurodegeneration, type 2 diabetes, cardiovascular diseases and autoimmune diseases. Thus, it is important to rapidly and efficiently identify chemicals with mitochondrial toxicity. In this study, five machine learning algorithms and twelve types of molecular fingerprints were employed to generate QSAR discriminant models for mitochondrial toxicity. A threshold moving method was adopted to resolve the imbalance issue in the training data. Consensus of the models by an averaging probability strategy improved prediction performance. The best model has correct classification rates of 81.8% and 88.3% in ten-fold cross validation and external validation, respectively. Substructures such as phenol, carboxylic acid, nitro and arylchloride were found informative through analysis of information gain and frequency of substructures. The results demonstrate that resolving imbalance in training and building consensus models can improve classification rates for mitochondrial toxicity prediction.
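The two techniques named in this abstract, consensus by averaging probabilities and threshold moving, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names and the candidate cutoff grid are hypothetical. The score being maximized is assumed to be the mean of sensitivity and specificity, which the abstract does not specify.

```python
def consensus_probability(per_model_probs):
    """Average the positive-class probabilities of several models per compound."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def balanced_rate(y_true, y_pred):
    """Mean of sensitivity and specificity (assumed definition of the rate).

    Requires at least one positive and one negative label in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def move_threshold(y_true, probs, candidates=None):
    """Choose the cutoff with the best balanced rate instead of assuming 0.5."""
    candidates = candidates or [i / 100 for i in range(1, 100)]

    def score(cutoff):
        predictions = [1 if p >= cutoff else 0 for p in probs]
        return balanced_rate(y_true, predictions)

    return max(candidates, key=score)
```

On an imbalanced training set with few toxic compounds, predicted probabilities for the minority class tend to sit below 0.5, so a lower cutoff chosen on training or validation data can recover positives that the default cutoff would miss.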
Affiliation(s)
- Weihao Tang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
| | - Jingwen Chen
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian, 116024, China
| | - Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Rd, Jefferson, AR, 72079, USA
|
32
|
Affiliation(s)
- Selçuk Korkmaz
- Trakya University Faculty of Medicine, Department of Biostatistics and Medical Informatics, Edirne, Turkey
|
33
|
Hosny A, Ashton M, Gong Y, McGarry K. The development of a predictive model to identify potential HIV-1 attachment inhibitors. Comput Biol Med 2020; 120:103743. [DOI: 10.1016/j.compbiomed.2020.103743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2020] [Revised: 04/01/2020] [Accepted: 04/01/2020] [Indexed: 10/24/2022]
|
34
|
Affiliation(s)
- Hongjian Li
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Kam‐Heung Sze
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Gang Lu
- CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
| | - Pedro J. Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
|
35
|
Shah P, Siramshetty VB, Zakharov AV, Southall NT, Xu X, Nguyen DT. Predicting liver cytosol stability of small molecules. J Cheminform 2020; 12:21. [PMID: 33431020 PMCID: PMC7140498 DOI: 10.1186/s13321-020-00426-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/30/2019] [Accepted: 03/25/2020] [Indexed: 01/28/2023] Open
Abstract
Over the last few decades, chemists have become skilled at designing compounds that avoid cytochrome P (CYP) 450 mediated metabolism. Typical screening assays are performed in liver microsomal fractions and it is possible to overlook the contribution of cytosolic enzymes until much later in the drug discovery process. Few data exist on cytosolic enzyme-mediated metabolism and no reliable tools are available to chemists to help design away from such liabilities. In this study, we screened 1450 compounds for liver cytosol-mediated metabolic stability and extracted transformation rules that might help medicinal chemists in optimizing compounds with these liabilities. In vitro half-life data were collected by performing in-house experiments in mouse (CD-1 male) and human (mixed gender) cytosol fractions. Matched molecular pairs analysis was performed in conjunction with qualitative-structure activity relationship modeling to identify chemical structure transformations affecting cytosolic stability. The transformation rules were prospectively validated on the test set. In addition, selected rules were validated on a diverse chemical library and the resulting pairs were experimentally tested to confirm whether the identified transformations could be generalized. The validation results, comprising nearly 250 library compounds and corresponding half-life data, are made publicly available. The datasets were also used to generate in silico classification models, based on different molecular descriptors and machine learning methods, to predict cytosol-mediated liabilities. To the best of our knowledge, this is the first systematic in silico effort to address cytosolic enzyme-mediated liabilities.
Affiliation(s)
- Pranav Shah
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Vishal B Siramshetty
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Alexey V Zakharov
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Noel T Southall
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Xin Xu
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
|
36
|
Shin HK, Kang MG, Park D, Park T, Yoon S. Development of Prediction Models for Drug-Induced Cholestasis, Cirrhosis, Hepatitis, and Steatosis Based on Drug and Drug Metabolite Structures. Front Pharmacol 2020; 11:67. [PMID: 32116729 PMCID: PMC7034408 DOI: 10.3389/fphar.2020.00067] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/08/2019] [Accepted: 01/23/2020] [Indexed: 12/18/2022] Open
Abstract
Drug-induced liver injury (DILI) is one of the major reasons for termination of drug development. Due to the importance of predicting DILI in early phases of drug development, diverse in silico models have been developed to filter out DILI-causing candidates before clinical study. However, no computational models have achieved sufficient prediction power for screening DILI in early phases because 1) drugs often cause liver injury through reactive metabolites, 2) different clinical outcomes of DILI have different mechanisms, and 3) the DILI label on drugs is not clearly defined. In this study, we developed binary classification models to predict drug-induced cholestasis, cirrhosis, hepatitis, and steatosis based on the structure of drugs and their metabolites. DILI-positive data was obtained from post-market reports of drugs and DILI-negative data from DILIrank, a database curated by the Food and Drug Administration (FDA). Support vector machine (SVM) and random forest (RF) were used in developing models with nine fingerprints and one 2D molecular descriptor calculated from drug (152 DILI-positives and 102 DILI-negatives) and drug metabolite (192 DILI-positives and 126 DILI-negatives) structures. Models were developed according to Organisation for Economic Co-operation and Development (OECD) guidelines for quantitative structure-activity relationship (QSAR) validation. Internal and external validation was performed with a randomization test in order to thoroughly examine model predictability and avoid random correlation between structural features and adverse outcomes. The applicability domain was defined with a leverage method for reliable prediction of new chemicals. The best models for each liver disease were selected based on external validation results from drugs (cholestasis: 70%, cirrhosis: 90%, hepatitis: 83%, and steatosis: 85%) and drug metabolites (cholestasis: 86%, cirrhosis: 88%, hepatitis: 86%, and steatosis: 83%) with applicability domain analysis. Compiled data sets were further exploited to derive privileged substructures that were more frequent in DILI-positive sets compared to DILI-negative sets and in drug metabolite structures compared to drug structures with a Morgan fingerprint level 2.
Affiliation(s)
- Hyun Kil Shin
- Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
| | - Myung-Gyun Kang
- Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea
| | - Daeui Park
- Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea.,Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
| | - Tamina Park
- Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea.,Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
| | - Seokjoo Yoon
- Toxicoinformatics Group, Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon, South Korea.,Department of Human and Environmental Toxicology, University of Science and Technology, Daejeon, South Korea
| |
Collapse
|
37
|
Sabando MV, Ponzoni I, Soto AJ. Neural-based approaches to overcome feature selection and applicability domain in drug-related property prediction. Appl Soft Comput 2019; 85:105777. [DOI: 10.1016/j.asoc.2019.105777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
|
38
|
Yang M, Tao B, Chen C, Jia W, Sun S, Zhang T, Wang X. Machine Learning Models Based on Molecular Fingerprints and an Extreme Gradient Boosting Method Lead to the Discovery of JAK2 Inhibitors. J Chem Inf Model 2019; 59:5002-5012. [DOI: 10.1021/acs.jcim.9b00798] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/04/2023]
Affiliation(s)
- Minjian Yang: State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China; Joint Laboratory of Artificial Intelligence of the Institute of Materia Medica and Yuan Qi Zhi Yao, Beijing 100050, P.R. China
- Bingzhong Tao: Joint Laboratory of Artificial Intelligence of the Institute of Materia Medica and Yuan Qi Zhi Yao, Beijing 100050, P.R. China
- Chengjuan Chen: State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
- Wenqiang Jia: State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
- Shaolei Sun: Joint Laboratory of Artificial Intelligence of the Institute of Materia Medica and Yuan Qi Zhi Yao, Beijing 100050, P.R. China
- Tiantai Zhang: State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
- Xiaojian Wang: State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China; Joint Laboratory of Artificial Intelligence of the Institute of Materia Medica and Yuan Qi Zhi Yao, Beijing 100050, P.R. China
|
39
|
Sadawi N, Olier I, Vanschoren J, van Rijn JN, Besnard J, Bickerton R, Grosan C, Soldatova L, King RD. Multi-task learning with a natural metric for quantitative structure activity relationship learning. J Cheminform 2019; 11:68. [PMID: 33430958 PMCID: PMC6852942 DOI: 10.1186/s13321-019-0392-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 11/04/2019] [Indexed: 11/24/2022] [Open Access]
Abstract
The goal of quantitative structure activity relationship (QSAR) learning is to learn a function that, given the structure of a small molecule (a potential drug), outputs the predicted activity of the compound. We employed multi-task learning (MTL) to exploit commonalities in drug targets and assays. We used datasets containing curated records about the activity of specific compounds on drug targets provided by ChEMBL. In total, 1091 assays were analysed. As a baseline, a single-task learning approach that trains a random forest to predict drug activity for each drug target individually was considered. We then carried out feature-based and instance-based MTL to predict drug activities. We introduced a natural metric of evolutionary distance between drug targets as a measure of task relatedness. Instance-based MTL significantly outperformed both feature-based MTL and the base learner on 741 drug targets out of 1091. Feature-based MTL won on 179 occasions and the base learner performed best on 171 drug targets. We conclude that MTL QSAR is improved by incorporating the evolutionary distance between targets. These results indicate that QSAR learning can be performed effectively, even if little data is available for specific drug targets, by leveraging what is known about similar drug targets.
Affiliation(s)
- Noureddin Sadawi: Department of Medicine, Imperial College London, London, UK; Brunel University London, London, UK
- Ivan Olier: Department of Applied Mathematics, Liverpool John Moores University, Liverpool, UK
- Jeremy Besnard: University of Dundee, Dundee, UK; Ex Scientia Ltd, Dundee, UK
- Larisa Soldatova: Brunel University London, London, UK; Goldsmiths, University of London, London, UK
|
40
|
Zakharov AV, Zhao T, Nguyen DT, Peryea T, Sheils T, Yasgar A, Huang R, Southall N, Simeonov A. Novel Consensus Architecture To Improve Performance of Large-Scale Multitask Deep Learning QSAR Models. J Chem Inf Model 2019; 59:4613-4624. [PMID: 31584270 DOI: 10.1021/acs.jcim.9b00526] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 01/06/2023]
Abstract
Advances in the development of high-throughput screening and automated chemistry have rapidly accelerated the production of chemical and biological data, much of them freely accessible through literature aggregator services such as ChEMBL and PubChem. Here, we explore how to use this comprehensive mapping of chemical biology space to support the development of large-scale quantitative structure-activity relationship (QSAR) models. We propose a new deep learning consensus architecture (DLCA) that combines consensus and multitask deep learning approaches to generate large-scale QSAR models. This method improves knowledge transfer across different targets/assays while also integrating contributions from models based on different descriptors. The proposed approach was validated and compared with proteochemometrics, multitask deep learning, and Random Forest methods paired with various descriptor types. DLCA models demonstrated improved prediction accuracy for both regression and classification tasks. The best models together with their modeling sets are provided through publicly available web services at https://predictor.ncats.io.
Affiliation(s)
- Alexey V Zakharov: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Tongan Zhao: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Dac-Trung Nguyen: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Tyler Peryea: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Timothy Sheils: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Adam Yasgar: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ruili Huang: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Noel Southall: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Anton Simeonov: National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
41
|
da Silva Rocha SF, Olanda CG, Fokoue HH, Sant'Anna CM. Virtual Screening Techniques in Drug Discovery: Review and Recent Applications. Curr Top Med Chem 2019; 19:1751-1767. [DOI: 10.2174/1568026619666190816101948] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/29/2019] [Revised: 06/21/2019] [Accepted: 07/29/2019] [Indexed: 11/22/2022]
Abstract
The discovery of bioactive molecules is an expensive and time-consuming process and new strategies are continuously searched for in order to optimize this process. Virtual Screening (VS) is one of the recent strategies that has been explored for the identification of candidate bioactive molecules. The number of new techniques and software that can be applied in this strategy has grown considerably in recent years, so, before their use, it is necessary to understand the basics and also the limitations behind each one to get the most out of them. It is also necessary to assess the real contributions of this strategy so that more significant progress can be made in the future. In this context, this review aims to discuss some important points related to VS, including the use of virtual ligand and biotarget libraries, structure-based and ligand-based VS techniques, as well as to present recent cases where this strategy was successfully applied.
Affiliation(s)
- Sheisi F.L. da Silva Rocha: Programa de Pos-Graduacao em Quimica, Instituto de Quimica, Universidade Federal Rural do Rio de Janeiro, Seropedica, Brazil
- Carolina G. Olanda: Programa de Pos-Graduacao em Quimica, Instituto de Quimica, Universidade Federal Rural do Rio de Janeiro, Seropedica, Brazil
- Harold H. Fokoue: Laboratorio de Avaliacao e Síntese de Substancias Bioativas (LASSBio), Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Carlos M.R. Sant'Anna: Programa de Pos-Graduacao em Quimica, Instituto de Quimica, Universidade Federal Rural do Rio de Janeiro, Seropedica, Brazil
|
42
|
Andrade CH, Neves BJ, Melo-Filho CC, Rodrigues J, Silva DC, Braga RC, Cravo PVL. In Silico Chemogenomics Drug Repositioning Strategies for Neglected Tropical Diseases. Curr Med Chem 2019. [DOI: 10.2174/0929867325666180309114824] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Abstract
Only ~1% of all drug candidates against Neglected Tropical Diseases (NTDs) have reached clinical trials in the last decades, underscoring the need for new, safe and effective treatments. In such context, drug repositioning, which allows finding novel indications for approved drugs whose pharmacokinetic and safety profiles are already known, is emerging as a promising strategy for tackling NTDs. Chemogenomics is a direct descendant of the typical drug discovery process that involves the systematic screening of chemical compounds against drug targets in high-throughput screening (HTS) efforts, for the identification of lead compounds. However, different to the one-drug-one-target paradigm, chemogenomics attempts to identify all potential ligands for all possible targets and diseases. In this review, we summarize current methodological development efforts in drug repositioning that use state-of-the-art computational ligand- and structure-based chemogenomics approaches. Furthermore, we highlight the recent progress in computational drug repositioning for some NTDs, based on curation and modeling of genomic, biological, and chemical data. Additionally, we present in-house and other successful examples and suggest possible solutions to existing pitfalls.
Affiliation(s)
- Carolina Horta Andrade: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Bruno Junior Neves: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Cleber Camilo Melo-Filho: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Juliana Rodrigues: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Diego Cabral Silva: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Rodolpho Campos Braga: LabMol - Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Federal University of Goias, Goiania, GO, 74605-170, Brazil
- Pedro Vitor Lemos Cravo: Laboratory of Cheminformatics, Centro Universitario de Anapolis (UniEVANGELICA), Anapolis, GO, 75083-515, Brazil
|
43
|
Rodríguez-Pérez R, Bajorath J. Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values. J Med Chem 2019; 63:8761-8777. [PMID: 31512867 DOI: 10.1021/acs.jmedchem.9b01101] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Indexed: 01/27/2023]
Abstract
In qualitative or quantitative studies of structure-activity relationships (SARs), machine learning (ML) models are trained to recognize structural patterns that differentiate between active and inactive compounds. Understanding model decisions is challenging but of critical importance to guide compound design. Moreover, the interpretation of ML results provides an additional level of model validation based on expert knowledge. A number of complex ML approaches, especially deep learning (DL) architectures, have distinctive black-box character. Herein, a locally interpretable explanatory method termed Shapley additive explanations (SHAP) is introduced for rationalizing activity predictions of any ML algorithm, regardless of its complexity. Models resulting from random forest (RF), nonlinear support vector machine (SVM), and deep neural network (DNN) learning are interpreted, and structural patterns determining the predicted probability of activity are identified and mapped onto test compounds. The results indicate that SHAP has high potential for rationalizing predictions of complex ML models.
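The Shapley value that underlies SHAP can be illustrated with a small exact computation. This is a sketch of the game-theoretic definition only (average marginal contribution of a feature over all orderings), not of the approximations used by the SHAP method or of the models in the paper. The function `toy_model` and the substructure names are hypothetical.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = frozenset()
        for f in order:
            with_f = present | {f}
            phi[f] += value(with_f) - value(present)
            present = with_f
    return {f: total / len(orderings) for f, total in phi.items()}


def toy_model(present):
    """Hypothetical predicted probability of activity given the substructures present."""
    score = 0.1                           # baseline with no features
    if "nitro" in present:
        score += 0.3
    if "amide" in present:
        score += 0.1
    if {"nitro", "aryl"} <= present:      # interaction term, shared by both features
        score += 0.2
    return score


features = ["nitro", "amide", "aryl"]
phi = shapley_values(features, toy_model)
print(phi)
```

The contributions add up to the difference between the prediction with all features present and the baseline, which is the additivity property that makes per-feature attributions easy to map back onto a compound. Exact enumeration grows factorially with the number of features, so it is only practical for toy cases like this one.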
Affiliation(s)
- Raquel Rodríguez-Pérez: Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany; Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riß, Germany
- Jürgen Bajorath: Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
|
44
|
Chakravarti SK, Alla SRM. Descriptor Free QSAR Modeling Using Deep Learning With Long Short-Term Memory Neural Networks. Front Artif Intell 2019; 2:17. [PMID: 33733106 PMCID: PMC7861338 DOI: 10.3389/frai.2019.00017] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 04/28/2019] [Accepted: 08/22/2019] [Indexed: 12/15/2022] [Open Access]
Abstract
Current practice of building QSAR models usually involves computing a set of descriptors for the training set compounds, applying a descriptor selection algorithm and finally using a statistical fitting method to build the model. In this study, we explored the prospects of building good quality interpretable QSARs for big and diverse datasets, without using any pre-calculated descriptors. We have used different forms of Long Short-Term Memory (LSTM) neural networks to achieve this, trained directly using either traditional SMILES codes or a new linear molecular notation developed as part of this work. Three endpoints were modeled: Ames mutagenicity, inhibition of P. falciparum Dd2 and inhibition of Hepatitis C Virus, with training sets ranging from 7,866 to 31,919 compounds. To boost the interpretability of the prediction results, an attention-based machine learning mechanism, jointly with a bidirectional LSTM, was used to detect structural alerts for the mutagenicity data set. Traditional fragment descriptor-based models were used for comparison. As per the results of the external and cross-validation experiments, overall prediction accuracies of the LSTM models were close to the fragment-based models. However, LSTM models were superior in predicting test chemicals that are dissimilar to the training set compounds, a coveted quality of QSAR models in real world applications. In summary, it is possible to build QSAR models using LSTMs without using pre-computed traditional descriptors, and models are far from being “black box.” We hope that this study will help bring large, descriptor-less QSARs to mainstream use.
|
45
|
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang: State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang: State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne: ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider: ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang: State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
|
46
|
Ponzoni I, Sebastián-Pérez V, Martínez MJ, Roca C, De la Cruz Pérez C, Cravero F, Vazquez GE, Páez JA, Díaz MF, Campillo NE. QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer's Disease. Sci Rep 2019; 9:9102. [PMID: 31235739 DOI: 10.1038/s41598-019-45522-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/31/2019] [Accepted: 05/30/2019] [Indexed: 12/27/2022] [Open Access]
Abstract
Alzheimer’s disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis, thus it constitutes an auspicious pharmacological target for its treatment. In this paper, a QSAR model for identification of potential inhibitors of BACE1 protein is designed by using classification methods. For building this model, a database with 215 molecules collected from different sources has been assembled. This dataset contains diverse compounds with different scaffolds and physical-chemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which helps improve the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which makes it possible to guarantee the absence of information redundancy in the model. The QSAR model performances have been assessed by traditional metrics, and the final proposed model has low cardinality and correctly classifies a high percentage of chemical compounds.
|
47
|
He L, Xiao K, Zhou C, Li G, Yang H, Li Z, Cheng J. Insights into pesticide toxicity against aquatic organism: QSTR models on Daphnia Magna. Ecotoxicol Environ Saf 2019; 173:285-292. [PMID: 30776561 DOI: 10.1016/j.ecoenv.2019.02.014] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/27/2018] [Revised: 01/30/2019] [Accepted: 02/04/2019] [Indexed: 06/09/2023]
Abstract
The toxicities of agrochemicals to non-target aquatic organisms are key items in chemical ecological risk assessment. However, there is still an urgent need to develop new tools to assess agrochemical aquatic toxicity efficiently and accurately. In this work, QSTR studies were performed on a data set containing 639 diverse pesticides with measured EC50 toxicity against Daphnia magna, by using five machine learning methods combined with seven fingerprints and a set of molecular descriptors. The imbalance problem of the data set was successfully solved by clustering analysis. The top-10 QSTR models displayed greater predictive abilities than ECOSAR. The optimal model, Ext-SVM, showed the best performance in 10-fold cross validation (Qhigh=0.807, Qmoderate=0.806, Qlow=0.755, Qtotal=0.794), and also in the test set verification (Qhigh=0.865, Qmoderate=0.783, Qlow=0.931, Qtotal=0.848). The relevance of the key physical-chemical properties to the toxicity was also investigated: the MW, a_np, logP(o/w), GCUT_SLOGP_1, chilv and SMR_VSA7 values displayed positive correlation with Daphnia magna toxicity, whereas logS and a_don showed negative correlation. The robust QSTR models provided efficient tools for assessing agrochemical aquatic toxicity, and the differences in physical-chemical properties revealed between the high- and low-toxicity compounds might be useful in the discovery and design of pesticides with low aquatic toxicity.
Affiliation(s)
- Lujue He: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Keya Xiao: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Cong Zhou: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guanglong Li: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang: Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Zhong Li: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jiagao Cheng: Shanghai Key Laboratory of Chemical Biology, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China; Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
48
|
Cui X, Liu J, Zhang J, Wu Q, Li X. In silico prediction of drug‐induced rhabdomyolysis with machine‐learning models and structural alerts. J Appl Toxicol 2019; 39:1224-1232. [DOI: 10.1002/jat.3808] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/17/2018] [Revised: 03/13/2019] [Accepted: 03/17/2019] [Indexed: 12/21/2022]
Affiliation(s)
- Xueyan Cui: Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
- Juan Liu: Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
- Jinfeng Zhang: Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
- Qiuyun Wu: Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
- Xiao Li: Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, Shandong University, Jinan, China
|
49
|
Ciallella HL, Zhu H. Advancing Computational Toxicology in the Big Data Era by Artificial Intelligence: Data-Driven and Mechanism-Driven Modeling for Chemical Toxicity. Chem Res Toxicol 2019; 32:536-547. [PMID: 30907586 DOI: 10.1021/acs.chemrestox.8b00393] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Indexed: 12/21/2022]
Abstract
In 2016, the Frank R. Lautenberg Chemical Safety for the 21st Century Act became the first US legislation to advance chemical safety evaluations by utilizing novel testing approaches that reduce the testing of vertebrate animals. Central to this mission is the advancement of computational toxicology and artificial intelligence approaches to implementing innovative testing methods. In the current big data era, the terms volume (amount of data), velocity (growth of data), and variety (the diversity of sources) have been used to characterize the currently available chemical, in vitro, and in vivo data for toxicity modeling purposes. Furthermore, as suggested by various scientists, the variability (internal consistency or lack thereof) of publicly available data pools, such as PubChem, also presents significant computational challenges. The development of novel artificial intelligence approaches based on public massive toxicity data is urgently needed to generate new predictive models for chemical toxicity evaluations and make the developed models applicable as alternatives for evaluating untested compounds. In this procedure, traditional approaches (e.g., QSAR) purely based on chemical structures have been replaced by newly designed data-driven and mechanism-driven modeling. The resulting models realize the concept of adverse outcome pathway (AOP), which can not only directly evaluate toxicity potentials of new compounds, but also illustrate relevant toxicity mechanisms. The recent advancement of computational toxicology in the big data era has paved the road to future toxicity testing, which will significantly impact on the public health.
|
50
|
Klimenko K, Rosenberg SA, Dybdahl M, Wedebye EB, Nikolov NG. QSAR modelling of a large imbalanced aryl hydrocarbon activation dataset by rational and random sampling and screening of 80,086 REACH pre-registered and/or registered substances. PLoS One 2019; 14:e0213848. [PMID: 30870500 DOI: 10.1371/journal.pone.0213848] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/26/2018] [Accepted: 03/01/2019] [Indexed: 12/02/2022] [Open Access]
Abstract
The Aryl hydrocarbon receptor (AhR) plays important roles in many normal and pathological physiological processes, including endocrine homeostasis, foetal development, cell cycle regulation, cellular oxidation/antioxidation, immune regulation, metabolism of endogenous and exogenous substances, and carcinogenesis. An experimental data set for human in vitro AhR activation comprising 324,858 substances, of which 1,982 were confirmed actives, was used to test an in-house-developed approach to rationally select Quantitative Structure-Activity Relationship (QSAR) training set substances from an unbalanced data set. In the first iteration, active and inactive substances were selected at random to make QSAR models. Then, more inactive substances were added to the training set in two further iterations based on incorrect or out-of-domain predictions to produce larger models. The resulting ‘rational’ model, comprising 832 actives and four times as many inactives, i.e. 3,328, was compared to a model with a training set of the same size and proportion of inactives chosen entirely at random. Both models underwent robust cross-validation and external validation showing good statistical performance, with the rational model having external validation sensitivity of 85.1% and specificity of 97.1%, compared to the random model with sensitivity 89.1% and specificity 91.3%. Furthermore, we integrated the training sets for both models with the 93 external validation test set actives and 372 randomly selected inactives to make two final models. They also underwent external validations for specificity and cross-validations, which confirmed that good predictivity was maintained. All developed models were applied to predict 80,086 EU REACH substances. The rational and random final models had 63.1% and 56.9% coverage of the REACH set, respectively, and predicted 1,256 and 3,214 substances as actives. The final models as well as predictions for AhR activation for 650,000 substances will be published in the Danish (Q)SAR Database and can, for example, be used for priority setting, in read-across predictions and in weight-of-evidence assessments of chemicals.
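The sensitivity and specificity figures quoted in this abstract follow from a confusion matrix of predicted versus measured activity. The sketch below shows the arithmetic on invented labels. The 1:4 active-to-inactive ratio mirrors the abstract, but the counts and predictions are hypothetical and do not reproduce the paper's results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = active."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical external set: 10 actives followed by 40 inactives.
y_true = [1] * 10 + [0] * 40
# Hypothetical predictions: 8 of 10 actives found, 3 of 40 inactives misflagged.
y_pred = [1] * 8 + [0] * 2 + [0] * 37 + [1] * 3

sens, spec = sensitivity_specificity(y_true, y_pred)
balanced = (sens + spec) / 2   # balanced accuracy, the mean of the two rates
print(sens, spec, balanced)
```

Reporting both rates matters for an unbalanced set like this one, because a model that calls every substance inactive would still score 80% plain accuracy here while having zero sensitivity.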
|