1
Ulrich N, Voigt K, Kudria A, Böhme A, Ebert RU. Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset. J Cheminform 2025; 17:55. [PMID: 40259418 PMCID: PMC12012962 DOI: 10.1186/s13321-025-01000-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2024] [Accepted: 03/30/2025] [Indexed: 04/23/2025]
Abstract
Water solubility is a relevant physicochemical property in environmental chemistry, toxicology, and drug design. Although water solubility is, alongside the octanol-water partition coefficient, melting point, and boiling point, a property with a large amount of available experimental data, there are still many compounds in the chemical universe for which information on water solubility is lacking. Thus, prediction tools with a broad application domain are needed to fill the corresponding data gaps. To this end, we developed a graph convolutional neural network model (GNN) to predict water solubility in the form of log Sw based on a highly curated dataset of 9800 chemicals. We started our model development with a curation workflow of the AqSolDB data, ending with 7605 data points. We added 2195 chemicals with experimental data, which we found in the literature, to our dataset. In the final dataset, log Sw values range from -13.17 to 0.50. Higher values were excluded by a cut-off introduced to eliminate fully miscible chemicals. We developed a consensus GNN from a fivefold split into a training set (70% of the data) and a validation set (20%) and used the remaining 10% as an independent test set to evaluate the performance of the individual splits and the consensus model. By doing so, we achieved an r² of 0.901, a q² of 0.896, and an RMSE of 0.657 on our independently selected test set, which is close to the experimental error of 0.5 to 0.6 log units. We further provide information on the application domain and compare our performance to other existing prediction tools.
Scientific contribution: Based on a highly curated dataset, we developed a neural network to predict the water solubility of chemicals for a broad application domain. We curated the data in a step-wise procedure, in which we identified various errors in the experimental data. Based on an independent test set, we compare our prediction results to those of the available prediction models.
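As a rough illustration of how a consensus prediction and the reported test-set statistics relate, the sketch below averages the outputs of several split models and computes RMSE and r². All values and function names are hypothetical and are not the authors' code or data; r² is assumed here to be the squared Pearson correlation, and q² is left out because the abstract does not define it.

```python
import math

def consensus(predictions_per_model):
    """Average the predictions of several models, compound by compound."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Squared Pearson correlation (an assumed definition of r²)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

# Hypothetical log Sw values for four test compounds and five split models.
measured = [-3.0, -1.0, -5.0, 0.0]
split_models = [
    [-2.8, -1.2, -4.6, -0.3],
    [-3.3, -0.9, -5.2, 0.1],
    [-2.9, -1.4, -4.9, -0.2],
    [-3.1, -0.7, -5.4, 0.2],
    [-2.7, -1.1, -4.8, -0.1],
]
averaged = consensus(split_models)
print("consensus RMSE:", round(rmse(measured, averaged), 3))
print("consensus r2:", round(r_squared(measured, averaged), 3))
```

Averaging tends to cancel errors that the individual splits do not share, which is one common motivation for reporting a consensus model next to the single-split results.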
Affiliation(s)
- Nadin Ulrich
- Department of Exposure Science, Helmholtz Centre for Environmental Research-UFZ, Permoserstrasse 15, 04318, Leipzig, Germany.
- PAULY, Theresienstrasse 50, 04129, Leipzig, Germany.
- Anton Kudria
- Department of Exposure Science, Helmholtz Centre for Environmental Research-UFZ, Permoserstrasse 15, 04318, Leipzig, Germany
- Alexander Böhme
- Department of Exposure Science, Helmholtz Centre for Environmental Research-UFZ, Permoserstrasse 15, 04318, Leipzig, Germany
- Ralf-Uwe Ebert
- Department of Exposure Science, Helmholtz Centre for Environmental Research-UFZ, Permoserstrasse 15, 04318, Leipzig, Germany
2
Duy HA, Srisongkram T. Protecting your skin: a highly accurate LSTM network integrating conjoint features for predicting chemical-induced skin irritation. J Cheminform 2025; 17:39. [PMID: 40148987 PMCID: PMC11951793 DOI: 10.1186/s13321-025-00980-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2024] [Accepted: 02/28/2025] [Indexed: 03/29/2025]
Abstract
Skin irritation is a significant adverse effect associated with chemicals and drug substances. Quantitative structure-activity relationship (QSAR) modelling is an alternative method that bypasses in vivo assays for filling data gaps in chemical risk assessment. In this study, we developed QSAR models based on recurrent neural networks (RNNs) to classify skin irritation caused by chemical compounds. We utilized chemical language notation, molecular substructures, molecular descriptors, and a combination of these features named conjoint fingerprints for model construction. A simple RNN, long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and bidirectional gated recurrent unit (BiGRU) architectures were used to build the QSAR models. We found that the LSTM with a combination of molecular fingerprints and descriptors outperformed the other models significantly, with 80% accuracy, 60% MCC, and 85% AUC in the external test set evaluation. We therefore selected this model for generalizability testing with other test sets beyond our study, ensuring that the model can be used with other data sets. Furthermore, the applicability domain of the proposed model was defined, so that a trustworthy prediction can be made for a test compound. This model was developed based on OECD guidelines for skin irritation assessment and QSAR model development, assuring compliance with all required standards. The models and source code developed in this study are publicly available, facilitating chemical design and safety evaluation, particularly for assessing the skin irritation potential of chemicals.
Affiliation(s)
- Huynh Anh Duy
- Graduate School in the Program of Research and Development in Pharmaceuticals, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen, 40002, Thailand
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen, 40002, Thailand
3
Jaiswal R, Bhati G, Ahmed S, Siddiqi MI. iDCNNPred: an interpretable deep learning model for virtual screening and identification of PI3Ka inhibitors against triple-negative breast cancer. Mol Divers 2024:10.1007/s11030-024-11055-9. [PMID: 39648257 DOI: 10.1007/s11030-024-11055-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Accepted: 11/12/2024] [Indexed: 12/10/2024]
Abstract
Triple-negative breast cancer (TNBC) lacks estrogen, progesterone, and HER2 expression and accounts for 15-20% of breast cancer cases. It is challenging to treat because of its low therapeutic response, heterogeneity, and aggressiveness. The PI3Ka isoform is a promising therapeutic target, often hyperactivated in TNBC, contributing to uncontrolled growth and cancer cell formation. We propose an interpretable deep convolutional neural network prediction (iDCNNPred) system that uses 2D molecular images to classify bioactivity and identify potential PI3Ka inhibitors. We built Custom-DCNN models and pre-trained models such as AlexNet, SqueezeNet, and VGG19 using the Bayesian optimization algorithm, and found that our Custom-DCNN model performed better than the pre-trained models, with lower complexity and memory usage. All top-performing models were used to screen the Maybridge chemical library for predicted hit molecules. The screened molecules were further evaluated for protein-ligand interaction by molecular docking, and finally 12 promising hits were shortlisted for biological validation using in vitro PI3K inhibition studies. After biological evaluation, 4 potent molecules with different structural moieties were identified; these molecules present new starting scaffolds for further improvement of potency and selectivity as PI3K inhibitors through medicinal chemistry efforts. Furthermore, we showed the significance of interpreting and visualizing the model's predictions with the Grad-CAM technique, which enhances the robustness, transparency, and interpretability of the model's predictions. The data, script files, and prediction runs of the models needed to reproduce the experiments of this study are available in the GitHub repository at https://github.com/ravishankar1307/iDCNNPred.git.
Affiliation(s)
- Ravishankar Jaiswal
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Girdhar Bhati
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Shakil Ahmed
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
- Biochemistry and Structural Biology Division, CSIR-Central Drug Research Institute, Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
4
Ambe K, Nakamori M, Tohno R, Suzuki K, Sasaki T, Tohkin M, Yoshinari K. Machine Learning-Based In Silico Prediction of the Inhibitory Activity of Chemical Substances Against Rat and Human Cytochrome P450s. Chem Res Toxicol 2024; 37:1843-1850. [PMID: 39427263 DOI: 10.1021/acs.chemrestox.4c00168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2024]
Abstract
The prediction of cytochrome P450 inhibition by a computational (quantitative) structure-activity relationship approach using chemical structure information and machine learning would be useful for toxicity research as a simple and rapid in silico tool. However, there are few in silico models focusing on the species differences between rat and human in P450 inhibition. This study aimed to establish in silico models to classify chemical substances as inhibitors or non-inhibitors of various rat and human P450s, using only molecular descriptors. Using the in-house test results from our in vitro experiments, we used 326 substances for model construction and internal validation. Apart from these 326 substances, 60 substances were used as an external validation data set. We focused on seven rat P450s (CYP1A1, CYP1A2, CYP2B1, CYP2C6, CYP2D1, CYP2E1, and CYP3A2) and 11 human P450s (CYP1A1, CYP1A2, CYP1B1, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, and CYP3A4). Most of the models established using XGBoost showed an area under the receiver operating characteristic curve (ROC-AUC) of 0.8 or more in the internal validation. When we set an applicability domain for the models and confirmed their generalization performance through external validation, most of the models showed an ROC-AUC of 0.7 or more. Interestingly, for CYP1A1 and CYP1A2, we discovered that a human P450 inhibitory activity model can predict rat P450 inhibitory activity and vice versa. These models are the first attempts to predict inhibitory activity against a wide variety of P450s in both rats and humans using chemical structure information. Our experimental results and in silico models should help provide supporting information on species similarities and differences in chemical-induced toxicity.
Affiliation(s)
- Kaori Ambe
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Mizuki Nakamori
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Riku Tohno
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kotaro Suzuki
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Takamitsu Sasaki
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
- Masahiro Tohkin
- Department of Regulatory Science, Graduate School of Pharmaceutical Sciences, Nagoya City University, Nagoya 4678603, Japan
- Kouichi Yoshinari
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 4228526, Japan
5
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high throughput screening (HTS) campaigns in drug discovery is often hampered by the detection of false positives in these assays due to small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS by nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely, the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches, each tailored for specific drug discovery needs: models trained on individual data sets, a consensus approach using these models, and a merged data set approach. The individual data set method emerged as the most effective, achieving 93% sensitivity and 90% specificity, outperforming existing state-of-the-art models by 20 and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This feature is particularly important for early stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates the importance of hydrophobicity and molecular complexity as primary factors influencing the aggregation propensity. The BAD molecule filter is publicly accessible at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
Affiliation(s)
- Abdallah Abou Hajal
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
- Boulbaba Ben Amor
- Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
- IMT Nord Europe, Villeneuve D'Ascq 59650, France
- Noor Atatreh
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Mohammad A Ghattas
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
6
Kehrein J, Bunker A, Luxenhofer R. POxload: Machine Learning Estimates Drug Loadings of Polymeric Micelles. Mol Pharm 2024; 21:3356-3374. [PMID: 38805643 PMCID: PMC11394009 DOI: 10.1021/acs.molpharmaceut.4c00086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2024]
Abstract
Block copolymers, composed of poly(2-oxazoline)s and poly(2-oxazine)s, can serve as drug delivery systems; they form micelles that carry poorly water-soluble drugs. Many recent studies have investigated the effects of structural changes of the polymer and the hydrophobic cargo on drug loading. In this work, we combine these data to establish an extended formulation database. Different molecular properties and fingerprints are tested for their applicability to serve as formulation-specific mixture descriptors. A variety of classification and regression models are built for different descriptor subsets and thresholds of loading efficiency and loading capacity, with the best models achieving overall good statistics for both cross- and external validation (balanced accuracies of 0.8). Subsequently, important features are dissected for interpretation, and the DrugBank is screened for potential therapeutic use cases where these polymers could be used to develop novel formulations of hydrophobic drugs. The most promising models are provided as an open-source software tool for other researchers to test the applicability of these delivery systems for potential new drug candidates.
Affiliation(s)
- Josef Kehrein
- Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
- Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Alex Bunker
- Drug Research Program, Division of Pharmaceutical Biosciences Faculty of Pharmacy, University of Helsinki, Viikinkaari 5 E, 00014 Helsinki, Finland
- Robert Luxenhofer
- Soft Matter Chemistry, Department of Chemistry, Faculty of Science, University of Helsinki, A. I. Virtasen aukio 1, 00014 Helsinki, Finland
7
Karamertzanis PG, Patlewicz G, Sannicola M, Paul-Friedman K, Shah I. Systematic Approaches for the Encoding of Chemical Groups: A Case Study. Chem Res Toxicol 2024; 37:600-619. [PMID: 38498310 PMCID: PMC11258607 DOI: 10.1021/acs.chemrestox.3c00411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Regulatory authorities aim to organize substances into groups to facilitate prioritization within hazard and risk assessment processes. Often, such chemical groupings are not explicitly defined by structural rules or physicochemical property information. This is largely due to how these groupings are developed, namely, by a manual expert curation process, which in turn makes updating and refining groupings as new substances are evaluated a practical challenge. Herein, machine learning methods were leveraged to build models that could preliminarily assign substances to predefined groups. A set of 86 groupings containing 2,184 substances, as published on the European Chemicals Agency (ECHA) website, was mapped to the U.S. Environmental Protection Agency (EPA) Distributed Toxicity Structure Database (DSSTox) content to extract chemical and structural information. Substances were represented using Morgan fingerprints, and two machine learning approaches, k-nearest neighbor (kNN) and random forest (RF), were used to classify test substances into the 56 groups containing at least 10 substances with a structural representation in the data set. This led to mean 5-fold cross-validation test accuracies (average F1 scores) of 0.781 and 0.853, respectively. With a 9% improvement, the RF classifier was significantly more accurate than kNN (p-value = 0.001). The approach offers promise as a means of initially profiling new substances into predefined groups to facilitate prioritization efforts and streamline the assessment of new substances when earlier groupings are available. The algorithm to fit and use these models has been made available in the accompanying repository, thereby enabling both use of the produced models and refitting of these models as new groupings become available from regulatory authorities or industry.
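To make the fingerprint-based nearest-neighbor assignment concrete, the sketch below assigns a substance to the majority group among its k most similar training substances. The fingerprints are hand-made sets of "on" bit indices standing in for Morgan fingerprints, which would normally be generated by a cheminformatics toolkit. The group labels, the use of Tanimoto similarity, and k = 3 are illustrative assumptions, not details taken from the study.

```python
from collections import Counter

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not bits_a and not bits_b:
        return 0.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)

def knn_assign(query_bits, training, k=3):
    """Return the most frequent group among the k most similar training items.

    `training` is a list of (fingerprint_bits, group_label) pairs. Ties are
    resolved in favor of the label met first among the ranked neighbors.
    """
    ranked = sorted(
        training,
        key=lambda item: tanimoto(query_bits, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: two groups with made-up fingerprint bits.
training_set = [
    ({1, 2, 3, 4}, "group_A"),
    ({1, 2, 3, 5}, "group_A"),
    ({10, 11, 12}, "group_B"),
    ({10, 11, 13}, "group_B"),
    ({1, 10, 20}, "group_B"),
]
print(knn_assign({1, 2, 3, 6}, training_set, k=3))
```

A random forest would replace the similarity ranking with an ensemble of decision trees over the same bit vectors; the nearest-neighbor version is shown only because it fits in a few lines.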
Affiliation(s)
- Panagiotis G Karamertzanis
- Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Grace Patlewicz
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Marta Sannicola
- Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland
- Katie Paul-Friedman
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
- Imran Shah
- Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States
8
Viganò EL, Ballabio D, Roncaglioni A. Artificial Intelligence and Machine Learning Methods to Evaluate Cardiotoxicity following the Adverse Outcome Pathway Frameworks. Toxics 2024; 12:87. [PMID: 38276722 PMCID: PMC10820364 DOI: 10.3390/toxics12010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 01/27/2024]
Abstract
Cardiovascular disease is a leading global cause of mortality. The potential cardiotoxic effects of chemicals from different classes, such as environmental contaminants, pesticides, and drugs, can significantly contribute to effects on health. The same chemical can induce cardiotoxicity in different ways, following various Adverse Outcome Pathways (AOPs). In addition, the potential synergistic effects between chemicals further complicate the issue. In silico methods have become essential for tackling the problem from different perspectives, reducing the need for traditional in vivo testing, and saving valuable resources in terms of time and money. Artificial intelligence (AI) and machine learning (ML) are among today's advanced approaches for evaluating chemical hazards. They can serve, for instance, as a first-tier component of Integrated Approaches to Testing and Assessment (IATA). This study employed ML and AI to assess interactions between chemicals and specific biological targets within the AOP networks for cardiotoxicity, starting with molecular initiating events (MIEs) and progressing through key events (KEs). We explored methods to encode chemical information in a suitable way for ML and AI. We started with commonly used approaches in Quantitative Structure-Activity Relationship (QSAR) methods, such as molecular descriptors and different types of fingerprints. We then increased the complexity of encoders, incorporating graph-based methods, auto-encoders, and character embeddings employed in natural language processing. We also developed a multimodal neural network architecture capable of considering the complementary nature of different chemical representations simultaneously. The potential of this approach, compared to more conventional architectures designed to handle a single encoder, becomes apparent when the amount of data increases.
Affiliation(s)
- Edoardo Luca Viganò
- Laboratory of Environmental Toxicology and Chemistry, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milan, Italy
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, 20126 Milan, Italy
- Alessandra Roncaglioni
- Laboratory of Environmental Toxicology and Chemistry, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milan, Italy
9
Jovic O, Mouras R. Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation. Molecules 2023; 29:19. [PMID: 38202602 PMCID: PMC10779886 DOI: 10.3390/molecules29010019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/15/2023] [Accepted: 12/17/2023] [Indexed: 01/12/2024]
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was found to be 0.59-0.76 Log(S) for two water data sets, while for organic solvent data sets it was 0.69-0.79 Log(S) for the Methanol data set, 0.65-0.79 Log(S) for the Ethanol data set, and 0.62-0.70 Log(S) for the Acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, and we successfully validated them using the true solubility values of the test sets. In the fourth and last step, we used the prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases. All of this was possible without knowledge of experimental solubilities for the database compounds. We consider these four steps novel because solubility-related works usually study only the first step or the first two steps.
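The abstract does not state which conformal variant was used. As one common possibility, the sketch below shows split (inductive) conformal regression: absolute residuals on a held-out calibration set give a single half-width that turns any point prediction into an interval with roughly 1 - alpha coverage. The function names and all numbers are hypothetical; this is not the authors' implementation.

```python
import math

def conformal_half_width(calibration_residuals, alpha=0.1):
    """Half-width of a split-conformal interval with target coverage 1 - alpha.

    Takes the ceil((n + 1) * (1 - alpha))-th smallest absolute residual.
    Returns infinity when the calibration set is too small for that rank.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_residuals)[rank - 1]

def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a point prediction."""
    return (point_prediction - half_width, point_prediction + half_width)

# Hypothetical calibration data: measured vs. predicted Log(S) values.
calibration_true = [-2.1, -3.4, -0.8, -4.9, -1.5, -2.7, -3.9, -0.2, -5.6]
calibration_pred = [-2.4, -3.0, -1.1, -4.2, -1.6, -3.3, -3.7, -0.9, -5.1]
residuals = [abs(t - p) for t, p in zip(calibration_true, calibration_pred)]

half_width = conformal_half_width(residuals, alpha=0.2)
low, high = prediction_interval(-3.0, half_width)
print(f"80% interval for a new prediction of -3.0: [{low:.2f}, {high:.2f}]")
```

This basic form gives every molecule the same interval width. Per-molecule error margins such as those described in the abstract are typically obtained by scaling residuals with a difficulty estimate (normalized conformal prediction), which is omitted here for brevity.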
Affiliation(s)
- Rabah Mouras
- Pharmaceutical Manufacturing Technology Centre, Bernal Institute, Department of Chemical Sciences, University of Limerick, V94 T9PX Limerick, Ireland
10
Yuan W, Hibi Y, Tamura R, Sumita M, Nakamura Y, Naito M, Tsuda K. Revealing factors influencing polymer degradation with rank-based machine learning. Patterns (N Y) 2023; 4:100846. [PMID: 38106610 PMCID: PMC10724228 DOI: 10.1016/j.patter.2023.100846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 07/07/2023] [Accepted: 08/30/2023] [Indexed: 12/19/2023]
Abstract
The efficient treatment of polymer waste is a major challenge for marine sustainability. It is useful to reveal the factors that dominate the degradability of polymer materials for developing polymer materials in the future. The small number of available datasets on degradability and the diversity of their experimental means and conditions hinder large-scale analysis. In this study, we have developed a platform for evaluating the degradability of polymers that is suitable for such data, using a rank-based machine learning technique based on RankSVM. We then made a ranking model to evaluate the degradability of polymers, integrating three datasets on the degradability of polymers that are measured by different means and conditions. Analysis of this ranking model with a decision tree revealed factors that dominate the degradability of polymers.
Affiliation(s)
- Weilin Yuan
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Yusuke Hibi
- Research Center for Macromolecules and Biomaterials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Ryo Tamura
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Basic Research on Materials, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Masato Sumita
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Yasuyuki Nakamura
- Research Center for Macromolecules and Biomaterials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Masanobu Naito
- Research Center for Macromolecules and Biomaterials, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Koji Tsuda
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Basic Research on Materials, National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
11
Dost K, Pullar-Strecker Z, Brydon L, Zhang K, Hafner J, Riddle PJ, Wicker JS. Combatting over-specialization bias in growing chemical databases. J Cheminform 2023; 15:53. [PMID: 37208694 DOI: 10.1186/s13321-023-00716-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/04/2022] [Accepted: 03/25/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND: Predicting the behavior of new chemical compounds in advance can support the design of new products by directing research toward the most promising candidates and ruling out others. Such predictive models can be data-driven, using machine learning, or based on researchers' experience, and they depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Consistent use of these predictive models therefore shapes the dataset and causes a continuous specialization that shrinks the applicability domain of all future models trained on this dataset and increasingly harms model-based exploration of the space.
PROPOSED SOLUTION: In this paper, we propose CANCELS (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. CANCELS does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain.
RESULTS: An extensive set of experiments on the use case of biodegradation pathway prediction reveals not only that the bias spiral can indeed be observed but also that CANCELS produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial, as it can not only interrupt the continuous specialization process but also significantly improve a predictor's performance while reducing the number of required experiments. Overall, we believe that CANCELS can support researchers in their experimentation process, helping them not only to better understand their data and its potential flaws but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels.
Affiliation(s)
- Katharina Dost
  - School of Computer Science, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
  - enviPath UG & Co. KG, In den Graswiesen 13, 55437, Ockenheim, Germany
- Zac Pullar-Strecker
  - School of Computer Science, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Liam Brydon
  - School of Computer Science, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Kunyang Zhang
  - Eawag-Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600, Dübendorf, Switzerland
- Jasmin Hafner
  - Eawag-Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600, Dübendorf, Switzerland
- Patricia J Riddle
  - School of Computer Science, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Jörg S Wicker
  - School of Computer Science, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
  - enviPath UG & Co. KG, In den Graswiesen 13, 55437, Ockenheim, Germany
12
Cheng Z, Bhave M, Hwang SS, Rahman T, Chee XW. Identification of Potential p38γ Inhibitors via In Silico Screening, In Vitro Bioassay and Molecular Dynamics Simulation Studies. Int J Mol Sci 2023; 24:7360. [PMID: 37108523 PMCID: PMC10139033 DOI: 10.3390/ijms24087360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 04/06/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023] Open
Abstract
Protein kinase p38γ is an attractive target against cancer because it plays a pivotal role in cancer cell proliferation by phosphorylating the retinoblastoma tumour suppressor protein. Therefore, inhibition of p38γ with active small molecules represents an attractive alternative for developing anti-cancer drugs. In this work, we present a rigorous and systematic virtual screening framework to identify potential p38γ inhibitors against cancer. We combined the use of machine learning-based quantitative structure activity relationship modelling with conventional computer-aided drug discovery techniques, namely molecular docking and ligand-based methods, to identify potential p38γ inhibitors. The hit compounds were filtered using negative design techniques and then assessed for their binding stability with p38γ through molecular dynamics simulations. To this end, we identified a promising compound that inhibits p38γ activity at nanomolar concentrations and hepatocellular carcinoma cell growth in vitro in the low micromolar range. This hit compound could serve as a potential scaffold for further development of a potent p38γ inhibitor against cancer.
Affiliation(s)
- Zixuan Cheng
  - School of Engineering and Science, Swinburne University of Technology Sarawak, Kuching 93350, Malaysia
- Mrinal Bhave
  - Department of Chemistry and Biotechnology, Swinburne University of Technology, Melbourne, VIC 3122, Australia
- Siaw San Hwang
  - School of Engineering and Science, Swinburne University of Technology Sarawak, Kuching 93350, Malaysia
- Taufiq Rahman
  - Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, UK
- Xavier Wezen Chee
  - School of Engineering and Science, Swinburne University of Technology Sarawak, Kuching 93350, Malaysia
13
Ma W, Wang M, Jiang R, Chen W. A machine learning based approach for estimating site-specific partition coefficient Kd of organic compounds: Application to nonionic pesticides. Environ Pollut 2023; 323:121297. [PMID: 36796665 DOI: 10.1016/j.envpol.2023.121297] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/02/2022] [Revised: 02/01/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The partitioning coefficient Kd for a specific compound and location is not only a key input parameter of fate and transport models but also critical in estimating the safe environmental concentration threshold. To reduce the uncertainty caused by non-linear interactions among environmental factors, machine learning based models for predicting Kd were developed in this work from literature datasets of nonionic pesticides that include molecular descriptors, soil properties, and experimental settings. The equilibrium concentration (Ce) values were specifically included because a varied range of Kd corresponding to a given Ce occurs in a real environment. By transforming 466 isotherms reported in the literature, 2618 paired liquid-solid equilibrium concentration (Ce-Qe) data points were obtained. Results of SHapley Additive exPlanations revealed that soil organic carbon, Ce, and cavity formation were the most important features. A distance-based applicability domain analysis was conducted for the 27 most frequently used pesticides with 15952 pieces of soil information from the HWSD-China dataset under three Ce scenarios (i.e., 10, 100, and 1000 μg L⁻¹). The groups of compounds showing log Kd < 0.06 and log Kd > 1.19 were composed mostly of those with log Kow of -0.800 and 5.50, respectively. When log Kd varied between 0.100 and 1.00, which accounted for 55% of the total 2618 calculations, it was affected jointly by interactions among soil types, molecular descriptors, and Ce. It can be concluded that the site-specific models developed in this work are necessary and practicable for the environmental risk assessment and management of nonionic organic compounds.
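As a rough illustration of how a paired Ce-Qe isotherm point relates to Kd, the sketch below assumes the common definition Kd = Qe / Ce (sorbed amount per soil mass divided by equilibrium solution concentration). The abstract does not state the exact transformation used, so the function name `log_kd`, the units, and the example values are hypothetical.

```python
import math


def log_kd(ce_ug_per_l: float, qe_ug_per_kg: float) -> float:
    """Return log10(Kd), assuming Kd = Qe / Ce with Kd in L/kg.

    This definition is an assumption for illustration; the source abstract
    does not spell out how isotherms were transformed.
    """
    if ce_ug_per_l <= 0 or qe_ug_per_kg <= 0:
        raise ValueError("Ce and Qe must be positive")
    return math.log10(qe_ug_per_kg / ce_ug_per_l)


# Hypothetical isotherm point: Ce = 100 ug/L in solution, Qe = 1000 ug/kg on soil.
example_log_kd = log_kd(100.0, 1000.0)
```

Because Kd is computed at a given Ce, the same compound and soil can yield different Kd values across the three Ce scenarios when the isotherm is non-linear, which is consistent with the abstract's reason for including Ce as a model input.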
Affiliation(s)
- Wankai Ma
  - State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China
- Meie Wang
  - State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
- Rong Jiang
  - State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
- Weiping Chen
  - State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China; College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China
14
De León G, Fröhlich E, Fink E, Di Pizio A, Salar-Behzadi S. Premexotac: Machine learning bitterants predictor for advancing pharmaceutical development. Int J Pharm 2022; 628:122263. [DOI: 10.1016/j.ijpharm.2022.122263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 09/27/2022] [Accepted: 09/29/2022] [Indexed: 10/31/2022]
15
Xiang R, Fernandez-Lopez L, Robles-Martín A, Ferrer M, Guallar V. EP-Pred: A Machine Learning Tool for Bioprospecting Promiscuous Ester Hydrolases. Biomolecules 2022; 12:1529. [PMID: 36291739 PMCID: PMC9599548 DOI: 10.3390/biom12101529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/11/2022] [Accepted: 10/18/2022] [Indexed: 11/25/2022] Open
Abstract
When bioprospecting for novel industrial enzymes, substrate promiscuity is a desirable property that increases the reusability of the enzyme. Among industrial enzymes, ester hydrolases are highly relevant, and demand for them continues to increase. However, the search for new substrate-promiscuous ester hydrolases is not trivial, since the mechanism behind this property is greatly influenced by the active site's structural and physicochemical characteristics. These characteristics must be computed from the 3D structure, which is rarely available and expensive to measure, hence the need for a method that can predict promiscuity from sequence alone. Here we report such a method, called EP-pred, an ensemble binary classifier that combines three machine learning algorithms: SVM, KNN, and a linear model. EP-pred has been evaluated against the Lipase Engineering Database together with a hidden Markov approach, leading to a final set of ten sequences predicted to encode promiscuous esterases. Experimental results confirmed the validity of our method, since all ten proteins were found to exhibit broad substrate ambiguity.
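One simple way to combine three binary classifiers into an ensemble is majority voting. The sketch below shows that idea only; the abstract does not say how EP-pred combines its SVM, KNN, and linear models, so the voting rule and the function name `majority_vote` are assumptions for illustration.

```python
def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0.

    `votes` holds one 0/1 prediction per base model for a single sequence.
    Majority voting is an assumed combination rule, not necessarily the
    one used by EP-pred.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if any(v not in (0, 1) for v in votes):
        raise ValueError("votes must be 0 or 1")
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical per-model predictions (SVM, KNN, linear) for two sequences.
promiscuous = majority_vote([1, 1, 0])      # two of three models say "promiscuous"
not_promiscuous = majority_vote([0, 1, 0])  # only one model says "promiscuous"
```

With an odd number of base models, a binary majority vote never ties, which is one practical reason to combine three classifiers rather than two.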
Affiliation(s)
- Ruite Xiang
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Ana Robles-Martín
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Manuel Ferrer
  - Department of Applied Biocatalysis, ICP, CSIC, 28049 Madrid, Spain
- Victor Guallar
  - Department of Life Sciences, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
16
Collins SP, Barton-Maclaren TS. Novel machine learning models to predict endocrine disruption activity for high-throughput chemical screening. Front Toxicol 2022; 4:981928. [PMID: 36204696 PMCID: PMC9530987 DOI: 10.3389/ftox.2022.981928] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Accepted: 09/02/2022] [Indexed: 11/13/2022] Open
Abstract
An area of ongoing concern in toxicology and chemical risk assessment is endocrine disrupting chemicals (EDCs). However, thousands of legacy chemicals lack the toxicity testing required to assess their respective EDC potential, and this is where computational toxicology can play a crucial role. The United States (US) Environmental Protection Agency (EPA) has run two programs, the Collaborative Estrogen Receptor Activity Project (CERAPP) and the Collaborative Modeling Project for Receptor Activity (CoMPARA), which aim to predict estrogen and androgen activity, respectively. The US EPA solicited research groups from around the world to provide endocrine receptor activity Qualitative (or Quantitative) Structure Activity Relationship ([Q]SAR) models and then combined them to create consensus models for different toxicity endpoints. Random Forest (RF) models were developed to cover a broader range of substances with high predictive capabilities using large datasets from CERAPP and CoMPARA for estrogen and androgen activity, respectively. By utilizing simple descriptors from open-source software and large training datasets, RF models were created to expand the domain of applicability for predicting endocrine disrupting activity and help in the screening and prioritization of extensive chemical inventories. In addition, RFs were trained to predict the activity conservatively, meaning that the models are more likely to make false-positive predictions in order to minimize the number of false negatives. This work presents twelve binary and multi-class RF models to predict binding, agonism, and antagonism for estrogen and androgen receptors. The RF models were found to have high predictive capabilities compared to other in silico models, with some models reaching balanced accuracies of 93% while having coverage of 89%. 
These models are intended to be incorporated into evolving priority-setting workflows and integrated strategies to support the screening and selection of chemicals for further testing and assessment by identifying potential endocrine-disrupting substances.
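Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which makes it robust to class imbalance. The following minimal sketch computes it for binary labels; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of actives that are caught
    specificity = tn / (tn + fp)  # fraction of inactives correctly cleared
    return (sensitivity + specificity) / 2


# Toy "conservative" predictor: no false negatives, one false positive.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
score = balanced_accuracy(y_true, y_pred)  # sensitivity 1.0, specificity 0.75
```

In the toy example the predictor trades one false positive for zero false negatives, mirroring the conservative behaviour the abstract describes for screening and prioritization.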
17
Bitam S, Hamadache M, Hanini S. 2D-QSAR, docking, molecular dynamics, studies of PF-07321332 analogues to identify alternative inhibitors against 3CLpro enzyme in SARS-CoV disease. J Biomol Struct Dyn 2022:1-10. [PMID: 35983623 DOI: 10.1080/07391102.2022.2113822] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/15/2022]
Abstract
Given the results of the Pfizer-developed inhibitor PF-07321332 in the treatment of COVID-19, we aimed to identify potential alternatives to this compound using several methods. We developed 2D-QSAR models to predict the therapeutic activity of 78 analogues of PF-07321332 using three statistical learning techniques: a multilayer perceptron artificial neural network (MLP-ANN), support vector regression (SVR), and multiple linear regression (MLR). Various validation approaches were applied to the three models, which were built on the five most relevant descriptors. The study of the characteristics of these descriptors showed that the inhibitory activity of PF-07321332 analogues is specifically affected by the structure of the molecule, its polarizability, and its hydrogen bonds. The best model, MLP-ANN (with a 5-3-1 architecture), was selected on the basis of the following statistical parameters: r2 = 0.922, Q2 = 0.921. In addition, we performed molecular docking and molecular dynamics analyses of these compounds. The obtained results confirm that compound 8 can be a good alternative to compound PF-07321332. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Said Bitam
  - Faculté de Technologie, Département du Génie des Procédés et Environnement, Laboratoire des Biomatériaux et Phénomènes de Transport (LBMPT), Université de Médéa, Médéa, Algérie
- Mabrouk Hamadache
  - Faculté de Technologie, Département du Génie des Procédés et Environnement, Laboratoire des Biomatériaux et Phénomènes de Transport (LBMPT), Université de Médéa, Médéa, Algérie
- Salah Hanini
  - Faculté de Technologie, Département du Génie des Procédés et Environnement, Laboratoire des Biomatériaux et Phénomènes de Transport (LBMPT), Université de Médéa, Médéa, Algérie
18
Morita K, Mizuno T, Kusuhara H. Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning. J Chem Inf Model 2022; 62:3982-3992. [PMID: 35971760 DOI: 10.1021/acs.jcim.2c00765] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023]
Abstract
Adverse events are a serious issue in drug development, and many prediction methods using machine learning have been developed. The random split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach does not strictly match the real-world situation. The time split, which uses the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not clear due to the lack of comparable studies. To understand the differences, we compared the model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split showed higher area under the curve values than did the time split for six of eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein interaction than for the other datasets. Subsequent detailed analyses suggested the danger of confounding in the use of knowledge-based information in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. We provide the analysis code and datasets used in the present study at https://github.com/mizuno-group/AE_prediction.
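The difference between the two splitting strategies can be sketched in a few lines. The example below is a minimal illustration, assuming each record carries a `year` field; the field name, the record layout, and both function names are hypothetical and are not taken from the authors' repository.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and hold out a random fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, test_fraction=0.2):
    """Hold out the most recent fraction of records as the test set."""
    ordered = sorted(records, key=lambda r: r["year"])
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Ten hypothetical compounds, one registered per year from 2000 to 2009.
records = [{"compound": f"C{i}", "year": 2000 + i} for i in range(10)]
train_t, test_t = time_split(records)    # test set holds the two newest compounds
train_r, test_r = random_split(records)  # test set is drawn from all years
```

In the time split every test record is newer than every training record, so the evaluation mimics predicting for compounds that did not exist when the model was built; the random split lets future information leak into training, which is one plausible reason it tends to report higher scores.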
Affiliation(s)
- Katsuhisa Morita
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Tadahaya Mizuno
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Hiroyuki Kusuhara
  - Graduate School of Pharmaceutical Sciences, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
19
Karpov K, Mitrofanov A, Korolev V, Tkachenko V. Size Doesn't Matter: Predicting Physico- or Biochemical Properties Based on Dozens of Molecules. J Phys Chem Lett 2021; 12:9213-9219. [PMID: 34529429 DOI: 10.1021/acs.jpclett.1c02477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
The use of machine learning in chemistry has become a common practice. At the same time, despite the success of modern machine learning methods, the lack of data limits their use. Using a transfer learning methodology can help solve this problem. This methodology assumes that a model built on a sufficient amount of data captures general features of the chemical compound structure on which it was trained and that the further reuse of these features on a data set with a lack of data will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. The paper shows a significant improvement in the performance of the models for target properties with a lack of data. The effects of the data set composition on the model's quality and the applicability domain of the resulting models are also considered.
Affiliation(s)
- Kirill Karpov
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1, Building 3, Moscow 119991, Russia
  - Science Data Software, LLC, 14909 Forest Landing Circle, Rockville, Maryland 20850, United States
- Artem Mitrofanov
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1, Building 3, Moscow 119991, Russia
  - Science Data Software, LLC, 14909 Forest Landing Circle, Rockville, Maryland 20850, United States
- Vadim Korolev
  - Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1, Building 3, Moscow 119991, Russia
  - Science Data Software, LLC, 14909 Forest Landing Circle, Rockville, Maryland 20850, United States
- Valery Tkachenko
  - Science Data Software, LLC, 14909 Forest Landing Circle, Rockville, Maryland 20850, United States
20
Gonzalez E, Jain S, Shah P, Torimoto-Katori N, Zakharov A, Nguyễn ÐT, Sakamuru S, Huang R, Xia M, Obach RS, Hop CECA, Simeonov A, Xu X. Development of Robust Quantitative Structure-Activity Relationship Models for CYP2C9, CYP2D6, and CYP3A4 Catalysis and Inhibition. Drug Metab Dispos 2021; 49:822-832. [PMID: 34183376 PMCID: PMC11022912 DOI: 10.1124/dmd.120.000320] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/24/2020] [Accepted: 06/17/2021] [Indexed: 11/22/2022] Open
Abstract
Cytochrome P450 enzymes are responsible for the metabolism of >75% of marketed drugs, making it essential to identify the contributions of individual cytochromes P450 to the total clearance of a new candidate drug. Overreliance on one cytochrome P450 for clearance carries a high risk of drug-drug interactions, and considering that several human cytochrome P450 enzymes are polymorphic, it can also lead to highly variable pharmacokinetics in the clinic. Thus, it would be advantageous to understand the likelihood of new chemical entities interacting with the major cytochrome P450 enzymes at an early stage in the drug discovery process. Typical screening assays using human liver microsomes do not provide sufficient information to distinguish the specific cytochromes P450 responsible for clearance. In this regard, we experimentally assessed the metabolic stability of ∼5000 compounds for the three most prominent xenobiotic-metabolizing human cytochromes P450, i.e., CYP2C9, CYP2D6, and CYP3A4, and used the data sets to develop quantitative structure-activity relationship models for the prediction of high-clearance substrates for these enzymes. The screening library included the NCATS Pharmaceutical Collection, comprising clinically approved low-molecular-weight compounds, and an annotated library consisting of drug-like compounds. To identify inhibitors, the library was screened against a luminescence-based cytochrome P450 inhibition assay, and by cross-referencing hits from the two assays, we were able to distinguish substrates and inhibitors of these enzymes. The best substrate and inhibitor models (balanced accuracies ∼0.7), as well as the data used to develop these models, have been made publicly available (https://opendata.ncats.nih.gov/adme) to advance drug discovery across all research groups. 
SIGNIFICANCE STATEMENT: In drug discovery and development, drug candidates with indiscriminate cytochrome P450 metabolic profiles are considered advantageous, since they provide less risk of potential issues with cytochrome P450 polymorphisms and drug-drug interactions. This study developed robust substrate and inhibitor quantitative structure-activity relationship models for the three major xenobiotic metabolizing cytochromes P450, i.e., CYP2C9, CYP2D6, and CYP3A4. The use of these models early in drug discovery will enable project teams to strategize or pivot when necessary, thereby accelerating drug discovery research.
Affiliation(s)
- Eric Gonzalez, Sankalp Jain, Pranav Shah, Nao Torimoto-Katori, Alexey Zakharov, Ðắc-Trung Nguyễn, Srilatha Sakamuru, Ruili Huang, Menghang Xia, R Scott Obach, Cornelis E C A Hop, Anton Simeonov, Xin Xu
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), Rockville, Maryland (E.G., S.J., P.S., N.T.-K., A.Z., D.-T.N., S.S., R.H., M.X. A.S., X.X.); Discovery Technology Laboratories, Sohyaku. Innovative Research Division, Mitsubishi Tanabe Pharma Corporation, Yokohama-shi, Japan (N.T.-K.); Pfizer Inc. Department of Pharmacokinetics, Dynamics and Metabolism, Pfizer, Groton, Connecticut (R.S.O.); and Genentech Inc. Department of Drug Metabolism and Pharmacokinetics, Genentech Inc., San Francisco, California (C.E.C.A.H.)
21
Meyer H, Pebesma E. Predicting into unknown space? Estimating the area of applicability of spatial prediction models. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13650] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 11/29/2022]
Affiliation(s)
- Hanna Meyer
  - Institute of Landscape Ecology, Westfälische Wilhelms-Universität Münster, Münster, Germany
- Edzer Pebesma
  - Institute for Geoinformatics, Westfälische Wilhelms-Universität Münster, Münster, Germany
22
Tang BH, Guan Z, Allegaert K, Wu YE, Manolis E, Leroux S, Yao BF, Shi HY, Li X, Huang X, Wang WQ, Shen AD, Wang XL, Wang TY, Kou C, Xu HY, Zhou Y, Zheng Y, Hao GX, Xu BP, Thomson AH, Capparelli EV, Biran V, Simon N, Meibohm B, Lo YL, Marques R, Peris JE, Lutsar I, Saito J, Burggraaf J, Jacqz-Aigrain E, van den Anker J, Zhao W. Drug Clearance in Neonates: A Combination of Population Pharmacokinetic Modelling and Machine Learning Approaches to Improve Individual Prediction. Clin Pharmacokinet 2021; 60:1435-1448. [PMID: 34041714 DOI: 10.1007/s40262-021-01033-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Accepted: 04/28/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND Population pharmacokinetic evaluations have been widely used in neonatal pharmacokinetic studies, while machine learning has become a popular approach to solving complex problems in the current era of big data. OBJECTIVE The aim of this proof-of-concept study was to evaluate whether combining population pharmacokinetic and machine learning approaches could provide a more accurate prediction of the clearance of renally eliminated drugs in individual neonates. METHODS Six drugs that are primarily eliminated by the kidneys were selected (vancomycin, latamoxef, cefepime, azlocillin, ceftazidime, and amoxicillin) as 'proof of concept' compounds. Individual estimates of clearance obtained from population pharmacokinetic models were used as reference clearances, and diverse machine learning methods and nested cross-validation were adopted and evaluated against these reference clearances. The predictive performance of these combined methods was compared with the performance of two other predictive methods: a covariate-based maturation model and a postmenstrual age and body weight scaling model. Relative error was used to evaluate the different methods. RESULTS The extra tree regressor was selected as the best-fit machine learning method. Using the combined method, more than 95% of predictions for all six drugs had a relative error of < 50% and the mean relative error was reduced by an average of 44.3% and 71.3% compared with the other two predictive methods. CONCLUSION A combined population pharmacokinetic and machine learning approach provided improved predictions of individual clearances of renally cleared drugs in neonates. For a new patient treated in clinical practice, individual clearance can be predicted a priori using our model code combined with demographic data.
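The evaluation criterion mentioned in the abstract (share of predictions with a relative error below 50%) can be sketched as follows. This is a minimal illustration that assumes relative error is the absolute deviation divided by the reference clearance; the abstract does not give the exact formula, and the function names and clearance values are hypothetical.

```python
def relative_error(predicted, reference):
    """Absolute deviation from the reference value, as a fraction of it."""
    if reference == 0:
        raise ValueError("reference clearance must be non-zero")
    return abs(predicted - reference) / abs(reference)


def fraction_within(predictions, references, threshold=0.5):
    """Fraction of predictions whose relative error is below the threshold."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [relative_error(p, r) for p, r in zip(predictions, references)]
    return sum(1 for e in errors if e < threshold) / len(errors)


# Hypothetical clearances (L/h): model predictions vs. population-PK references.
reference_cl = [1.0, 2.0, 4.0, 0.5]
predicted_cl = [1.2, 1.0, 4.4, 0.6]
share_ok = fraction_within(predicted_cl, reference_cl)  # second value misses by exactly 50%
```

The strict inequality matches the "< 50%" wording of the abstract, so a prediction that is off by exactly half of the reference value does not count as within the limit.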
Affiliation(s)
- Bo-Hao Tang
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Zheng Guan
- Centre for Human Drug Research, Leiden, The Netherlands; Leiden University Medical Center, Leiden, The Netherlands
- Karel Allegaert
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Leuven, Belgium
- Yue-E Wu
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Efthymios Manolis
- Modelling and Simulation Working Party, European Medicines Agency, Amsterdam, The Netherlands
- Bu-Fan Yao
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Hai-Yan Shi
- Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
- Xiao Li
- Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
- Xin Huang
- Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China; Clinical Research Center, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
- Wen-Qi Wang
- Clinical Research Center, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
- A-Dong Shen
- Key Laboratory of Major Diseases in Children and National Key Discipline of Pediatrics (Capital Medical University), Ministry of Education, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, Beijing, People's Republic of China
- Xiao-Ling Wang
- Clinical Research Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, People's Republic of China
- Tian-You Wang
- Clinical Research Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, People's Republic of China
- Chen Kou
- Department of Neonatology, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing, People's Republic of China
- Hai-Yan Xu
- Department of Pediatrics, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
- Yue Zhou
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Yi Zheng
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Guo-Xiang Hao
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China
- Bao-Ping Xu
- Department of Respiratory Diseases, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, People's Republic of China
- Alison H Thomson
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Edmund V Capparelli
- Pediatric Pharmacology and Drug Discovery, University of California, San Diego, CA, USA
- Valerie Biran
- Neonatal Intensive Care Unit, Hospital Robert Debre, Paris, France
- Nicolas Simon
- Aix Marseille Univ, APHM, INSERM, IRD, SESSTIM, Hop Sainte Marguerite, Service de Pharmacologie Clinique, CAP-TV, Marseille, France
- Bernd Meibohm
- Department of Pharmaceutical Sciences, University of Tennessee Health Science Center, Memphis, TN, USA
- Yoke-Lin Lo
- Department of Pharmacy, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia; School of Pharmacy, International Medical University, Kuala Lumpur, Malaysia
- Remedios Marques
- Department of Pharmacy Services, La Fe Hospital, Valencia, Spain
- Jose-Esteban Peris
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Valencia, Spain
- Irja Lutsar
- Institute of Medical Microbiology, University of Tartu, Tartu, Estonia
- Jumpei Saito
- Department of Pharmacy, National Children's Hospital National Center for Child Health and Development, Tokyo, Japan
- Jacobus Burggraaf
- Centre for Human Drug Research, Leiden, The Netherlands; Leiden University Medical Center, Leiden, The Netherlands
- Evelyne Jacqz-Aigrain
- Department of Pediatric Pharmacology and Pharmacogenetics, Hospital Robert Debre, APHP, Paris, France; Clinical Investigation Center CIC1426, Hôpital Robert Debre, Paris, France; University Paris Diderot, Sorbonne Paris Cite, Paris, France
- John van den Anker
- Division of Clinical Pharmacology, Children's National Hospital, Washington, DC, USA; Departments of Pediatrics, Pharmacology and Physiology, Genomics and Precision Medicine, George Washington University School of Medicine and Health Sciences, Washington, DC, USA; Department of Paediatric Pharmacology and Pharmacometrics, University of Basel Children's Hospital, Basel, Switzerland
- Wei Zhao
- Department of Clinical Pharmacy, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Sciences, Cheeloo College of Medicine, Shandong University, Jinan, 250012, People's Republic of China; Modelling and Simulation Working Party, European Medicines Agency, Amsterdam, The Netherlands; Department of Pharmacy, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China; Clinical Research Center, Shandong Provincial Qianfoshan Hospital, The First Affiliated Hospital of Shandong First Medical University, Jinan, People's Republic of China
|
23
|
Bugeac CA, Ancuceanu R, Dinu M. QSAR Models for Active Substances against Pseudomonas aeruginosa Using Disk-Diffusion Test Data. Molecules 2021; 26:1734. [PMID: 33808845 PMCID: PMC8003670 DOI: 10.3390/molecules26061734] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/04/2021] [Revised: 03/14/2021] [Accepted: 03/15/2021] [Indexed: 12/02/2022] Open
Abstract
Pseudomonas aeruginosa is a Gram-negative bacillus included among the six “ESKAPE” microbial species with an outstanding ability to “escape” currently used antibiotics, and developing new antibiotics against it is of the highest priority. Whereas minimum inhibitory concentration (MIC) values against Pseudomonas aeruginosa have been used previously for QSAR model development, disk diffusion results (inhibition zones) have apparently not been used for this purpose in the literature, and we decided to explore their use. We developed multiple QSAR models using several machine learning algorithms (support vector classifier, K nearest neighbors, random forest classifier, decision tree classifier, AdaBoost classifier, logistic regression and naïve Bayes classifier). We used four sets of molecular descriptors and fingerprints and three different methods of data balancing, together with the “native” data set. In total, 32 models were built for each set of descriptors or fingerprint and balancing method, of which 28 were selected and stacked to create meta-models. In terms of balanced accuracy, the best performance was provided by KNN, logistic regression and decision tree classifier, but the ensemble method had slightly superior results in nested cross-validation.
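Balanced accuracy, the metric used above to rank the classifiers, is commonly defined for a binary problem as the mean of sensitivity and specificity. The sketch below assumes that definition and uses made-up labels; it is an illustration, not the authors' evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the active class
    specificity = tn / (tn + fp)  # recall on the inactive class
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced set: 2 actives, 8 inactives; one active is missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

On imbalanced data such as this, plain accuracy looks high while balanced accuracy exposes the missed active, which is one reason the metric suits data sets that also need balancing methods.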
Affiliation(s)
- Cosmin Alexandru Bugeac
- Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 6 Traian Vuia Street, Sector 2, 020956 Bucharest, Romania
- Robert Ancuceanu
- Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 6 Traian Vuia Street, Sector 2, 020956 Bucharest, Romania
- Correspondence:
- Mihaela Dinu
- Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 6 Traian Vuia Street, Sector 2, 020956 Bucharest, Romania
|
24
|
Liu Y, Zhang D, Tang Y, Zhang Y, Chang Y, Zheng J. Machine Learning-Enabled Design and Prediction of Protein Resistance on Self-Assembled Monolayers and Beyond. ACS Appl Mater Interfaces 2021; 13:11306-11319. [PMID: 33635641 DOI: 10.1021/acsami.1c00642] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 06/12/2023]
Abstract
The rational design of highly antifouling materials is crucial for a wide range of fundamental research and practical applications. The immense variety and complexity of the intrinsic physicochemical properties of materials (i.e., chemical structure, hydrophobicity, charge distribution, and molecular weight) and their surface coating properties (i.e., packing density, film thickness and roughness, and chain conformation) make it challenging to rationally design antifouling materials and reveal their fundamental structure-property relationships. In this work, we developed a data-driven machine learning model, a combination of factor analysis of functional group (FAFG), Pearson analysis, random forest (RF) and artificial neural network (ANN) algorithms, and Bayesian statistics, to computationally extract structure/chemical/surface features in correlation with the antifouling activity of self-assembled monolayers (SAMs) from a self-constructed data set. The resultant model demonstrates the robustness of QCV2 = 0.90 and RMSECV = 0.21 and the predictive ability of Qext2 = 0.84 and RMSEext = 0.28, determines key descriptors and functional groups important for the antifouling activity, and enables the design of original antifouling SAMs using the predicted antifouling functional groups. Three computationally designed molecules were further coated onto the surfaces in different forms of SAMs and polymer brushes. The resultant coatings with negative fouling indexes exhibited strong surface resistance to protein adsorption from undiluted blood serum and plasma, validating the model predictions. The data-driven machine learning model demonstrates its design and predictive capacity for next-generation antifouling materials and surfaces, which will hopefully help to accelerate the discovery and understanding of functional materials.
Affiliation(s)
- Yonglan Liu
- Department of Chemical, Biomolecular, and Corrosion Engineering, The University of Akron, Ohio 44325, United States
- Dong Zhang
- Department of Chemical, Biomolecular, and Corrosion Engineering, The University of Akron, Ohio 44325, United States
- Yijing Tang
- Department of Chemical, Biomolecular, and Corrosion Engineering, The University of Akron, Ohio 44325, United States
- Yanxian Zhang
- Department of Chemical, Biomolecular, and Corrosion Engineering, The University of Akron, Ohio 44325, United States
- Yung Chang
- Department of Chemical Engineering, R&D Center for Membrane Technology, Chung Yuan Christian University, Taoyuan 32023, Taiwan
- Jie Zheng
- Department of Chemical, Biomolecular, and Corrosion Engineering, The University of Akron, Ohio 44325, United States
|
25
|
Hao Y, Sun G, Fan T, Tang X, Zhang J, Liu Y, Zhang N, Zhao L, Zhong R, Peng Y. In vivo toxicity of nitroaromatic compounds to rats: QSTR modelling and interspecies toxicity relationship with mouse. J Hazard Mater 2020; 399:122981. [PMID: 32534390 DOI: 10.1016/j.jhazmat.2020.122981] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/27/2020] [Revised: 05/14/2020] [Accepted: 05/16/2020] [Indexed: 06/11/2023]
Abstract
Nitroaromatic compounds (NACs) in the environment can cause serious public health and environmental problems due to their potential toxicity. This study established quantitative structure-toxicity relationship (QSTR) models for the acute oral toxicity of NACs towards rats following the stringent OECD principles for QSTR modelling. All models were assessed by various internationally accepted validation metrics and the OECD criteria. The best QSTR model contains seven simple and interpretable 2D descriptors with defined physicochemical meaning. Mechanistic interpretation indicated that van der Waals surface area, presence of C-F at topological distance 6, heteroatom content and frequency of C-N at topological distance 9 are main factors responsible for the toxicity of NACs. This proposed model was successfully applied to a true external set (295 compounds), and prediction reliability was analysed and discussed. Moreover, the rat-mouse and mouse-rat interspecies quantitative toxicity-toxicity relationship (iQTTR) models were also constructed, validated and employed in toxicity prediction for true external sets consisting of 67 and 265 compounds, respectively. These models showed good external predictivity that can be used to rapidly predict the rat oral acute toxicity of new or untested NACs falling within the applicability domain of the models, thus being beneficial in environmental risk assessment and regulatory purposes.
Affiliation(s)
- Yuxing Hao
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Guohui Sun
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Tengjiao Fan
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Xiaoyu Tang
- College of Environmental and Energy Engineering, Beijing University of Technology, Beijing 100124, PR China
- Jing Zhang
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Yongdong Liu
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Na Zhang
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Lijiao Zhao
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Rugang Zhong
- Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, PR China
- Yongzhen Peng
- National Engineering Laboratory for Advanced Municipal Wastewater Treatment and Reuse Technology, Engineering Research Center of Beijing, Beijing University of Technology, Beijing 100124, PR China
|
26
|
Cappelli CI, Manganelli S, Toma C, Benfenati E, Mombelli E. Prediction of the Partition Coefficient between Adipose Tissue and Blood for Environmental Chemicals: From Single QSAR Models to an Integrated Approach. Mol Inform 2020; 40:e2000072. [PMID: 33135856 DOI: 10.1002/minf.202000072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/06/2020] [Accepted: 09/07/2020] [Indexed: 12/15/2022]
Abstract
The adipose tissue:blood partition coefficient is a key endpoint to predict the pharmacokinetics of chemicals in humans and animals, since other organ:blood affinities can be estimated as a function of this parameter. We performed a search in the literature to select all the available rat in vivo data. This approach resulted in two improvements to existing models: a homogeneous definition of the endpoint and an expanded data collection. The resulting dataset was used to develop QSAR models with linear and non-linear algorithms. Several applicability domain definitions were assessed and the definition corresponding to a good balance between performance and coverage was retained. We assessed the pertinence of combining single models into integrated approaches to increase the accuracy of predictions. The best integrated model outperformed the single models and was characterized by an external mean absolute error (MAE) equal to 0.26, while preserving an adequate coverage (84%). This performance is comparable to experimental variability and highlights the pertinence of the integrated model.
Affiliation(s)
- Claudia Ileana Cappelli
- Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), Verneuil en Halatte, France; currently at S-IN Soluzioni Informatiche S.r.l., Vicenza, Italy
- Serena Manganelli
- Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), Verneuil en Halatte, France; currently at Chemical Food Safety Group, Nestlé Research, Lausanne, Switzerland
- Cosimo Toma
- Laboratory of Environmental Chemistry and Toxicology, Department Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
- Emilio Benfenati
- Laboratory of Environmental Chemistry and Toxicology, Department Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milan, Italy
- Enrico Mombelli
- Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), Verneuil en Halatte, France
|
27
|
Fuzzy Divisive Hierarchical Clustering of Solvents According to Their Experimentally and Theoretically Predicted Descriptors. Symmetry (Basel) 2020. [DOI: 10.3390/sym12111763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022] Open
Abstract
The present study describes a simple procedure to separate into patterns of similarity a large group of solvents, 259 in total, presented by 15 specific descriptors (experimentally found and theoretically predicted physicochemical parameters). Solvent data is usually characterized by its high variability, different molecular symmetry, and spatial orientation. Methods of chemometrics can usefully be used to extract and explore accurately the information contained in such data. In this order, advanced fuzzy divisive hierarchical-clustering methods were efficiently applied in the present study of a large group of solvents using specific descriptors. The fuzzy divisive hierarchical associative-clustering algorithm provides not only a fuzzy partition of the solvents investigated, but also a fuzzy partition of descriptors considered. In this way, it is possible to identify the most specific descriptors (in terms of higher, smallest, or intermediate values) to each fuzzy partition (group) of solvents. Additionally, the partitioning performed could be interpreted with respect to the molecular symmetry. The chemometric approach used for this goal is fuzzy c-means method being a semi-supervised clustering procedure. The advantage of such a clustering process is the opportunity to achieve separation of the solvents into similarity patterns with a certain degree of membership of each solvent to a certain pattern, as well as to consider possible membership of the same object (solvent) in another cluster. Partitioning based on a hybrid approach of the theoretical molecular descriptors and experimentally obtained ones permits a more straightforward separation into groups of similarity and acceptable interpretation. It was shown that an important link between objects’ groups of similarity and similarity groups of variables is achieved. 
Ten classes of solvents are interpreted depending on their specific descriptors; one of the classes includes a single object and could be interpreted as an outlier. Placing the results of this research in a broader perspective, it has been shown that the fuzzy clustering approach provides a useful tool for partitioning by the variables related to the main physicochemical properties of the solvents. It becomes possible to offer a simple guide for solvent recognition based on theoretically calculated or experimentally found descriptors related to the physicochemical properties of the solvents.
|
28
|
Montanari F, Knasmüller B, Kohlbacher S, Hillisch C, Baierová C, Grandits M, Ecker GF. Vienna LiverTox Workspace-A Set of Machine Learning Models for Prediction of Interactions Profiles of Small Molecules With Transporters Relevant for Regulatory Agencies. Front Chem 2020; 7:899. [PMID: 31998690 PMCID: PMC6966498 DOI: 10.3389/fchem.2019.00899] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/09/2019] [Accepted: 12/13/2019] [Indexed: 12/15/2022] Open
Abstract
Transporters expressed in the liver play a major role in drug pharmacokinetics and are a key component of the physiological bile flow. Inhibition of these transporters may lead to drug-drug interactions or even drug-induced liver injury. Therefore, predicting the interaction profile of small molecules with transporters expressed in the liver may help medicinal chemists and toxicologists to prioritize compounds in an early phase of the drug development process. Based on a comprehensive analysis of the data available in the public domain, we developed a set of classification models that predict, for a small molecule, the inhibition of and transport by a set of liver transporters considered to be relevant by FDA, EMA, and the Japanese regulatory agency. The models were validated by cross-validation and external test sets and achieve cross-validated balanced accuracies in the range of 0.64–0.88. Finally, the models were implemented as an easy-to-use web service which is freely available at https://livertox.univie.ac.at.
Affiliation(s)
- Floriane Montanari
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Bernhard Knasmüller
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Stefan Kohlbacher
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Christoph Hillisch
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Christine Baierová
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Melanie Grandits
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Gerhard F Ecker
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
|
29
|
Ancuceanu R, Tamba B, Stoicescu CS, Dinu M. Use of QSAR Global Models and Molecular Docking for Developing New Inhibitors of c-src Tyrosine Kinase. Int J Mol Sci 2019; 21:19. [PMID: 31861445 PMCID: PMC6981969 DOI: 10.3390/ijms21010019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/22/2019] [Revised: 12/15/2019] [Accepted: 12/16/2019] [Indexed: 12/11/2022] Open
Abstract
A prototype of a family of at least nine members, cellular Src tyrosine kinase is a therapeutically interesting target because its inhibition might be of interest not only in a number of malignancies, but also in a diverse array of conditions, from neurodegenerative pathologies to certain viral infections. Computational methods in drug discovery are considerably cheaper than conventional methods and offer opportunities of screening very large numbers of compounds in conditions that would be simply impossible within the wet lab experimental settings. We explored the use of global quantitative structure-activity relationship (QSAR) models and molecular ligand docking in the discovery of new c-src tyrosine kinase inhibitors. Using a dataset of 1038 compounds from ChEMBL database, we developed over 350 QSAR classification models. A total of 49 models with reasonably good performance were selected and the models were assembled by stacking with a simple majority vote and used for the virtual screening of over 100,000 compounds. A total of 744 compounds were predicted by at least 50% of the QSAR models as active, 147 compounds were within the applicability domain and predicted by at least 75% of the models to be active. The latter 147 compounds were submitted to molecular ligand docking using AutoDock Vina and LeDock, and 89 were predicted to be active based on the energy of binding.
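The consensus step described above, in which compounds are retained when a minimum share of the QSAR models predicts them as active, can be sketched as follows. This is a minimal illustration with hypothetical compound names and votes; it leaves out the applicability-domain filter and the docking stage mentioned in the abstract.

```python
def active_fraction(votes):
    """Share of models voting 'active' (1) for one compound."""
    return sum(votes) / len(votes)


def select_by_consensus(predictions, min_fraction):
    """Names of compounds predicted active by at least min_fraction of the models."""
    return [
        name
        for name, votes in predictions.items()
        if active_fraction(votes) >= min_fraction
    ]


# Hypothetical votes from four models (1 = predicted active, 0 = predicted inactive).
predictions = {
    "cmpd_A": [1, 1, 1, 1],
    "cmpd_B": [1, 1, 0, 0],
    "cmpd_C": [1, 0, 0, 0],
}

majority_hits = select_by_consensus(predictions, 0.50)  # at least 50% of models
strict_hits = select_by_consensus(predictions, 0.75)    # at least 75% of models
```

Raising the threshold from 50% to 75% shrinks the hit list in the same way the abstract reports, trading recall for more confident candidates before the more expensive docking step.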
Affiliation(s)
- Robert Ancuceanu
- Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
- Bogdan Tamba
- Advanced Research and Development Center for Experimental Medicine (CEMEX), Grigore T. Popa University of Medicine and Pharmacy of Iasi, 700115 Iasi, Romania
- Correspondence:
- Cristina Silvia Stoicescu
- Department of Chemical Thermodynamics, Institute of Physical Chemistry “Ilie Murgulescu”, 060021 Bucharest, Romania
- Mihaela Dinu
- Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020956 Bucharest, Romania
|
30
|
Miyao T, Funatsu K. Iterative Screening Methods for Identification of Chemical Compounds with Specific Values of Various Properties. J Chem Inf Model 2019; 59:2626-2641. [PMID: 31058504 DOI: 10.1021/acs.jcim.9b00093] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Identification of chemical compounds having desirable properties is a central goal of screening campaigns. Iterative screening is a means of surveying a set of compounds, during which their property values are determined and used as feedback for regression models. Quantitative models that assess the relationships between chemical structures and property/activity are repeatedly updated through this type of cycle, and the efficient sampling of compounds for the subsequent test is a key factor in the early identification of target compounds. Nevertheless, methodological approaches to comparisons and to establishing the degree of extrapolation of sampled compounds, including the effects of applicability domains, are still required. In the present study, we conducted a series of virtual experiments to assess the characteristics of different iterative screening methods. Genetic algorithm-based partial least-squares regression, support vector regression, Bayesian optimization with Gaussian Process (GP), and batch-based Bayesian optimization with GP (GP_batch) were all compared, based on the analysis of one million compounds extracted from the ZINC database. Our results show that, irrespective of the diversity of the initial set of compounds, it was possible to identify a compound having the desired property value using the appropriate screening method. However, overall, the GP_batch method was found to be preferable when evaluating properties either which are difficult to predict or for which a key factor is present in the set of molecular descriptors.
Affiliation(s)
- Tomoyuki Miyao
- Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Kimito Funatsu
- Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan; Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
|
31
|
Zhang Y, Wang Y, Zhou W, Fan Y, Zhao J, Zhu L, Lu S, Lu T, Chen Y, Liu H. A combined drug discovery strategy based on machine learning and molecular docking. Chem Biol Drug Des 2019; 93:685-699. [PMID: 30688405 DOI: 10.1111/cbdd.13494] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/01/2018] [Revised: 01/04/2019] [Accepted: 01/19/2019] [Indexed: 12/14/2022]
Abstract
Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In the current work, eight machine learning methods including decision trees, k-Nearest neighbor, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were employed for cross-validation of the eight machine learning methods. Results showed that the extremely randomized trees model performed best and was adopted as the first step of virtual screening. Together with structure-based virtual screening in the second step, this combined strategy obtained desirable results. This work indicates that the combination of machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability in finding potential hits from large compound database for a given target.
Affiliation(s)
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuchen Wang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Weineng Zhou
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuanrong Fan
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Junnan Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Lu Zhu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Shuai Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
| |
Collapse
|
32
|
Ancuceanu R, Dinu M, Neaga I, Laszlo FG, Boda D. Development of QSAR machine learning-based models to forecast the effect of substances on malignant melanoma cells. Oncol Lett 2019; 17:4188-4196. [PMID: 31007759 PMCID: PMC6466999 DOI: 10.3892/ol.2019.10068] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 09/21/2018] [Accepted: 11/15/2018] [Indexed: 11/20/2022] Open
Abstract
SK-MEL-5 is a human melanoma cell line that has been used in various in vitro studies to explore new therapies against melanoma. In this study, we report on the development of quantitative structure-activity relationship (QSAR) models able to predict the cytotoxic effect of diverse chemical compounds on this cancer cell line. The dataset of cytotoxic and inactive compounds was downloaded from the PubChem database. It contains the data for all chemical compounds for which cytotoxicity results expressed as GI50 were recorded. In total, 13 blocks of molecular descriptors were computed and used, after appropriate pre-processing, in building QSAR models with four machine learning classifiers: random forest (RF), gradient boosting, support vector machine and random k-nearest neighbors. Among the 186 models reported, none had a positive predictive value (PPV) higher than 0.90 in both nested cross-validation and external dataset testing, but 7 models had a PPV higher than 0.85 in both evaluations. All seven used the RF algorithm as a classifier, with topological descriptors, information indices, 2D-autocorrelation descriptors, P-VSA-like descriptors, and edge-adjacency descriptors as the feature sets used for classification. The y-scrambling test was associated with considerably worse performance (confirming the non-random character of the models), and the applicability domain was assessed through three different methods.
Affiliation(s)
- Robert Ancuceanu
- Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, 'Carol Davila' University of Medicine and Pharmacy, 020956 Bucharest, Romania
- Mihaela Dinu
- Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, 'Carol Davila' University of Medicine and Pharmacy, 020956 Bucharest, Romania
- Iana Neaga
- Department of Public Health and Management, Faculty of Medicine, 'Carol Davila' University of Medicine and Pharmacy, 050463 Bucharest, Romania
- Fekete Gyula Laszlo
- Department of Dermatology, University of Medicine and Pharmacy of Târgu Mureş, 540142 Târgu Mureş, Romania
- Daniel Boda
- Dermatology Research Laboratory, 'Carol Davila' University of Medicine and Pharmacy, 050474 Bucharest, Romania
33
Hanser T, Barber C, Guesné S, Marchaland JF, Werner S. Applicability Domain: Towards a More Formal Framework to Express the Applicability of a Model and the Confidence in Individual Predictions. Challenges and Advances in Computational Chemistry and Physics 2019. [DOI: 10.1007/978-3-030-16443-0_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
34
Neighborhood Attribute Reduction: A Multicriterion Strategy Based on Sample Selection. Information 2018. [DOI: 10.3390/info9110282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022] Open
Abstract
In the rough-set field, the objective of attribute reduction is to regulate the variations of measures by reducing redundant data attributes. However, most previous definitions of attribute reduction were designed around a single measure, so the obtained reduct may fail to meet the constraints imposed by other measures. In addition, the widely used heuristic algorithm for computing a reduct must scan all samples in the data, so its time consumption may be unacceptably high when the data set is large. To alleviate these problems, a framework of attribute reduction based on multiple criteria with sample selection is proposed in this paper. Firstly, cluster centroids are derived from the data, and samples that are far away from the cluster centroids are selected. This step completes the process of sample selection for reducing data size. Secondly, a multiple criteria-based attribute reduction is designed, and the heuristic algorithm is applied to the selected samples to compute a reduct in terms of multiple criteria. Finally, the experimental results over 12 UCI datasets show that the reducts obtained by our framework not only satisfy the constraints given by multiple criteria, but also provide better classification performance and lower time consumption.
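The sample-selection step described in this abstract can be illustrated with a minimal sketch. It assumes a plain k-means (Lloyd's algorithm) to obtain the centroids and keeps a fixed fraction of the farthest samples; the function names, the deterministic seeding, and the `keep_fraction` parameter are illustrative assumptions and are not taken from the paper.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def select_far_samples(points, k, keep_fraction=0.5):
    """Return the indices of the samples farthest from their nearest centroid."""
    centroids = kmeans(points, k)
    dist = [min(math.dist(p, c) for c in centroids) for p in points]
    n_keep = max(1, round(keep_fraction * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: dist[i], reverse=True)
    return sorted(ranked[:n_keep])


# Two well-separated groups; indices 6 and 7 sit close to the group centres.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0.3, 0.3), (10.3, 10.3)]
selected = select_far_samples(points, k=2, keep_fraction=0.5)
```

In this toy data the two points nearest the group centres are dropped, which mirrors the idea that samples near a centroid carry less boundary information than samples far from it.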
35
Villaverde JJ, Sevilla-Morán B, López-Goti C, Alonso-Prados JL, Sandín-España P. Considerations of nano-QSAR/QSPR models for nanopesticide risk assessment within the European legislative framework. The Science of the Total Environment 2018; 634:1530-1539. [PMID: 29710651 DOI: 10.1016/j.scitotenv.2018.04.033] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 10/23/2017] [Revised: 04/02/2018] [Accepted: 04/03/2018] [Indexed: 06/08/2023]
Abstract
The European market for pesticides is currently legislated through the well-developed Regulation (EC) No. 1107/2009. This regulation promotes the competitiveness of European agriculture, recognizing the necessity of pesticides that are safe for human and animal health and the environment to protect crops against pests, diseases and weeds. In this sense, nanotechnology can provide a tremendous opportunity to achieve a more rational use of pesticides. However, the lack of information regarding nanopesticides, their fate and behavior in the environment, and their effects on human and animal health is inhibiting rapid nanopesticide incorporation into European Union agriculture. This review analyzes the recent state of knowledge on nanopesticide risk assessment, highlighting the challenges that need to be overcome to accelerate the arrival of these new tools for plant protection to European agricultural professionals. Novel nano-Quantitative Structure-Activity/Structure-Property Relationship (nano-QSAR/QSPR) tools for risk assessment are analyzed, including modeling methods and validation procedures, with regard to the potential of these computational instruments to meet the current requirements for authorization of nanoformulations. Future trends on these issues, of pressing importance within the context of the current European pesticide legislative framework, are also discussed. Standard protocols to make high-quality and well-described datasets for the series of related but differently sized nanoparticles/nanopesticides are required.
Affiliation(s)
- Juan José Villaverde
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain.
- Beatriz Sevilla-Morán
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
- Carmen López-Goti
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
- Pilar Sandín-España
- Plant Protection Products Unit, DTEVPF, INIA, Crta, La Coruña, Km. 7.5, 28040 Madrid, Spain
36
Lo YC, Rensi SE, Torng W, Altman RB. Machine learning in chemoinformatics and drug discovery. Drug Discov Today 2018; 23:1538-1546. [PMID: 29750902 DOI: 10.1016/j.drudis.2018.05.010] [Citation(s) in RCA: 483] [Impact Index Per Article: 69.0] [Received: 12/15/2017] [Revised: 03/29/2018] [Accepted: 05/02/2018] [Indexed: 01/03/2023]
Abstract
Chemoinformatics is an established discipline focusing on extracting, processing and extrapolating meaningful data from chemical structures. With the rapid explosion of chemical 'big' data from HTS and combinatorial synthesis, machine learning has become an indispensable tool for drug designers to mine chemical information from large compound databases to design drugs with important biological properties. To process the chemical data, we first reviewed multiple processing layers in the chemoinformatics pipeline followed by the introduction of commonly used machine learning models in drug discovery and QSAR analysis. Here, we present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses; and we discuss limitations and future directions to guide further development in this evolving field.
Affiliation(s)
- Yu-Chen Lo
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Stefano E Rensi
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Wen Torng
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Russ B Altman
- Department of Bioengineering, Stanford University, Stanford, CA, USA.
37
Bitam S, Hamadache M, Hanini S. Prediction of therapeutic potency of tacrine derivatives as BuChE inhibitors from quantitative structure-activity relationship modelling. SAR and QSAR in Environmental Research 2018; 29:213-230. [PMID: 29390887 DOI: 10.1080/1062936x.2018.1423640] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/24/2017] [Accepted: 01/01/2018] [Indexed: 06/07/2023]
Abstract
Numerous studies show that tacrine derivatives exhibit increased inhibitory activity against butyrylcholinesterase (BuChE) and acetylcholinesterase (AChE). However, the screening assays for currently available BuChE inhibitors are expensive, time consuming and dependent on the inhibitory compound. It is therefore desirable to develop alternative methods to facilitate the screening of these derivatives in the early phase of drug discovery. In order to develop robust predictive models, three regression methods were chosen in this study: multiple linear regression (MLR), support vector regression (SVR) and multilayer perceptron network (MLP). Eight relevant descriptors were selected on a dataset of 151 molecules using a method based on genetic algorithms. Internal and external validation strategies play an important role. Also, to check the robustness of the selected models, all available validation strategies were used, and all criteria used to validate these models revealed the superiority of the SVR model. The statistical parameters obtained with the SVR model were RMSE = 0.197, r2 = 0.969 and Q2 = 0.964 for the training set, and r2 = 0.906 and Q2 = 0.891 for the test set. Therefore, the model developed in this study provides an excellent prediction of the inhibitory concentration of tacrine derivatives.
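The statistics quoted in this abstract (RMSE, r2, Q2) can be computed as in the sketch below. The abstract does not say which definitions were used, so this is only one common convention and an assumption here: r2 as the squared Pearson correlation between observed and predicted values, and Q2 as one minus the ratio of the prediction error sum of squares to the total sum of squares around a reference mean. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_pearson(y_true, y_pred):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def q2(y_true, y_pred, reference_mean=None):
    """1 - PRESS / SS (assumed definition of Q2).

    reference_mean defaults to the mean of y_true; some conventions use the
    training-set mean instead, which can be passed in explicitly.
    """
    ref = sum(y_true) / len(y_true) if reference_mean is None else reference_mean
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - ref) ** 2 for t in y_true)
    return 1.0 - press / ss


# Predictions with a constant offset of 0.5: perfectly correlated, but biased.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
stats = (rmse(y_true, y_pred), r2_pearson(y_true, y_pred), q2(y_true, y_pred))
```

For this toy input the RMSE is 0.5, the squared correlation is 1.0, and Q2 is 0.8, which shows why a correlation-based r2 alone can hide a systematic bias that Q2 and RMSE expose.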
Affiliation(s)
- S Bitam
- Département du Génie des Procédés et Environnement, Université de Médéa, Quartier Ain D'heb, Médéa, Algeria
- M Hamadache
- Département du Génie des Procédés et Environnement, Université de Médéa, Quartier Ain D'heb, Médéa, Algeria
- S Hanini
- Département du Génie des Procédés et Environnement, Université de Médéa, Quartier Ain D'heb, Médéa, Algeria
38
Guan D, Fan K, Spence I, Matthews S. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity. Data Brief 2018. [PMID: 29516034 PMCID: PMC5835004 DOI: 10.1016/j.dib.2018.01.077] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022] Open
Abstract
Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE) assay, and 2 year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.
Affiliation(s)
- Davy Guan
- Pharmacoinformatics Laboratory, Sydney Medical School, The University of Sydney, Australia
- Kevin Fan
- Pharmacoinformatics Laboratory, Sydney Medical School, The University of Sydney, Australia
- Ian Spence
- Pharmacoinformatics Laboratory, Sydney Medical School, The University of Sydney, Australia
- Slade Matthews
- Pharmacoinformatics Laboratory, Sydney Medical School, The University of Sydney, Australia
39
Grisoni F, Ballabio D, Todeschini R, Consonni V. Molecular Descriptors for Structure-Activity Applications: A Hands-On Approach. Methods Mol Biol 2018; 1800:3-53. [PMID: 29934886 DOI: 10.1007/978-1-4939-7899-1_1] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
Molecular descriptors capture diverse parts of the structural information of molecules and they are the support of many contemporary computer-assisted toxicological and chemical applications. After briefly introducing some fundamental concepts of structure-activity applications (e.g., molecular descriptor dimensionality, classical vs. fingerprint description, and activity landscapes), this chapter guides the readers through a step-by-step explanation of molecular descriptors rationale and application. To this end, the chapter illustrates a case study of a recently published application of molecular descriptors for modeling the activity on cytochrome P450.
Affiliation(s)
- Francesca Grisoni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy.
- Davide Ballabio
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
- Roberto Todeschini
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
- Viviana Consonni
- Department of Earth and Environmental Sciences, Milano Chemometrics and QSAR Research Group, University of Milano-Bicocca, Milan, Italy
40
Algamal ZY, Qasim MK, Ali HTM. A QSAR classification model for neuraminidase inhibitors of influenza A viruses (H1N1) based on weighted penalized support vector machine. SAR and QSAR in Environmental Research 2017; 28:415-426. [PMID: 28539063 DOI: 10.1080/1062936x.2017.1326402] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/09/2017] [Accepted: 05/01/2017] [Indexed: 06/07/2023]
Abstract
Descriptor selection is a procedure widely used in chemometrics. The aim is to select the best subset of descriptors relevant to the quantitative structure-activity relationship (QSAR) study being considered. In this paper, a new descriptor selection method for the QSAR classification model is proposed by adding a new weight inside L1-norm. The experimental results from classifying the neuraminidase inhibitors of influenza A viruses (H1N1) demonstrate that the proposed method in the QSAR classification model performs effectively and competitively compared with other existing penalized methods in terms of classification performance and the number of selected descriptors.
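For orientation, a weighted L1-penalized support vector machine is often written in the generic form below. The hinge loss and all symbols are assumptions made here for illustration (labels y_i in {-1, +1}, descriptor vectors x_i, coefficients beta_j, tuning parameter lambda, non-negative weights w_j); the specific weight proposed in the paper is not given in the abstract.

```latex
\hat{\boldsymbol{\beta}}
  = \arg\min_{\beta_0,\,\boldsymbol{\beta}}
    \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})\bigr)
    \;+\; \lambda \sum_{j=1}^{p} w_j\,\lvert \beta_j \rvert
```

Setting every w_j = 1 recovers the ordinary L1-norm penalty; descriptors whose coefficients are shrunk exactly to zero are the ones dropped from the model.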
Affiliation(s)
- Z Y Algamal
- Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
- M K Qasim
- Department of General Science, University of Mosul, Mosul, Iraq
- H T M Ali
- College of Computers and Information Technology, Nawroz University, Duhok, Iraq
41
Zang Q, Mansouri K, Williams AJ, Judson RS, Allen DG, Casey WM, Kleinstreuer NC. In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning. J Chem Inf Model 2017; 57:36-49. [PMID: 28006899 PMCID: PMC6131700 DOI: 10.1021/acs.jcim.6b00625] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Indexed: 12/20/2022]
Abstract
Few toxicity data are available for the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure-property relationship (QSPR) workflow to predict a variety of physicochemical properties that would have cross-platform compatibility to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol-water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R2) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding that of the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
Affiliation(s)
- Qingda Zang
- Integrated Laboratory Systems, Inc., Research Triangle Park, NC 27709, USA
- Kamel Mansouri
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Antony J. Williams
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Richard S. Judson
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- David G. Allen
- Integrated Laboratory Systems, Inc., Research Triangle Park, NC 27709, USA
- Warren M. Casey
- National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
- Nicole C. Kleinstreuer
- National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA
42
Shoombuatong W, Prathipati P, Owasirikul W, Worachartcheewan A, Simeon S, Anuwongcharoen N, Wikberg JES, Nantasenamat C. Towards the Revival of Interpretable QSAR Models. Challenges and Advances in Computational Chemistry and Physics 2017. [DOI: 10.1007/978-3-319-56850-8_1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
43
Aniceto N, Freitas AA, Bender A, Ghafourian T. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. J Cheminform 2016. [PMCID: PMC5395519 DOI: 10.1186/s13321-016-0182-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 01/20/2023] Open
Abstract
The ability to define the regions of chemical space where a predictive model can be safely used is a necessary condition to assure the reliability of new predictions. This implies that reliability must be determined across chemical space in the attempt to localize “safe” and “unsafe” regions for prediction. As a result we devised an applicability domain technique that addresses the data locally instead of handling it as a whole—the reliability-density neighbourhood (RDN). The main novelty aspect of this method is that it characterizes each single training instance according to the density of its neighbourhood in the training set, as well as its individual bias and precision. By scanning through the chemical space (by iteratively increasing the applicability domain area), it was observed that new test compounds are successively included into the applicability domain region in such a manner that strongly correlates to their predictive performance. This allows the mapping of local reliability across different locations in the training set space, and thus allows identifying regions where the model has low reliability. This method also showed matching profiles between two external sets, which is an indication that it performs robustly with new data. Another novel aspect in this technique is that it is paired with a specific feature selection algorithm. As a result, the impact of the feature set used was studied from which the top 20 features selected by ReliefF yielded the best results, as opposed to using the model’s features or the entire feature set as commonly done. As the third novel aspect, in this work we propose a new scoring function to help evaluate the quality of an applicability domain profile (i.e., the curve of accuracy vs the applicability domain measure in question). Overall, the RDN showed to be a promising method that can correctly sort new instances according to predictive performance. 
As a result, this technique can be received by an end-user as proof of concept for the performance of a QSAR model on new data, thus promoting the user's trust in the QSAR output.
44
Nembri S, Grisoni F, Consonni V, Todeschini R. In Silico Prediction of Cytochrome P450-Drug Interaction: QSARs for CYP3A4 and CYP2C9. Int J Mol Sci 2016; 17:914. [PMID: 27294921 PMCID: PMC4926447 DOI: 10.3390/ijms17060914] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 05/16/2016] [Revised: 06/01/2016] [Accepted: 06/06/2016] [Indexed: 11/16/2022] Open
Abstract
Cytochromes P450 (CYP) are the main actors in the oxidation of xenobiotics and play a crucial role in drug safety, persistence, bioactivation, and drug-drug/food-drug interaction. This work aims to develop Quantitative Structure-Activity Relationship (QSAR) models to predict the drug interaction with two of the most important CYP isoforms, namely 2C9 and 3A4. The presented models are calibrated on 9122 drug-like compounds, using three different modelling approaches and two types of molecular description (classical molecular descriptors and binary fingerprints). For each isoform, three classification models are presented, each based on a different approach and with different advantages: (1) a very simple and interpretable classification tree; (2) a local (k-Nearest Neighbor) model based on classical descriptors; and (3) a model based on a recently proposed local classifier (N-Nearest Neighbor) on binary fingerprints. The salient features of the work are (1) the thorough model validation and the applicability domain assessment; (2) the descriptor interpretation, which highlighted the crucial aspects of P450-drug interaction; and (3) the consensus aggregation of models, which largely increased the prediction accuracy.
Affiliation(s)
- Serena Nembri
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy.
- Francesca Grisoni
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy.
- Viviana Consonni
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy.
- Roberto Todeschini
- Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy.
45
Zhang S, Cheng D, Zong M, Gao L. Self-representation nearest neighbor search for classification. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.08.115] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
46
47
Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inform 2016; 35:160-80. [PMID: 27492083 DOI: 10.1002/minf.201501019] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 11/12/2015] [Accepted: 01/20/2016] [Indexed: 11/08/2022]
Abstract
Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables, which encode the respective molecular structures (i.e. molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects or as detecting outliers. Currently two different types of measures are in use. The first one defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second type defines the applicability domain in terms of the expected reliability of the predictions which is referred to as confidence estimation. Both types are systematically differentiated here and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses information of the class labels for computing the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which solely uses the information of the explanatory variables.
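The first type of measure described in this abstract, novelty detection in descriptor space, can be sketched as follows. This is a generic k-nearest-neighbour distance check under stated assumptions, not a method taken from the review: the Euclidean metric, the leave-one-out quantile threshold, and all function names are illustrative choices.

```python
import math


def knn_mean_distance(x, reference, k):
    """Mean Euclidean distance from x to its k nearest reference points."""
    distances = sorted(math.dist(x, r) for r in reference)
    return sum(distances[:k]) / k


def fit_threshold(train, k=3, quantile=0.95):
    """Threshold = chosen quantile of the leave-one-out kNN distances of the training set."""
    scores = sorted(
        knn_mean_distance(x, train[:i] + train[i + 1:], k)
        for i, x in enumerate(train)
    )
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def in_domain(x, train, threshold, k=3):
    """A query is inside the domain if it is no more isolated than the training data."""
    return knn_mean_distance(x, train, k) <= threshold


# Toy descriptor space: a 4 x 4 grid of training compounds.
train = [(float(i), float(j)) for i in range(4) for j in range(4)]
threshold = fit_threshold(train, k=3)
inside = in_domain((1.5, 1.5), train, threshold)
outside = in_domain((10.0, 10.0), train, threshold)
```

Note that only the descriptor values enter this check. Confidence estimation, the second type discussed in the abstract, would additionally use the class labels, for example through a classifier's own score.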
Affiliation(s)
- Miriam Mathea
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Waldemar Klingspohn
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, 38106 Braunschweig, Germany.
48
Ain QU, Méndez-Lucio O, Ciriano IC, Malliavin T, van Westen GJP, Bender A. Modelling ligand selectivity of serine proteases using integrative proteochemometric approaches improves model performance and allows the multi-target dependent interpretation of features. Integr Biol (Camb) 2015; 6:1023-33. [PMID: 25255469 DOI: 10.1039/c4ib00175c] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
Serine proteases, implicated in important physiological functions, have a high intra-family similarity, which leads to unwanted off-target effects of inhibitors with insufficient selectivity. However, the availability of sequence and structure data has now made it possible to develop approaches to design pharmacological agents that can discriminate successfully between their related binding sites. In this study, we have quantified the relationship between 12,625 distinct protease inhibitors and their bioactivity against 67 targets of the serine protease family (20,213 data points) in an integrative manner, using proteochemometric modelling (PCM). The benchmarking of 21 different target descriptors motivated the usage of specific binding pocket amino acid descriptors, which helped in the identification of active site residues and selective compound chemotypes affecting compound affinity and selectivity. PCM models performed better than alternative approaches (models trained using exclusively compound descriptors on all available data, QSAR) employed for comparison with R(2)/RMSE values of 0.64 ± 0.23/0.66 ± 0.20 vs. 0.35 ± 0.27/1.05 ± 0.27 log units, respectively. Moreover, the interpretation of the PCM model singled out various chemical substructures responsible for bioactivity and selectivity towards particular proteases (thrombin, trypsin and coagulation factor 10) in agreement with the literature. For instance, absence of a tertiary sulphonamide was identified to be responsible for decreased selective activity (by on average 0.27 ± 0.65 pChEMBL units) on FA10. Among the binding pocket residues, the amino acids (arginine, leucine and tyrosine) at positions 35, 39, 60, 93, 140 and 207 were observed as key contributing residues for selective affinity on these three targets.
Affiliation(s)
- Qurrat U Ain
- Centre for Molecular Informatics, Department of Chemistry, Lensfield Road, CB2 1EW, University of Cambridge, UK.
49
Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model 2015; 55:1098-107. [DOI: 10.1021/acs.jcim.5b00110] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 11/29/2022]
Affiliation(s)
- Robert P. Sheridan
- Cheminformatics Department, RY800B-305, Merck Research Laboratories, Rahway, New Jersey 07065, United States
50
Li H, Zhong Z, Li L, Gao R, Cui J, Gao T, Hu LH, Lu Y, Su ZM, Li H. A cascaded QSAR model for efficient prediction of overall power conversion efficiency of all-organic dye-sensitized solar cells. J Comput Chem 2015; 36:1036-46. [DOI: 10.1002/jcc.23886] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/08/2014] [Revised: 11/25/2014] [Accepted: 02/08/2015] [Indexed: 01/19/2023]
Affiliation(s)
- Hongzhi Li
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Ziyan Zhong
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Lin Li
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Rui Gao
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University; Changchun 130024 China
- Jingxia Cui
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University; Changchun 130024 China
- Ting Gao
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Li Hong Hu
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Yinghua Lu
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University; Changchun 130024 China
- Zhong-Min Su
- Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University; Changchun 130024 China
- Hui Li
- School of Computer Science and Information Technology; Northeast Normal University; Changchun 130117 China