1
Piir G, Sild S, Maran U. Interpretable machine learning for the identification of estrogen receptor agonists, antagonists, and binders. CHEMOSPHERE 2024; 347:140671. [PMID: 37951393 DOI: 10.1016/j.chemosphere.2023.140671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/14/2023]
Abstract
An abnormal hormonal activity or exposure to endocrine-disrupting chemicals (EDCs) can cause endocrine system malfunction. Among the many interactions EDCs can affect is the disruption of estrogen signalling, which can lead to adverse health effects such as cancer, osteoporosis, neurodegenerative diseases, cardiovascular disease, insulin resistance, and obesity. Knowing which chemical can act as an EDC is a significant advantage and a practical necessity. New Approach Methodologies (NAM) computational models offer a quick and cost-effective solution for preliminary hazard assessment of chemicals without animal testing. Therefore, a machine learning approach was used to investigate the relationships between estrogen receptor (ER) activity and chemical structure to identify chemicals that can interact with ER. For this purpose, the consolidated in vitro assay data from ToxCast/Tox21 projects was used for developing Random Forest classification models for ER binding, agonists, and antagonists. The overall classification prediction accuracy reaches up to 82%, depending on whether the model predicted agonists, antagonists, or compounds that bind to the active site. Given the imbalance in endocrine disruption data, the derived models are good candidates for deprioritising chemicals and reducing animal testing. The interpretation of theoretical molecular descriptors of the models was consistent with the molecular interactions known in the ligand binding pocket. The estimated class probabilities enabled the analysis of the applicability domain of the developed models and the assessment of the predictions' reliability, followed by the guidelines for interpreting prediction results. The models are openly accessible and useable at QsarDB.org (http://dx.doi.org/10.15152/QDB.259) according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
Affiliation(s)
- Geven Piir
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
- Sulev Sild
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
- Uko Maran
  - Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, 50411, Estonia
2
Piir G, Sild S, Maran U. Binary and multi-class classification for androgen receptor agonists, antagonists and binders. CHEMOSPHERE 2021; 262:128313. [PMID: 33182081 DOI: 10.1016/j.chemosphere.2020.128313] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/12/2020] [Revised: 08/24/2020] [Accepted: 09/10/2020] [Indexed: 06/11/2023]
Abstract
Androgens and the androgen receptor regulate a variety of biological effects in the human body. Impaired functioning of the androgen receptor may have adverse health effects ranging from cancer to infertility. Therefore, it is important to determine whether new chemicals have any binding activity and act as androgen agonists or antagonists before commercial use. Because a large number of chemicals require experimental testing, computational methods are a viable alternative. The aim of the present study was therefore to develop predictive QSAR models for classifying compounds according to their activity at the androgen receptor. A large data set of chemicals from the CoMPARA project was used for this purpose, and random forest classification models were developed for androgen binding, agonistic, and antagonistic activity. In addition, a unique effort has been made to develop a multi-class approach that discriminates between inactive compounds, agonists, and antagonists simultaneously. For the evaluation set, the classification models predicted agonists with 80% accuracy; for antagonists and binders, the respective accuracies were 72% and 78%. Combining agonists, antagonists, and inactive compounds into a multi-class approach added complexity to the modelling task and resulted in 64% prediction accuracy for the evaluation set. Considering the size of the training data sets and their imbalance, the achieved evaluation accuracy is very good. The final classification models are available for exploring and predicting at the QsarDB repository (https://doi.org/10.15152/QDB.236).
Affiliation(s)
- Geven Piir
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Sulev Sild
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
- Uko Maran
  - University of Tartu, Institute of Chemistry, Ravila 14A, Tartu, 50411, Estonia
3
Mukherjee R, Beykal B, Szafran AT, Onel M, Stossi F, Mancini MG, Lloyd D, Wright FA, Zhou L, Mancini MA, Pistikopoulos EN. Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms. PLoS Comput Biol 2020; 16:e1008191. [PMID: 32970665 PMCID: PMC7538107 DOI: 10.1371/journal.pcbi.1008191] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/15/2020] [Revised: 10/06/2020] [Accepted: 07/25/2020] [Indexed: 12/28/2022]
Abstract
Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprised of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. 
The results of this data-driven study showed that highly accurate and generalized classification models with a minimum number of features can be constructed without loss of generality, where these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.
Chemical contaminants or toxicants pose environmental and health-related risks for exposure. The ability to rapidly understand their biological impact, specifically on a key modulator of important physiological and pathological states in the human body, is essential for diagnosing and avoiding undesirable health outcomes during environmental emergencies. In this study, we use advanced data analytics for creating statistical models that can accurately predict the endocrinological activity of toxic chemicals based on high throughput/high content image analysis data. We focus on a subclass of chemicals that affect the estrogen receptor (ER), which is a pivotal transcriptional regulator in health and disease. The multidimensional imaging data of these benchmark chemicals are used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, we evaluate linear and nonlinear classifiers for predicting the estrogenic activity of unknown compounds and use feature selection, data visualization, and model discrimination methodologies to identify the most informative features for the classification of ER agonists/antagonists.
Affiliation(s)
- Rajib Mukherjee
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Burcu Beykal
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Adam T. Szafran
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
- Melis Onel
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Fabio Stossi
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
  - GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
- Maureen G. Mancini
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
  - GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
- Dillon Lloyd
  - Bioinformatics Research Center, Center for Human Health and the Environment, Department of Statistics, North Carolina State University, Raleigh, NC, United States of America
- Fred A. Wright
  - Bioinformatics Research Center, Center for Human Health and the Environment, Department of Statistics, North Carolina State University, Raleigh, NC, United States of America
- Lan Zhou
  - Department of Statistics, Texas A&M University, College Station, TX, United States of America
- Michael A. Mancini
  - Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
  - GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
  - Texas A&M University Institute for Bioscience and Technology, Houston, TX, United States of America
  - Pharmacology and Chemical Genomics, Baylor College of Medicine, Houston, TX, United States of America
- Efstratios N. Pistikopoulos
  - Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
  - * E-mail:
4
Gonzalez MA, Takkellapati S, Tadele K, Li T, Varma RS. Framework towards more Sustainable Chemical Synthesis Design - A Case Study of Organophosphates. ACS SUSTAINABLE CHEMISTRY & ENGINEERING 2019; 7:6744-6757. [PMID: 32280570 PMCID: PMC7147815 DOI: 10.1021/acssuschemeng.8b06038] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
In recent years, the advancement of sustainable chemistry concepts and approaches along with their demonstrated application has become a central part of the design, synthesis, and manufacture of a chemical. Sustainable chemistry not only utilizes the principles of green chemistry, but also expands to incorporate economic, societal, and environmental aspects. This is further elucidated by the incorporation of life cycle assessment/thinking to include the raw material production, manufacture, processing, and use and disposal stages, allowing for a comprehensive evaluation of the environmental and human health impacts attributed to a chemical. This contribution outlines an approach for the development of a preliminary framework for the sustainable synthesis of a chemical that is identified as an alternative for an existing chemical of concern. The framework is introduced concurrently with a case study for organophosphates that are selected as potential replacements for brominated flame retardants (BFRs). This framework is designed to apply existing knowledge of green chemistry to the synthesis of alternatives, along with its integration into Life Cycle Assessment culminating in the development of a more overall sustainable chemical entity when compared to its predecessor.
Affiliation(s)
- Michael A. Gonzalez
  - United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, Land and Materials Management Division, 26 W. Martin Luther King Dr., Cincinnati, Ohio 45268
- Sudhakar Takkellapati
  - United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, Land and Materials Management Division, 26 W. Martin Luther King Dr., Cincinnati, Ohio 45268
- Kidus Tadele
  - Oak Ridge Institute for Science and Education, Oak Ridge TN, 37831
- Tao Li
  - United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, Land and Materials Management Division, 26 W. Martin Luther King Dr., Cincinnati, Ohio 45268
- Rajender S. Varma
  - United States Environmental Protection Agency, Office of Research and Development, National Risk Management Research Laboratory, Land and Materials Management Division, 26 W. Martin Luther King Dr., Cincinnati, Ohio 45268
5
Russo DP, Zorn KM, Clark AM, Zhu H, Ekins S. Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol Pharm 2018; 15:4361-4370. [PMID: 30114914 PMCID: PMC6181119 DOI: 10.1021/acs.molpharmaceut.8b00546] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Indexed: 11/29/2022]
Abstract
Many chemicals that disrupt endocrine function have been linked to a variety of adverse biological outcomes. However, screening for endocrine disruption using in vitro or in vivo approaches is costly and time-consuming. Computational methods, e.g., quantitative structure-activity relationship models, have become more reliable due to bigger training sets, increased computing power, and advanced machine learning algorithms, such as multilayered artificial neural networks. Machine learning models can be used to predict compounds for endocrine disrupting capabilities, such as binding to the estrogen receptor (ER), and allow for prioritization and further testing. In this work, an exhaustive comparison of multiple machine learning algorithms, chemical spaces, and evaluation metrics for ER binding was performed on public data sets curated using in-house cheminformatics software (Assay Central). Chemical features utilized in modeling consisted of binary fingerprints (ECFP6, FCFP6, ToxPrint, or MACCS keys) and continuous molecular descriptors from RDKit. Each feature set was subjected to classic machine learning algorithms (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, Support Vector Machine) and Deep Neural Networks (DNN). Models were evaluated using a variety of metrics: recall, precision, F1-score, accuracy, area under the receiver operating characteristic curve, Cohen's Kappa, and Matthews correlation coefficient. For predicting compounds within the training set, DNN has an accuracy higher than that of other methods; however, in 5-fold cross validation and external test set predictions, DNN and most classic machine learning models perform similarly regardless of the data set or molecular descriptors used. We have also used the rank normalized scores as a performance-criteria for each machine learning method, and Random Forest performed best on the validation set when ranked by metric or by data sets. 
These results suggest classic machine learning algorithms may be sufficient to develop high quality predictive models of ER activity.
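Most of the evaluation metrics listed in this abstract can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, every denominator is assumed to be nonzero, and the area under the ROC curve is left out because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes no denominator is zero.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # fraction of true actives recovered
    precision = tp / (tp + fp)     # fraction of predicted actives that are active
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)

    # Matthews correlation coefficient.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator

    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
        "mcc": mcc,
    }


# Made-up counts, purely illustrative (not data from the paper).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting several of these together is useful on imbalanced data sets, where accuracy alone can look high for a model that rarely predicts the minority class.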
Affiliation(s)
- Daniel P. Russo
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
  - first author
- Kimberley M. Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
  - first author
- Alex M. Clark
  - Molecular Materials Informatics, Inc., Montreal, Quebec, Canada
- Hao Zhu
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
6
Gadaleta D, Manganelli S, Roncaglioni A, Toma C, Benfenati E, Mombelli E. QSAR Modeling of ToxCast Assays Relevant to the Molecular Initiating Events of AOPs Leading to Hepatic Steatosis. J Chem Inf Model 2018; 58:1501-1517. [PMID: 29949360 DOI: 10.1021/acs.jcim.8b00297] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 12/16/2022]
Abstract
Nonalcoholic hepatic steatosis is a worldwide epidemiological concern since it is among the most prominent hepatic diseases. Indeed, research in toxicology and epidemiology has gathered evidence that exposure to endocrine disruptors can perturb cellular homeostasis and cause this disease. Therefore, assessing the likelihood of a chemical to trigger hepatic steatosis is a matter of the utmost importance. However, systematic in vivo testing of all the chemicals humans are exposed to is not feasible for ethical and economical reasons. In this context, predicting the molecular initiating events (MIE) leading to hepatic steatosis by QSAR modeling is an issue of practical relevance in modern toxicology. In this article, we present QSAR models based on random forest classifiers and DRAGON molecular descriptors for the prediction of in vitro assays that are relevant to MIEs leading to hepatic steatosis. These assays were provided by the ToxCast program and proved to be predictive for the detection of chemical-induced steatosis. During the modeling process, special attention was paid to chemical and toxicological data curation. We adopted two modeling strategies (undersampling and balanced random forests) to develop robust QSAR models from unbalanced data sets. The two modeling approaches gave similar results in terms of predictivity, and most of the models satisfy a minimum percentage of correctly predicted chemicals equal to 75%. Finally, and most importantly, the developed models proved to be useful as an effective in silico screening test for hepatic steatosis.
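Of the two strategies for unbalanced data named in this abstract, random undersampling is the simpler to illustrate. The sketch below is an assumption-laden illustration, not the authors' procedure: the helper `undersample`, its `seed` parameter, and the toy data are hypothetical. It keeps every minority-class sample and draws an equally sized random subset from each larger class; balanced random forests are not shown.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly undersample every class down to the minority-class size.

    Hypothetical helper: returns a shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The smallest class sets the per-class sample count.
    n_min = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # rng.sample draws without replacement.
        for sample in rng.sample(items, n_min):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return balanced


# Toy data: 2 active (1) and 8 inactive (0) compounds, identified by index.
compounds = list(range(10))
activity = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
balanced_set = undersample(compounds, activity)
print(len(balanced_set))  # 4 pairs: 2 active and 2 inactive
```

Undersampling discards majority-class data, so in practice it is often repeated with different seeds and the resulting models combined.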
Affiliation(s)
- Domenico Gadaleta
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Serena Manganelli
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Alessandra Roncaglioni
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Cosimo Toma
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Emilio Benfenati
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Via la Masa 19, 20156 Milano, Italy
- Enrico Mombelli
  - Unité Modèles pour l'Ecotoxicologie et la Toxicologie (METO), Institut National de l'Environnement Industriel et des Risques (INERIS), 60550 Verneuil en Halatte, France
7
Martin TM. A framework for an alternatives assessment dashboard for evaluating chemical alternatives applied to flame retardants for electronic applications. CLEAN TECHNOLOGIES AND ENVIRONMENTAL POLICY 2017; 19:1067-1086. [PMID: 29333139 PMCID: PMC5759784 DOI: 10.1007/s10098-016-1300-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2023]
Abstract
The goal of alternatives assessment (AA) is to facilitate a comparison of alternatives to a chemical of concern, resulting in the identification of safer alternatives. A two stage methodology for comparing chemical alternatives was developed. In the first stage, alternatives are compared using a variety of human health effects, ecotoxicity, and physicochemical properties. Hazard profiles are completed using a variety of online sources and quantitative structure activity relationship models. In the second stage, alternatives are evaluated utilizing an exposure/risk assessment over the entire life cycle. Exposure values are calculated using screening-level near-field and far-field exposure models. The second stage allows one to more accurately compare potential exposure to each alternative and consider additional factors that may not be obvious from separate binned persistence, bioaccumulation, and toxicity scores. The methodology was utilized to compare phosphate-based alternatives for decabromodiphenyl ether (decaBDE) in electronics applications.
Affiliation(s)
- Todd M. Martin
  - National Risk Management Research Laboratory, U.S. Environmental Protection Agency, 26 W. Martin Luther King Dr., Cincinnati, OH, 45268, USA
8
Du H, Cai Y, Yang H, Zhang H, Xue Y, Liu G, Tang Y, Li W. In Silico Prediction of Chemicals Binding to Aromatase with Machine Learning Methods. Chem Res Toxicol 2017; 30:1209-1218. [PMID: 28414904 DOI: 10.1021/acs.chemrestox.7b00037] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
Abstract
Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine disrupting chemicals (EDCs) is important for toxicology and environment risk assessment. In this study, on the basis of the Tox21 10K compound library, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods. To improve the prediction ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Performances of the models were measured by test and external validation sets containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and CC method, which exhibited an accuracy of 0.84 for the test set and 0.91 for the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provided a systematic assessment of chemicals binding to aromatase. The built models can be helpful to rapidly identify potential EDCs targeting aromatase.
Affiliation(s)
- Hanwen Du
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yingchun Cai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongxiao Zhang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yuhan Xue
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
9
Zhu XW, Xin YJ, Chen QH. Chemical and in vitro biological information to predict mouse liver toxicity using recursive random forests. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2016; 27:559-572. [PMID: 27353437 DOI: 10.1080/1062936x.2016.1201142] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/17/2016] [Accepted: 06/09/2016] [Indexed: 06/06/2023]
Abstract
In this study, recursive random forests were used to build classification models for mouse liver toxicity. The mouse liver toxicity endpoint (67 toxic and 166 non-toxic) was a composition of four in vivo chronic systemic and carcinogenic toxicity endpoints (non-proliferative, neoplastic, proliferative and gross pathology). A multiple under-sampling approach and a shifted classification threshold of 0.288 (non-toxic < 0.288 and toxic ≥ 0.288) were used to cope with the unbalanced data. Our study showed that recursive random forests are very efficient in variable selection and for the development of predictive in silico models. Generally, over 95% redundant descriptors could be reduced from modelling for all the chemical, biological and hybrid models in this study. The predictive performance of chemical models (CCR of 0.73) is comparable with hybrid model performance (CCR of 0.74). Descriptors related to the octanol-water partition coefficient are vital for model performance. The in vitro endpoint of CYP2A2 played a key role in the development and interpretation of hybrid models. Identifying high-throughput screening assays relevant to liver toxicity would be key for improving in silico models of liver toxicity.
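The shifted classification threshold in this abstract can be illustrated with a short sketch. The threshold of 0.288 comes from the abstract and is close to the toxic fraction of the data set, 67 / (67 + 166) ≈ 0.288. The abstract does not say how the value was chosen. Everything else here is an assumption: the helper names `classify` and `ccr` are hypothetical, the probabilities are made up, and CCR is taken to be the mean of sensitivity and specificity, a common definition that the abstract does not itself spell out.

```python
def classify(probabilities, threshold=0.288):
    """Label a compound toxic (1) when its predicted probability >= threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def ccr(y_true, y_pred):
    """Correct classification rate, assumed here to be the mean of
    sensitivity and specificity (also called balanced accuracy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Made-up predicted probabilities of toxicity and true labels.
predicted_probabilities = [0.10, 0.288, 0.30, 0.50, 0.20]
true_labels = [0, 1, 0, 1, 0]

predicted_labels = classify(predicted_probabilities)
print(predicted_labels)
print(round(ccr(true_labels, predicted_labels), 3))
```

Lowering the threshold below 0.5 makes the model more willing to predict the minority (toxic) class, trading some specificity for sensitivity.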
Affiliation(s)
- X-W Zhu
  - College of Resource and Environment, Qingdao Agricultural University, Qingdao, China
  - Qingdao Engineering Research Center for Rural Environment, Qingdao Agricultural University, Qingdao, China
- Y-J Xin
  - College of Resource and Environment, Qingdao Agricultural University, Qingdao, China
- Q-H Chen
  - College of Resource and Environment, Qingdao Agricultural University, Qingdao, China