1
Zhang YH, Li ZD, Zeng T, Chen L, Huang T, Cai YD. Screening gene signatures for clinical response subtypes of lung transplantation. Mol Genet Genomics 2022; 297:1301-1313. [PMID: 35780439 DOI: 10.1007/s00438-022-01918-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2021] [Accepted: 06/12/2022] [Indexed: 11/30/2022]
Abstract
The lung is the most important organ in the human respiratory system, and its normal function is essential for human beings. Under certain pathological conditions, normal lung function can no longer be maintained, and lung transplantation is generally applied to ease patients' breathing and prolong their lives. However, several risk factors exist during and after lung transplantation, including bleeding, infection, and transplant rejection. In particular, transplant rejection is difficult to predict or prevent and leads to the most dangerous complications in patients undergoing lung transplantation. Given that the most common monitoring and validation methods for lung transplant rejection are time-consuming and poorly reproducible, new technologies and methods are required to improve the efficacy and accuracy of rejection monitoring after lung transplantation. A previous study established the gene expression profiles of patients who underwent lung transplantation but did not provide a tool to predict lung transplantation responses. Here, these profiling data were investigated in greater depth. A computational framework incorporating several machine learning algorithms, including feature selection methods and classification algorithms, was built to establish an effective prediction model that assigns patients to clinical subgroups corresponding to different rejection responses after lung transplantation. The framework also screened essential genes with functional enrichments and created quantitative rules for distinguishing patients with different rejection responses to lung transplantation. The outcome of this contribution could provide guidelines for the clinical treatment of each rejection subtype and help reveal the complicated rejection mechanisms of lung transplantation.
Affiliation(s)
- Yu-Hang Zhang
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Zhan Dong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, 130052, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, China
2
Sedykh A. CurveP Method for Rendering High-Throughput Screening Dose-Response Data into Digital Fingerprints. Methods Mol Biol 2022; 2474:147-154. [PMID: 35294763 DOI: 10.1007/978-1-0716-2213-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The nature of high-throughput screening (HTS) puts certain limits on optimal test conditions for each particular sample; therefore, on top of the usual data normalization, additional parsing is often needed to account for incomplete readouts or various artifacts that arise from signal interference. CurveP is a heuristic, user-tunable curve-cleaning algorithm that attempts to find a minimum set of corrections that would give a monotonic dose-response curve. After applying the corrections, the algorithm proceeds to calculate a set of numeric features, which can be used as a fingerprint characterizing the sample, or as a vector of independent variables (e.g., molecular descriptors in the case of chemical substance testing). The resulting output can be a part of HTS data analysis or can be used as input for a broad spectrum of computational applications, such as quantitative structure-activity relationship (QSAR) modeling, computational toxicology, bioinformatics, and cheminformatics.
3
Mamada H, Nomura Y, Uesawa Y. Prediction Model of Clearance by a Novel Quantitative Structure-Activity Relationship Approach, Combination DeepSnap-Deep Learning and Conventional Machine Learning. ACS Omega 2021; 6:23570-23577. [PMID: 34549154 PMCID: PMC8444299 DOI: 10.1021/acsomega.1c03689] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/13/2021] [Accepted: 08/23/2021] [Indexed: 05/19/2023]
Abstract
Some targets predicted by machine learning (ML) in drug discovery remain a challenge because of poor prediction. In this study, a new prediction model was developed and rat clearance (CL) was selected as a target because it is difficult to predict. A classification model was constructed using 1545 in-house compounds with rat CL data. The molecular descriptors calculated by Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software were used to construct the prediction model. In conventional ML using 100 descriptors and random forest selected by DataRobot, the area under the curve (AUC) and accuracy (ACC) were 0.883 and 0.825, respectively. Conversely, the prediction model using DeepSnap and Deep Learning (DeepSnap-DL) with compound features as images had AUC and ACC of 0.905 and 0.832, respectively. We combined the two models (conventional ML and DeepSnap-DL) to develop a novel prediction model. Using the ensemble model with the mean of the predicted probabilities from each model improved the evaluation metrics (AUC = 0.943 and ACC = 0.874). In addition, a consensus model using the results of the agreement between classifications had an increased ACC (0.959). These combination models with a high level of predictive performance can be applied to rat CL as well as other pharmacokinetic parameters, pharmacological activity, and toxicity prediction. Therefore, these models will aid in the design of more rational compounds for the development of drugs.
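The two combination schemes named in this abstract, averaging the predicted probabilities of two models and keeping only the samples on which both models agree, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the probability values, the 0.5 threshold, and all function names are assumptions made for the example.

```python
def ensemble_mean(p_a, p_b):
    """Average two models' predicted probabilities, sample by sample."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def consensus(p_a, p_b, threshold=0.5):
    """Return (index, label) pairs for samples where both models agree.

    Samples on which the two models disagree are dropped, so the
    consensus model makes no prediction for them.
    """
    kept = []
    for i, (a, b) in enumerate(zip(p_a, p_b)):
        label_a, label_b = a >= threshold, b >= threshold
        if label_a == label_b:
            kept.append((i, int(label_a)))
    return kept


def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(int(p == t) for p, t in zip(pred, truth)) / len(truth)


# Hypothetical probabilities from a conventional ML model and an image-based DL model.
p_ml = [0.9, 0.4, 0.7, 0.2]
p_dl = [0.8, 0.7, 0.4, 0.1]
y = [1, 1, 0, 0]  # hypothetical true classes

avg = ensemble_mean(p_ml, p_dl)
ens_labels = [int(p >= 0.5) for p in avg]
agreed = consensus(p_ml, p_dl)

ens_acc = accuracy(ens_labels, y)
cons_acc = accuracy([lab for _, lab in agreed], [y[i] for i, _ in agreed])
```

Note the trade-off the sketch makes visible: the consensus accuracy is computed only over the samples both models agree on, so a higher accuracy comes at the cost of leaving the disputed samples unclassified.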
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-858, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-858, Japan
  - Tel.: +81-42-495-8983. Fax: +81-42-495-8983
4
Wang YT, Russo DP, Liu C, Zhou Q, Zhu H, Zhang YH. Predictive Modeling of Angiotensin I-Converting Enzyme Inhibitory Peptides Using Various Machine Learning Approaches. J Agric Food Chem 2020; 68:12132-12140. [PMID: 32915574 DOI: 10.1021/acs.jafc.0c04624] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/11/2023]
Abstract
Food-derived angiotensin I-converting enzyme (ACE) inhibitory peptides could potentially be used as safe supportive therapeutic products for high blood pressure. Theoretical approaches are promising because they explore the relationships between peptide structures and their bioactivities. In this study, peptides with ACE inhibitory activity were collected and curated. Quantitative structure-activity relationship (QSAR) models were developed by combining various machine learning approaches and chemical descriptors. The resultant models revealed several structural features accounting for ACE inhibition. Fourteen new dipeptides predicted to lower blood pressure by inhibiting ACE were selected. Molecular docking indicated that these dipeptides formed hydrogen bonds with ACE. Five of these dipeptides were synthesized for experimental testing. The QSAR models developed were shown to be able to design and propose novel ACE inhibitory peptides. Machine learning algorithms and properly selected chemical descriptors can be promising modeling approaches for the rational design of natural functional food components.
Affiliation(s)
- Yu-Tang Wang
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Daniel P Russo
  - The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Chang Liu
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Qian Zhou
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
- Hao Zhu
  - The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
  - Department of Chemistry, Rutgers University, Camden, New Jersey 08102, United States
- Ying-Hua Zhang
  - Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China
  - Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
5
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large amount of biological data points for millions of small molecules have been generated and are stored in various databases. These accumulated data, combined with new machine learning (ML) approaches, such as deep learning, have shown great potential to provide insights into relevant chemical structures to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Heather L Ciallella
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Lauren M Aleksunes
  - Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
- Hao Zhu
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
  - Department of Chemistry, Rutgers University, Camden, NJ 08102, USA
6
Chen L, Pan X, Guo W, Gan Z, Zhang YH, Niu Z, Huang T, Cai YD. Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms. Genomics 2020; 112:2524-2534. [PMID: 32045671 DOI: 10.1016/j.ygeno.2020.02.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/18/2019] [Revised: 12/26/2019] [Accepted: 02/07/2020] [Indexed: 12/15/2022]
Abstract
The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed with a Monte Carlo feature selection method to generate a feature list. Second, incremental feature selection was applied to the list by incorporating two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers indicating the stages of embryonic cells and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, some interesting rules were obtained with the RIPPER algorithm, suggesting different expression patterns for different stages.
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  - Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China
- XiaoYong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Wei Guo
  - Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zijun Gan
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Hang Zhang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Zhibin Niu
  - College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
7
Zhao X, Chen L, Guo ZH, Liu T. Predicting Drug Side Effects with Compact Integration of Heterogeneous Networks. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190220114644] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Indexed: 12/11/2022]
Abstract
Background: The side effects of drugs are not only harmful to humans but also the major reasons for withdrawing approved drugs, bringing greater risks for pharmaceutical companies. However, detecting the side effects for a given drug via traditional experiments is time-consuming and expensive. In recent years, several computational methods have been proposed to predict the side effects of drugs. However, most of the methods cannot effectively integrate the heterogeneous properties of drugs.
Methods: In this study, we adopted a network embedding method, Mashup, to extract essential and informative drug features from several drug heterogeneous networks, representing different properties of drugs. For side effects, a network was also built, from which side effect features were extracted. These features can capture essential information about drugs and side effects at the network level. Drug and side effect features were combined to represent each pair of drug and side effect, which was deemed as a sample in this study. Furthermore, they were fed into a random forest (RF) algorithm to construct the prediction model, called the RF network model.
Results: The RF network model was evaluated by several tests. The average of Matthews correlation coefficients on the balanced and unbalanced datasets was 0.640 and 0.641, respectively.
Conclusion: The RF network model was superior to the models incorporating other machine learning algorithms and one previous model. Finally, we also investigated the influence of two feature dimension parameters on the RF network model and found that our model was not very sensitive to these parameters.
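For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the Results, the sketch below computes it for binary labels from the confusion-matrix counts, using the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the example labels are assumptions made for illustration; they are not data from the study.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels for six drug/side-effect pairs.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
```

Because MCC uses all four confusion-matrix cells, it stays informative when positive and negative samples are imbalanced, which is presumably why the abstract reports it on both the balanced and unbalanced datasets.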
Affiliation(s)
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Zi-Han Guo
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Tao Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
8
Balram D, Lian KY, Sebastian N. Air quality warning system based on a localized PM2.5 soft sensor using a novel approach of Bayesian regularized neural network via forward feature selection. Ecotoxicol Environ Saf 2019; 182:109386. [PMID: 31255868 DOI: 10.1016/j.ecoenv.2019.109386] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/05/2019] [Revised: 06/22/2019] [Accepted: 06/24/2019] [Indexed: 06/09/2023]
Abstract
It is highly significant to develop efficient soft sensors to estimate the concentration of hazardous pollutants in a region to maintain environmental safety. In this paper, an air quality warning system based on a robust PM2.5 soft sensor and support vector machine (SVM) classifier is reported. The soft sensor for the estimation of PM2.5 concentration is proposed using a novel approach of Bayesian regularized neural network (BRNN) via forward feature selection (FFS). Zuoying district of Taiwan is selected as the region of study for implementation of the estimation system because of the high pollution in the region. Descriptive statistics of various pollutants in Zuoying district is computed as part of the study. Moreover, seasonal variation of particulate matter (PM) concentration is analyzed to evaluate the impact of various seasons on the increased levels of PM in the region. To investigate the linear dependence of concentration of different pollutants to the concentration of PM2.5, Pearson correlation coefficient, Kendall's tau coefficient, and Spearman coefficient are computed. To achieve high performance for the PM2.5 estimation, selection of appropriate forward features from the input variables is carried out using FFS technique and Bayesian regularization is incorporated to the neural network system to avoid the overfitting problem. The comparative evaluation of performance of BRNN/FFS estimation system with various other methods shows that our proposed estimation system has the lowest mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE). Moreover, the coefficient of determination (R-squared) is around 0.95 for the proposed estimation method, which denotes a good fit. Evaluation of the SVM classifier showed good performance indicating that the proposed air quality warning system is efficient.
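The four error measures named in this abstract (MSE, RMSE, MAE, and the coefficient of determination) follow standard definitions and can be computed as sketched below. The observed and estimated concentrations are invented for illustration, and the function name is an assumption; nothing here reproduces the study's data or model.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical observed vs. estimated PM2.5 concentrations.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, estimated)
```

An R-squared near 1 means the estimates explain most of the variance in the observations, which is the sense in which the abstract calls a value around 0.95 a good fit.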
Affiliation(s)
- Deepak Balram
  - Department of Electrical Engineering, National Taipei University of Technology, No. 1, Section 3, Zhongxiao East Road, Taipei, 106, Taiwan, Republic of China
- Kuang-Yow Lian
  - Department of Electrical Engineering, National Taipei University of Technology, No. 1, Section 3, Zhongxiao East Road, Taipei, 106, Taiwan, Republic of China
- Neethu Sebastian
  - Institute of Organic and Polymeric Materials, National Taipei University of Technology, No. 1, Section 3, Zhongxiao East Road, Taipei, 106, Taiwan, Republic of China
9
Abstract
Due to the massive data sets available for drug candidates, modern drug discovery has advanced to the big data era. Central to this shift is the development of artificial intelligence approaches to implementing innovative modeling based on the dynamic, heterogeneous, and large nature of drug data sets. As a result, recently developed artificial intelligence approaches such as deep learning and relevant modeling studies provide new solutions to efficacy and safety evaluations of drug candidates based on big data modeling and analysis. The resulting models provided deep insights into the continuum from chemical structure to in vitro, in vivo, and clinical outcomes. The relevant novel data mining, curation, and management techniques provided critical support to recent modeling studies. In summary, the new advancement of artificial intelligence in the big data era has paved the road to future rational drug development and optimization, which will have a significant impact on drug discovery procedures and, eventually, public health.
Affiliation(s)
- Hao Zhu
  - Department of Chemistry and Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
10
Yan X, Sedykh A, Wang W, Zhao X, Yan B, Zhu H. In silico profiling nanoparticles: predictive nanomodeling using universal nanodescriptors and various machine learning approaches. Nanoscale 2019; 11:8352-8362. [PMID: 30984943 DOI: 10.1039/c9nr00844f] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 05/05/2023]
Abstract
Rational nanomaterial design is urgently demanded for new nanomaterial development with desired properties. However, computational nanomaterial modeling and virtual nanomaterial screening are not applicable for this purpose due to the complexity of nanomaterial structures. To address this challenge, a new computational workflow is established in this study to virtually profile nanoparticles by (1) constructing a structurally diverse virtual gold nanoparticle (GNP) library and (2) developing novel universal nanodescriptors. The emphasis of this study is the second task by developing geometrical nanodescriptors that are suitable for the quantitative modeling of GNPs and virtual screening purposes. The feasibility, rigor and applicability of this novel computational method are validated by testing seven GNP datasets consisting of 191 unique GNPs of various nano-bioactivities and physicochemical properties. The high predictability of the developed GNP models suggests that this workflow can be used as a universal tool for nanomaterial profiling and rational nanomaterial design.
Affiliation(s)
- Xiliang Yan
  - School of Chemistry and Chemical Engineering, Shandong University, Jinan 250100, China
11
Russo DP, Strickland J, Karmaus AL, Wang W, Shende S, Hartung T, Aleksunes LM, Zhu H. Nonanimal Models for Acute Toxicity Evaluations: Applying Data-Driven Profiling and Read-Across. Environ Health Perspect 2019; 127:47001. [PMID: 30933541 PMCID: PMC6785238 DOI: 10.1289/ehp3614] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Indexed: 05/03/2023]
Abstract
Background: Low-cost, high-throughput in vitro bioassays have potential as alternatives to animal models for toxicity testing. However, incorporating in vitro bioassays into chemical toxicity evaluations such as read-across requires significant data curation and analysis based on knowledge of relevant toxicity mechanisms, lowering the enthusiasm for using the massive amount of unstructured public data.
Objective: We aimed to develop a computational method to automatically extract useful bioassay data from a public repository (i.e., PubChem) and assess its ability to predict animal toxicity using a novel bioprofile-based read-across approach.
Methods: A training database containing 7,385 compounds with diverse rat acute oral toxicity data was searched against PubChem to establish in vitro bioprofiles. Using a novel subspace clustering algorithm, bioassay groups that may inform on relevant toxicity mechanisms underlying acute oral toxicity were identified. These bioassay groups were used to predict animal acute oral toxicity using read-across through a cross-validation process. Finally, an external test set of over 600 new compounds was used to validate the resulting model predictivity.
Results: Several bioassay clusters showed high predictivity for acute oral toxicity (positive prediction rates range from 62-100%) through cross-validation. After incorporating individual clusters into an ensemble model, chemical toxicants in the external test set were evaluated for putative acute toxicity (positive prediction rate equal to 76%). Additionally, chemical fragment-in vitro-in vivo relationships were identified to illustrate new animal toxicity mechanisms.
Conclusions: The in vitro bioassay data-driven profiling strategy developed in this study meets the urgent needs of computational toxicology in the current big data era and can be extended to develop predictive models for other complex toxicity end points. https://doi.org/10.1289/EHP3614.
Affiliation(s)
- Daniel P. Russo
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Judy Strickland
  - Integrated Laboratory Systems (ILS), Research Triangle Park, North Carolina, USA
- Agnes L. Karmaus
  - Integrated Laboratory Systems (ILS), Research Triangle Park, North Carolina, USA
- Wenyi Wang
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
- Sunil Shende
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
  - Department of Computer Science, Rutgers University, Camden, New Jersey, USA
- Thomas Hartung
  - Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, Maryland, USA
  - University of Konstanz, CAAT-Europe, Konstanz, Germany
- Lauren M. Aleksunes
  - Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey, USA
- Hao Zhu
  - Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey, USA
  - Department of Chemistry, Rutgers University, Camden, New Jersey, USA
12
Zhang Y, Wang Y, Zhou W, Fan Y, Zhao J, Zhu L, Lu S, Lu T, Chen Y, Liu H. A combined drug discovery strategy based on machine learning and molecular docking. Chem Biol Drug Des 2019; 93:685-699. [PMID: 30688405 DOI: 10.1111/cbdd.13494] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/01/2018] [Revised: 01/04/2019] [Accepted: 01/19/2019] [Indexed: 12/14/2022]
Abstract
Data mining methods based on machine learning play an increasingly important role in drug design and discovery. In the current work, eight machine learning methods including decision trees, k-Nearest neighbor, support vector machines, random forests, extremely randomized trees, AdaBoost, gradient boosting trees, and XGBoost were evaluated comprehensively through a case study of ACC inhibitor data sets. Internal and external data sets were employed for cross-validation of the eight machine learning methods. Results showed that the extremely randomized trees model performed best and was adopted as the first step of virtual screening. Together with structure-based virtual screening in the second step, this combined strategy obtained desirable results. This work indicates that the combination of machine learning methods with traditional structure-based virtual screening can effectively strengthen the ability in finding potential hits from large compound database for a given target.
Affiliation(s)
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuchen Wang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Weineng Zhou
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Yuanrong Fan
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Junnan Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Lu Zhu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Shuai Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Tao Lu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
  - State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, Nanjing, China
13
Zorn KM, Lane TR, Russo DP, Clark AM, Makarov V, Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019; 16:1620-1632. [PMID: 30779585 DOI: 10.1021/acs.molpharmaceut.8b01297] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 02/06/2023]
Abstract
The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved were nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors have become susceptible to drug resistant strains of HIV, and hence, alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study has used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks as well as consensus approaches) and then their predictive abilities were compared. Our comparison of different machine learning methods demonstrated that support vector classification, deep learning, and a consensus were generally comparable and not significantly different from each other using 5-fold cross validation and using 24 training and test set combinations. This study demonstrates findings in line with our previous studies for various targets that training and testing with multiple data sets does not demonstrate a significant difference between support vector machine and deep neural networks.
Affiliation(s)
- Kimberley M Zorn
  - Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Thomas R Lane
  - Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
- Daniel P Russo
  - Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
  - The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States
- Alex M Clark
  - Molecular Materials Informatics, Inc., 2234 Duvernay Street, Montreal, Quebec H3J2Y3, Canada
- Vadim Makarov
  - Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Leninsky Prospekt 33-2, Moscow 119071, Russia
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States
|
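The 5-fold cross-validation named in the Zorn et al. abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `k_fold_indices` and the sample count are hypothetical, and the folds are contiguous and unshuffled to keep the example short.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical data set of 12 compounds, identified only by index.
folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print(len(train), len(test))
```

Each sample lands in exactly one test fold, so every model is scored on data it was not trained on. In practice the indices would normally be shuffled (and often stratified by class) before splitting.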
14
|
Maltarollo VG, Kronenberger T, Espinoza GZ, Oliveira PR, Honorio KM. Advances with support vector machines for novel drug discovery. Expert Opin Drug Discov 2018; 14:23-33. [PMID: 30488731 DOI: 10.1080/17460441.2019.1549033] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Indexed: 12/31/2022]
Abstract
Introduction: Novel drug discovery remains an enormous challenge, with various computer-aided drug design (CADD) approaches having been widely employed for this purpose. CADD, specifically the commonly used support vector machines (SVMs), can employ machine learning techniques. SVMs and their variations offer numerous drug discovery applications, which range from the classification of substances (as active or inactive) to the construction of regression models and the ranking/virtual screening of databased compounds.
Areas covered: Herein, the authors consider some of the applications of SVMs in medicinal chemistry, illustrating their main advantages and disadvantages, as well as trends in their utilization, via the available published literature. The aim of this review is to provide an up-to-date review of the recent applications of SVMs in drug discovery as described by the literature, thereby highlighting their strengths, weaknesses, and future challenges.
Expert opinion: Techniques based on SVMs are considered as powerful approaches in early drug discovery. The ability of SVMs to classify active or inactive compounds has enabled the prioritization of substances for virtual screening. Indeed, one of the main advantages of SVMs is related to their potential in the analysis of nonlinear problems. However, despite successes in employing SVMs, the challenges of improving accuracy remain.
Affiliation(s)
- Vinicius Gonçalves Maltarollo
  - Departamento de Produtos Farmacêuticos, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Thales Kronenberger
  - Department of Internal Medicine VIII, University Hospital of Tübingen, Tübingen, Germany
- Gabriel Zarzana Espinoza
  - Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Patricia Rufino Oliveira
  - Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
- Kathia Maria Honorio
  - Escola de Artes, Ciências e Humanidades, Universidade de São Paulo (USP), São Paulo, Brazil
  - Centro de Ciências Naturais e Humanas, Universidade Federal do ABC, Santo André, Brazil
|
15
|
Fang J, Liu C, Wang Q, Lin P, Cheng F. In silico polypharmacology of natural products. Brief Bioinform 2018; 19:1153-1171. [PMID: 28460068 DOI: 10.1093/bib/bbx045] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 01/06/2017] [Indexed: 01/03/2025]
Abstract
Natural products with polypharmacological profiles have demonstrated promise as novel therapeutics for various complex diseases, including cancer. Currently, many gaps exist in our knowledge of which compounds interact with which targets, and experimentally testing all possible interactions is infeasible. Recent advances in systems pharmacology and computational (in silico) approaches provide powerful tools for exploring the polypharmacological profiles of natural products. In this review, we introduce recent progress in computational tools and systems pharmacology approaches for identifying drug targets of natural products, focusing on the development of targeted cancer therapy. We survey the polypharmacological and systems immunology profiles of five representative natural products that are being considered as cancer therapies. We summarize various chemoinformatics, bioinformatics and systems biology resources for reconstructing drug-target networks of natural products. We then review currently available computational approaches and tools for prediction of drug-target interactions, focusing on five domains: target-based, ligand-based, chemogenomics-based, network-based and omics-based systems biology approaches. In addition, we describe a practical example of the application of systems pharmacology approaches that integrates the polypharmacology of natural products with large-scale cancer genomics data for the development of precision oncology under the systems biology framework. Finally, we highlight the promise of cancer immunotherapies and combination therapies that target tumor ecosystems (e.g. clones or 'selfish' sub-clones) by exploiting the immunological and inflammatory 'side' effects of natural products in the cancer post-genomics era.
Affiliation(s)
- Jiansong Fang
  - Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, China
- Chuang Liu
  - Alibaba Research Center for Complexity Sciences at the Hangzhou Normal University, Hangzhou, China
- Qi Wang
  - Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, China
- Ping Lin
  - National Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu, Sichuan, China
- Feixiong Cheng
  - Department of Biomedical Informatics, Vanderbilt University Medical Center in Nashville (United States)
|
16
|
Zhao X, Chen L, Lu J. A similarity-based method for prediction of drug side effects with heterogeneous information. Math Biosci 2018; 306:136-144. [PMID: 30296417 DOI: 10.1016/j.mbs.2018.09.010] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 06/14/2018] [Revised: 09/22/2018] [Accepted: 09/25/2018] [Indexed: 12/25/2022]
Abstract
Drugs produce intended therapeutic effects to treat different diseases, but they may also cause side effects. For an approved drug, it is best to detect all the side effects it can produce; otherwise, it may bring great risks for pharmaceutical companies and be harmful to the human body. Quick and reliable methods for identifying the side effects of a given drug are urgently needed. In this study, a binary classification model was proposed to predict drug side effects. Unlike most previous methods, our model treated each pair of a drug and a side effect as a sample, converting the original problem into a binary classification problem. Based on the similarity idea, each pair was represented by five features, each of which was derived from a type of drug property. Random forest, a strong machine learning algorithm, was adopted as the prediction engine. Ten-fold cross-validation on five datasets with different negative samples indicated that the proposed model performed well, with a Matthews correlation coefficient around 0.550 and an AUC around 0.8492. In addition, we analyzed the contribution of each drug property to the construction of the model. The results indicated that drug similarity in fingerprint was most related to the prediction of drug side effects and that all drug properties contributed to a greater or lesser extent.
Affiliation(s)
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
  - Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
- Jing Lu
  - School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, People's Republic of China
|
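The Matthews correlation coefficient reported in the Zhao et al. abstract above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula only; the label vectors are hypothetical and are not data from the study.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical labels: 1 means "this drug-side effect pair is a true association".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 3))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect), and it stays informative when positive and negative samples are imbalanced.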
17
|
Zhai X, Chen M, Lu W. Predicting the toxicities of metal oxide nanoparticles based on support vector regression with a residual bootstrapping method. Toxicol Mech Methods 2018; 28:440-449. [DOI: 10.1080/15376516.2018.1449278] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/09/2023]
Affiliation(s)
- Xiuyun Zhai
  - School of Materials Science and Engineering, Shanghai University, Shanghai, China
  - School of Mechanical Engineering, Panzhihua University, Panzhihua, China
- Mingtong Chen
  - Material Engineering School, Panzhihua University, Panzhihua, China
- Wencong Lu
  - College of Sciences, Shanghai University, Shanghai, China
|
18
|
Zhang L, Tan J, Han D, Zhu H. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today 2017; 22:1680-1685. [PMID: 28881183 DOI: 10.1016/j.drudis.2017.08.010] [Citation(s) in RCA: 311] [Impact Index Per Article: 38.9] [Received: 11/21/2016] [Revised: 07/13/2017] [Accepted: 08/30/2017] [Indexed: 01/29/2023]
Abstract
Machine intelligence, which is normally presented as artificial intelligence, refers to the intelligence exhibited by computers. In the history of rational drug discovery, various machine intelligence approaches have been applied to guide traditional experiments, which are expensive and time-consuming. Over the past several decades, machine-learning tools, such as quantitative structure-activity relationship (QSAR) modeling, were developed that can identify potentially biologically active molecules from millions of candidate compounds quickly and cheaply. However, when drug discovery moved into the era of 'big' data, machine learning approaches evolved into deep learning approaches, which are a more powerful and efficient way to deal with the massive amounts of data generated from modern drug discovery approaches. Here, we summarize the history of machine learning and provide insight into recently developed deep learning approaches and their applications in rational drug discovery. We suggest that this evolution of machine intelligence now provides a guide for early-stage drug design and discovery in the current big data era.
Affiliation(s)
- Lu Zhang
  - College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
- Jianjun Tan
  - College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
- Dan Han
  - College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
- Hao Zhu
  - College of Life Science and Bio-engineering, Beijing University of Technology, Beijing, 100124, China
  - Department of Chemistry, Rutgers University, Camden, NJ 08102, USA
  - The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
|
19
|
Chen L, Zhang YH, Zou Q, Chu C, Ji Z. Analysis of the chemical toxicity effects using the enrichment of Gene Ontology terms and KEGG pathways. Biochim Biophys Acta Gen Subj 2016; 1860:2619-26. [PMID: 27208425 DOI: 10.1016/j.bbagen.2016.05.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/02/2016] [Revised: 04/25/2016] [Accepted: 05/13/2016] [Indexed: 02/06/2023]
Abstract
Background: Chemical toxicity is one of the major barriers for designing and detecting new chemical entities during drug discovery. Unexpected toxicity of an approved drug may lead to withdrawal from the market and significant loss of the associated costs. Better understanding of the mechanisms underlying various toxicity effects can help eliminate unqualified candidate drugs in early stages, allowing researchers to focus their attention on other more viable candidates.
Methods: In this study, we aimed to understand the mechanisms underlying several toxicity effects using Gene Ontology (GO) terms and KEGG pathways. GO term and KEGG pathway enrichment theories were adopted to encode each chemical, and the minimum redundancy maximum relevance (mRMR) was used to analyze the GO terms and the KEGG pathways. Based on the feature list obtained by the mRMR method, the most related GO terms and KEGG pathways were extracted.
Results: Some important GO terms and KEGG pathways were uncovered, which were concluded to be significant for determining chemical toxicity effects.
Conclusions: Several GO terms and KEGG pathways are highly related to all investigated toxicity effects, while some are specific to a certain toxicity effect.
General significance: The findings in this study have the potential to further our understanding of different chemical toxicity mechanisms and to assist scientists in developing new chemical toxicity prediction algorithms. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Yu-Hang Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin 300072, People's Republic of China
- Chen Chu
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Zhiliang Ji
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, People's Republic of China
|
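The minimum redundancy maximum relevance (mRMR) ranking mentioned in the Chen et al. abstract above can be sketched as a greedy loop. This is an illustration of the general idea in its "relevance minus mean redundancy" form, not the authors' implementation: the function name `mrmr_rank`, the feature names, and all scores are hypothetical, and the scores are supplied precomputed rather than estimated from data (in practice they are usually mutual information values).

```python
def mrmr_rank(relevance, redundancy):
    """Greedy mRMR ranking.

    relevance:  {feature: score against the target}
    redundancy: {(feature_a, feature_b): pairwise score}, looked up symmetrically
    """
    def pair_score(f, g):
        return redundancy.get((f, g), redundancy.get((g, f), 0.0))

    remaining = set(relevance)
    ranked = []
    while remaining:
        def score(f):
            if not ranked:
                return relevance[f]  # first pick: relevance only
            mean_redundancy = sum(pair_score(f, g) for g in ranked) / len(ranked)
            return relevance[f] - mean_redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical features and scores, for illustration only.
relevance = {"GO:A": 0.9, "GO:B": 0.85, "KEGG:C": 0.5}
redundancy = {("GO:A", "GO:B"): 0.8, ("GO:A", "KEGG:C"): 0.1, ("GO:B", "KEGG:C"): 0.1}
ranking = mrmr_rank(relevance, redundancy)
print(ranking)
```

In this toy input the second most relevant feature is ranked last because it largely duplicates the first pick, which is the behavior that distinguishes mRMR from sorting by relevance alone.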
20
|
Sedykh A. CurveP Method for Rendering High-Throughput Screening Dose-Response Data into Digital Fingerprints. Methods Mol Biol 2016; 1473:135-41. [PMID: 27518631 DOI: 10.1007/978-1-4939-6346-1_14] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/07/2023]
Abstract
The nature of high-throughput screening (HTS) puts certain limits on optimal test conditions for each particular sample. Therefore, on top of the usual data normalization, additional parsing is often needed to account for incomplete readouts or various artifacts that arise from signal interference. CurveP is a heuristic, user-tunable, curve-cleaning algorithm that attempts to find a minimum set of corrections that would give a monotonic dose-response curve. After applying the corrections, the algorithm proceeds to calculate a set of numeric features, which can be used as a fingerprint characterizing the sample, or as a vector of independent variables (e.g., molecular descriptors in the case of chemical substance testing). The resulting output can be a part of HTS data analysis or can be used as input for a broad spectrum of computational applications, such as Quantitative Structure-Activity Relationship (QSAR) modeling, computational toxicology, bio- and cheminformatics.
Affiliation(s)
- Alexander Sedykh
  - Multicase Inc., 23811 Chagrin Blvd., Ste 305, Beachwood, OH, 44122, USA
|
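The Sedykh abstract above describes finding a minimum set of corrections that makes a dose-response curve monotonic. The sketch below is not CurveP, whose heuristics and tunable parameters are not given here. It swaps in a simpler, well-defined stand-in: flagging the fewest points that would have to change for a response series to become non-decreasing, found with a longest non-decreasing subsequence. The function name and the example readings are hypothetical.

```python
def fewest_corrections(responses):
    """Return indices of the fewest points that must change for a non-decreasing series."""
    n = len(responses)
    if n == 0:
        return []
    best = [1] * n    # best[i]: longest non-decreasing subsequence ending at i
    prev = [-1] * n   # back-pointers used to recover that subsequence
    for i in range(n):
        for j in range(i):
            if responses[j] <= responses[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of the longest subsequence to collect the points to keep.
    end = max(range(n), key=best.__getitem__)
    keep = set()
    while end != -1:
        keep.add(end)
        end = prev[end]
    return [i for i in range(n) if i not in keep]


# Hypothetical percent-response readings at increasing concentrations.
curve = [0.0, 5.0, 40.0, 12.0, 55.0, 80.0]
outliers = fewest_corrections(curve)
print(outliers)
```

The flagged points are the candidates for correction; how each one is actually adjusted (interpolation, clamping, masking) is a separate design choice that this sketch leaves open.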
21
|
Gawehn E, Hiss JA, Schneider G. Deep Learning in Drug Discovery. Mol Inform 2015; 35:3-14. [PMID: 27491648 DOI: 10.1002/minf.201501008] [Citation(s) in RCA: 334] [Impact Index Per Article: 33.4] [Received: 10/25/2015] [Accepted: 12/01/2015] [Indexed: 12/18/2022]
Abstract
Artificial neural networks had their first heyday in molecular informatics and drug discovery approximately two decades ago. Currently, we are witnessing renewed interest in adapting advanced neural network architectures for pharmaceutical research by borrowing from the field of "deep learning". Compared with some of the other life sciences, their application in drug discovery is still limited. Here, we provide an overview of this emerging field of molecular informatics, present the basic concepts of prominent deep learning methods and offer motivation to explore these techniques for their usefulness in computer-assisted drug discovery and design. We specifically emphasize deep neural networks, restricted Boltzmann machine networks and convolutional networks.
Affiliation(s)
- Erik Gawehn
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Jan A Hiss
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
- Gisbert Schneider
  - Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland, Fax: +41 44 633 13 79, Tel: +41 44 633 74 38
|