1. Curcio A, Rocca R, Alcaro S, Artese A. The Histone Deacetylase Family: Structural Features and Application of Combined Computational Methods. Pharmaceuticals (Basel) 2024; 17:620. [PMID: 38794190 PMCID: PMC11124352 DOI: 10.3390/ph17050620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
Histone deacetylases (HDACs) are crucial in gene transcription, removing acetyl groups from histones. They also influence the deacetylation of non-histone proteins, contributing to the regulation of various biological processes. Thus, HDACs play pivotal roles in various diseases, including cancer, neurodegenerative disorders, and inflammatory conditions, highlighting their potential as therapeutic targets. This paper reviews the structure and function of the four classes of human HDACs. While four HDAC inhibitors are currently available for treating hematological malignancies, numerous others are undergoing clinical trials. However, their non-selective toxicity necessitates ongoing research into safer and more efficient class-selective or isoform-selective inhibitors. Computational techniques have greatly facilitated the discovery of HDAC inhibitors that achieve the desired potency and selectivity. These techniques encompass ligand-based strategies such as scaffold hopping, pharmacophore modeling, three-dimensional quantitative structure–activity relationships (3D-QSAR), and structure-based virtual screening (molecular docking). Additionally, advancements in molecular dynamics simulations, along with Poisson–Boltzmann/molecular mechanics generalized Born surface area (PB/MM-GBSA) methods, have enhanced the accuracy of predicting ligand binding affinity. In this review, we delve into the ways in which these methods have contributed to designing and identifying HDAC inhibitors.
Affiliation(s)
- Antonio Curcio
  - Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Roberta Rocca
  - Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
  - Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Stefano Alcaro
  - Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
  - Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
- Anna Artese
  - Dipartimento di Scienze della Salute, Campus “S. Venuta”, Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
  - Net4Science S.r.l., Università degli Studi “Magna Græcia” di Catanzaro, Viale Europa, 88100 Catanzaro, Italy
2. Matsumoto K, Miyao T, Funatsu K. Ranking-Oriented Quantitative Structure-Activity Relationship Modeling Combined with Assay-Wise Data Integration. ACS Omega 2021; 6:11964-11973. [PMID: 34056351 PMCID: PMC8154010 DOI: 10.1021/acsomega.1c00463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/26/2021] [Accepted: 04/21/2021] [Indexed: 05/15/2023]
Abstract
In ligand-based drug design, quantitative structure-activity relationship (QSAR) models play an important role in activity prediction. One of the major end points of QSAR models is half-maximal inhibitory concentration (IC50). Experimental IC50 data from various research groups have been accumulated in publicly accessible databases, providing an opportunity for us to use such data in predictive QSAR models. In this study, we focused on using a ranking-oriented QSAR model as a predictive model because relative potency strength within the same assay is solid information that is not based on any mechanical assumptions. We conducted rigorous validation using the ChEMBL database and previously reported data sets. Ranking support vector machine (ranking-SVM) models trained on compounds from similar assays were as good as support vector regression (SVR) with the Tanimoto kernel trained on compounds from all the assays. As effective ways of data integration, for ranking-SVM, integrated compounds should be selected from only similar assays in terms of compounds. For SVR with the Tanimoto kernel, entire compounds from different assays can be incorporated.
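The Tanimoto kernel named in the abstract above can be illustrated with a minimal sketch. It assumes each compound is represented as a set of on-bit fingerprint indices; that representation, the function name `tanimoto_kernel`, and the example fingerprints are hypothetical choices for illustration, not details taken from the paper.

```python
def tanimoto_kernel(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    shared = len(a & b)
    # |A ∩ B| / |A ∪ B|, with the union size computed by inclusion-exclusion.
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints for two compounds.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto_kernel(fp1, fp2))  # 2 shared bits out of 5 distinct bits
```

A kernel matrix built from such pairwise values could then be passed to a kernel method such as support vector regression; how the authors actually computed their descriptors is not stated in this listing.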
Affiliation(s)
- Katsuhisa Matsumoto
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Tomoyuki Miyao
  - Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Kimito Funatsu
  - Data Science Center, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
  - Phone: +81-3-5841-7751. Fax: +81-3-5841-7771
3. Ru X, Ye X, Sakurai T, Zou Q. Application of learning to rank in bioinformatics tasks. Brief Bioinform 2021; 22:6102666. [PMID: 33454758 DOI: 10.1093/bib/bbaa394] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/03/2020] [Revised: 11/09/2020] [Accepted: 11/24/2020] [Indexed: 12/17/2022] Open
Abstract
Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that these algorithms are convenient and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed based on the application of multiple perspectives in bioinformatics. Finally, the paper further discusses the shortcomings of the LTR algorithms, the methods and means to better use the algorithms and some open problems that currently exist.
Affiliation(s)
- Xiucai Ye
  - Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Quan Zou
  - University of Electronic Science and Technology of China
4. Ru X, Wang L, Li L, Ding H, Ye X, Zou Q. Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm. Comput Biol Med 2020; 119:103660. [PMID: 32090901 DOI: 10.1016/j.compbiomed.2020.103660] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/18/2019] [Revised: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 02/01/2023]
Abstract
Exploring protein-drug correlations can not only solve the problem of selecting candidate compounds but also solve related problems such as drug redirection and finding potential drug targets. Therefore, many researchers have proposed different machine learning methods for the prediction of protein-drug correlations. However, many existing models simply divide the protein-drug relationship into related or unrelated categories and do not deeply explore the most relevant target (or drug) for a given drug (or target). To solve this problem, this paper applies the ranking concept to the prediction of G protein-coupled receptor (GPCR)-drug correlations. This study uses two different types of data sets to explore the candidate compound and potential target problems, and both sets achieved good results. In addition, this study found that the family to which a protein belongs is not an inherent factor affecting the ranking of GPCR-drug correlations; however, if the drug affects other family members of the protein, then the protein is likely to be a potential target of the drug. This study showed that the learning to rank algorithm is a good tool for exploring protein-drug correlations.
Affiliation(s)
- Xiaoqing Ru
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Lida Wang
  - Scientific Research Department, Heilongjiang Agricultural Reclamation General Hospital, Harbin, China
- Lihong Li
  - School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
- Hui Ding
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
5. Pirzada RH, Javaid N, Choi S. The Roles of the NLRP3 Inflammasome in Neurodegenerative and Metabolic Diseases and in Relevant Advanced Therapeutic Interventions. Genes (Basel) 2020; 11:E131. [PMID: 32012695 PMCID: PMC7074480 DOI: 10.3390/genes11020131] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/25/2019] [Revised: 01/22/2020] [Accepted: 01/22/2020] [Indexed: 02/07/2023] Open
Abstract
Inflammasomes are intracellular multiprotein complexes in the cytoplasm that regulate inflammation activation in the innate immune system in response to pathogens and to host self-derived molecules. Recent advances greatly improved our understanding of the activation of nucleotide-binding oligomerization domain-like receptor (NLR) family pyrin domain containing 3 (NLRP3) inflammasomes at the molecular level. The NLRP3 belongs to the subfamily of NLRP which activates caspase 1, thus causing the production of proinflammatory cytokines (interleukin 1β and interleukin 18) and pyroptosis. This inflammasome is involved in multiple neurodegenerative and metabolic disorders including Alzheimer's disease, multiple sclerosis, type 2 diabetes mellitus, and gout. Therefore, therapeutic targeting to the NLRP3 inflammasome complex is a promising way to treat these diseases. Recent research advances paved the way toward drug research and development using a variety of machine learning-based and artificial intelligence-based approaches. These state-of-the-art approaches will lead to the discovery of better drugs after the training of such a system.
Affiliation(s)
- Sangdun Choi
  - Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea
6. Learning-to-rank technique based on ignoring meaningless ranking orders between compounds. J Mol Graph Model 2019; 92:192-200. [DOI: 10.1016/j.jmgm.2019.07.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/28/2019] [Revised: 07/17/2019] [Accepted: 07/17/2019] [Indexed: 11/19/2022]
7. Yasuo N, Watanabe K, Hara H, Rikimaru K, Sekijima M. Predicting Strategies for Lead Optimization via Learning to Rank. IPSJ Transactions on Bioinformatics 2018. [DOI: 10.2197/ipsjtbio.11.41] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Affiliation(s)
- Nobuaki Yasuo
  - Department of Computer Science, Tokyo Institute of Technology
  - Japan Society for the Promotion of Science
- Hideto Hara
  - Shonan Research Center, Takeda Pharmaceutical Company Limited
- Masakazu Sekijima
  - Department of Computer Science, Tokyo Institute of Technology
  - Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology
8. Suzuki SD, Ohue M, Akiyama Y. PKRank: a novel learning-to-rank method for ligand-based virtual screening using pairwise kernel and RankSVM. Artificial Life and Robotics 2017. [DOI: 10.1007/s10015-017-0416-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
9. Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 2017; 32:i18-i27. [PMID: 27307615 PMCID: PMC4908328 DOI: 10.1093/bioinformatics/btw244] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Indexed: 02/06/2023] Open
Abstract
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experimental approaches, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique among the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingjun Yuan
  - School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Junning Gao
  - School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Dongliang Wu
  - School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Shihua Zhang
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan
  - Department of Computer Science, Aalto University, Finland
- Shanfeng Zhu
  - School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
  - Centre for Computational System Biology, Fudan University, Shanghai, China
10. Al-Dabbagh MM, Salim N, Himmat M, Ahmed A, Saeed F. Quantum probability ranking principle for ligand-based virtual screening. J Comput Aided Mol Des 2017; 31:365-378. [PMID: 28220440 DOI: 10.1007/s10822-016-0003-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/03/2016] [Accepted: 12/16/2016] [Indexed: 10/20/2022]
Abstract
Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost-effectiveness of lead discovery programs by ranking chemical compound databases in decreasing probability of biological activity, following the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria draw an analogy between a physical experiment and the molecular structure ranking process for 2D fingerprints in ligand-based virtual screening (LBVS). The development of QPRP criteria in LBVS employs quantum concepts at three levels. First, at the representation level, the model develops a new framework of molecular representation by connecting molecular compounds with a mathematical quantum space. Second, it estimates the similarity between chemical libraries and references using a quantum-based similarity searching method. Finally, it ranks the molecules using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical ranking principle (PRP) for molecular chemical compounds.
Affiliation(s)
- Naomie Salim
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
- Mubarak Himmat
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
- Ali Ahmed
  - Faculty of Engineering, Karary University, Khartoum, 12304, Sudan
- Faisal Saeed
  - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310, Malaysia
11. Balfer J, Bajorath J. Visualization and Interpretation of Support Vector Machine Activity Predictions. J Chem Inf Model 2015; 55:1136-47. [DOI: 10.1021/acs.jcim.5b00175] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 01/28/2023]
Affiliation(s)
- Jenny Balfer
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
12. Korkmaz S, Zararsiz G, Goksuluk D. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS One 2015; 10:e0124600. [PMID: 25928885 PMCID: PMC4415797 DOI: 10.1371/journal.pone.0124600] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 01/29/2015] [Accepted: 03/03/2015] [Indexed: 12/18/2022] Open
Abstract
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; the ten best-performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create a heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/.
Affiliation(s)
- Selcuk Korkmaz
  - Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Gokmen Zararsiz
  - Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
- Dincer Goksuluk
  - Department of Biostatistics, Faculty of Medicine, Hacettepe University, Sihhiye, Ankara, Turkey
13. Zhang W, Ji L, Chen Y, Tang K, Wang H, Zhu R, Jia W, Cao Z, Liu Q. When drug discovery meets web search: Learning to Rank for ligand-based virtual screening. J Cheminform 2015; 7:5. [PMID: 25705262 PMCID: PMC4333300 DOI: 10.1186/s13321-015-0052-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 09/24/2014] [Accepted: 01/07/2015] [Indexed: 11/30/2022] Open
Abstract
Background: The rapid increase in the emergence of novel chemical substances creates a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was applied to drug virtual screening, where it offers two unique capabilities: (1) it can identify compounds for novel targets when not enough training data are available for those targets, and (2) it can integrate heterogeneous data when compound affinities are measured on different platforms. Results: A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly due to its novel use in cross-target virtual screening and heterogeneous data integration. Conclusions: To the best of our knowledge, we have introduced here the first application of Learning to Rank in virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html. [Figure: The analogy between web search and ligand-based drug discovery]
Affiliation(s)
- Wei Zhang
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Lijuan Ji
  - Huai'an Second People's Hospital affiliated to Xuzhou Medical College, Huai'an, China
- Yanan Chen
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Kailin Tang
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Haiping Wang
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
  - Department of Computer Science, Hefei University of Technology, Hefei, 230009 China
- Ruixin Zhu
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Wei Jia
  - R & D Information, AstraZeneca, Shanghai, China
- Zhiwei Cao
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
- Qi Liu
  - Department of Central Laboratory, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, China
14. Dörr A, Rosenbaum L, Zell A. A ranking method for the concurrent learning of compounds with various activity profiles. J Cheminform 2015; 7:2. [PMID: 25643067 PMCID: PMC4306736 DOI: 10.1186/s13321-014-0050-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/23/2014] [Accepted: 12/11/2014] [Indexed: 11/30/2022] Open
Abstract
Background: In this study, we present an SVM-based ranking algorithm for the concurrent learning of compounds with different activity profiles and their varying prioritization. To this end, a specific labeling of each compound was elaborated in order to infer virtual screening models against multiple targets. We compared the method with several state-of-the-art SVM classification techniques that are capable of inferring multi-target screening models on three chemical data sets (cytochrome P450s, dehydrogenases, and a trypsin-like protease data set) containing three different biological targets each. Results: The experiments show that ranking-based algorithms achieve increased performance for single- and multi-target virtual screening. Moreover, compounds that do not completely fulfill the desired activity profile are still ranked higher than decoys or compounds with an entirely undesired profile, compared to other multi-target SVM methods. Conclusions: SVM-based ranking methods constitute a valuable approach for virtual screening in multi-target drug design. Such methods are most helpful when dealing with compounds with various activity profiles and when many ligands with an already perfectly matching activity profile are not expected to be found.
Affiliation(s)
- Alexander Dörr
  - Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
- Lars Rosenbaum
  - Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
- Andreas Zell
  - Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany
15. Lavecchia A. Machine-learning approaches in drug discovery: methods and applications. Drug Discov Today 2014; 20:318-31. [PMID: 25448759 DOI: 10.1016/j.drudis.2014.10.012] [Citation(s) in RCA: 384] [Impact Index Per Article: 34.9] [Received: 07/18/2014] [Revised: 09/27/2014] [Accepted: 10/24/2014] [Indexed: 12/19/2022]
Abstract
During the past decade, virtual screening (VS) has evolved from traditional similarity searching, which utilizes single reference compounds, into an advanced application domain for data mining and machine-learning approaches, which require large and representative training-set compounds to learn robust decision rules. The explosive growth in the amount of public domain-available chemical and biological data has generated huge effort to design, analyze, and apply novel learning methodologies. Here, I focus on machine-learning techniques within the context of ligand-based VS (LBVS). In addition, I analyze several relevant VS studies from recent publications, providing a detailed view of the current state-of-the-art in this field and highlighting not only the problematic issues, but also the successes and opportunities for further advances.
Affiliation(s)
- Antonio Lavecchia
  - Department of Pharmacy, Drug Discovery Laboratory, University of Napoli 'Federico II', via D. Montesano 49, I-80131 Napoli, Italy
16. Stock M, Fober T, Hüllermeier E, Glinca S, Klebe G, Pahikkala T, Airola A, De Baets B, Waegeman W. Identification of Functionally Related Enzymes by Learning-to-Rank Methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1157-1169. [PMID: 26357052 DOI: 10.1109/tcbb.2014.2338308] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
17. Korkmaz S, Zararsiz G, Goksuluk D. Drug/nondrug classification using Support Vector Machines with various feature selection strategies. Computer Methods and Programs in Biomedicine 2014; 117:51-60. [PMID: 25224081 DOI: 10.1016/j.cmpb.2014.08.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 05/02/2014] [Revised: 08/15/2014] [Accepted: 08/27/2014] [Indexed: 06/03/2023]
Abstract
In conjunction with advances in computer technology, virtual screening of small molecules has come into use in drug discovery. Since there are thousands of compounds in the early phase of drug discovery, a fast classification method that can distinguish between active and inactive molecules can be used for screening large compound collections. In this study, we used Support Vector Machines (SVM) for this type of classification task. SVM is a powerful classification tool that is becoming increasingly popular in various machine-learning applications. The data sets consist of 631 compounds for the training set and 216 compounds for a separate test set. In the data pre-processing step, Pearson's correlation coefficient was used as a filter to eliminate redundant features. After application of the correlation filter, a single SVM was applied to this reduced data set. Moreover, we investigated the performance of SVM with different feature selection strategies, including SVM-Recursive Feature Elimination, Wrapper Method and Subset Selection. All feature selection methods generally perform better than a single SVM, while Subset Selection outperforms the other feature selection methods. We tested SVM as a classification tool in a real-life drug discovery problem, and our results revealed that it could be a useful method for classification tasks in the early phase of drug discovery.
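The correlation-based pre-filter described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the greedy keep-first strategy, the 0.9 threshold, the function names, and the descriptor values are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / (sd_x * sd_y)


def correlation_filter(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: feature name -> one value per compound.
descriptors = {
    "mol_weight": [180.0, 250.0, 310.0, 420.0],
    "heavy_atoms": [13.0, 18.0, 22.0, 30.0],  # nearly collinear with mol_weight
    "logp": [1.2, -0.5, 3.1, 0.8],
}
print(correlation_filter(descriptors))  # heavy_atoms is dropped as redundant
```

The reduced feature set would then be passed to the classifier; which feature of a correlated pair is retained depends on column order in this greedy variant.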
Affiliation(s)
- Selcuk Korkmaz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey.
| | - Gokmen Zararsiz
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
| | - Dincer Goksuluk
- Hacettepe University, Faculty of Medicine, Department of Biostatistics, 06100 Sihhiye, Ankara, Turkey
| |
|
18
|
Balfer J, Heikamp K, Laufer S, Bajorath J. Modeling of Compound Profiling Experiments Using Support Vector Machines. Chem Biol Drug Des 2014; 84:75-85. [DOI: 10.1111/cbdd.12294] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2013] [Revised: 01/06/2014] [Accepted: 01/19/2014] [Indexed: 11/28/2022]
Affiliation(s)
- Jenny Balfer
- Department of Life Science Informatics; B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry; Rheinische Friedrich-Wilhelms-Universität; Dahlmannstr. 2 D-53113 Bonn Germany
| | - Kathrin Heikamp
- Department of Life Science Informatics; B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry; Rheinische Friedrich-Wilhelms-Universität; Dahlmannstr. 2 D-53113 Bonn Germany
| | - Stefan Laufer
- Department of Pharmacy and Biochemistry, Pharmaceutical/Medicinal Chemistry; Eberhard-Karls-Universität Tübingen; Auf der Morgenstelle 8 D-72076 Tübingen Germany
| | - Jürgen Bajorath
- Department of Life Science Informatics; B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry; Rheinische Friedrich-Wilhelms-Universität; Dahlmannstr. 2 D-53113 Bonn Germany
| |
|
19
|
|
20
|
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]
|
21
|
Pérez-Castillo Y, Lazar C, Taminau J, Froeyen M, Cabrera-Pérez MÁ, Nowé A. GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design. J Chem Inf Model 2012; 52:2366-86. [DOI: 10.1021/ci300146h] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Affiliation(s)
- Yunierkis Pérez-Castillo
- Computational Modeling Lab (CoMo), Department of Computer Sciences, Faculty of Sciences, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium
- Molecular Simulations and Drug Design Group, Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
- Laboratory for Medicinal Chemistry, Rega Institute for Medical Research, Katholieke Universiteit Leuven, Minderbroedersstraat 10, B-3000 Leuven, Belgium
| | - Cosmin Lazar
- Computational Modeling Lab (CoMo), Department of Computer Sciences, Faculty of Sciences, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium
| | - Jonatan Taminau
- Computational Modeling Lab (CoMo), Department of Computer Sciences, Faculty of Sciences, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium
| | - Mathy Froeyen
- Laboratory for Medicinal Chemistry, Rega Institute for Medical Research, Katholieke Universiteit Leuven, Minderbroedersstraat 10, B-3000 Leuven, Belgium
| | - Miguel Ángel Cabrera-Pérez
- Molecular Simulations and Drug Design Group, Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
- Engineering Department, Pharmacy and Pharmaceutical Technology Area, Faculty of Pharmacy, University Miguel Hernandez, Alicante 03550, Spain
| | - Ann Nowé
- Computational Modeling Lab (CoMo), Department of Computer Sciences, Faculty of Sciences, Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussel, Belgium
| |
|
22
|
Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: Quo Vadis? J Chem Inf Model 2012; 52:1413-37. [PMID: 22582859 DOI: 10.1021/ci200409x] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
This paper is focused on modern approaches to machine learning, most of which are as yet used infrequently or not at all in chemoinformatics. Machine learning methods are characterized in terms of the "modes of statistical inference" and "modeling levels" nomenclature and by considering different facets of the modeling with respect to input/output matching, data types, models duality, and models inference. Particular attention is paid to new approaches and concepts that may provide efficient solutions of common problems in chemoinformatics: improvement of predictive performance of structure-property (activity) models, generation of structures possessing desirable properties, model applicability domain, modeling of properties with functional endpoints (e.g., phase diagrams and dose-response curves), and accounting for multiple molecular species (e.g., conformers or tautomers).
Affiliation(s)
- Alexandre Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4, rue B. Pascal, Strasbourg 67000, France.
| | | |
|
23
|
Hansen K, Baehrens D, Schroeter T, Rupp M, Müller KR. Visual Interpretation of Kernel-Based Prediction Models. Mol Inform 2011; 30:817-26. [DOI: 10.1002/minf.201100059] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2011] [Accepted: 07/21/2011] [Indexed: 02/05/2023]
|