1
Giguère S, Laviolette F, Marchand M, Tremblay D, Moineau S, Liang X, Biron É, Corbeil J. Machine learning assisted design of highly active peptides for drug discovery. PLoS Comput Biol 2015; 11:e1004074. [PMID: 25849257 PMCID: PMC4388847 DOI: 10.1371/journal.pcbi.1004074] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/31/2014] [Accepted: 12/05/2014] [Indexed: 01/15/2023]
Abstract
The discovery of peptides possessing high biological activity is very challenging because of the enormous diversity of candidate sequences, of which only a minority have the desired properties. To lower the cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist in the process and even partly replace expensive laboratory experiments by learning a predictor from existing data or from a smaller amount of newly generated data. Unfortunately, once the model is learned, selecting the peptides with the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and the validation of peptide leads. Moreover, the proposed approach does not require the use of known ligands for the target protein, since it can leverage recent multi-target machine learning predictors where ligands for similar targets can serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.

Part of the complexity of drug discovery is the sheer chemical diversity to explore, combined with all the requirements a compound must meet to become a commercial drug. Hence, it makes sense to automate this chemical exploration endeavor in a wise, informed, and efficient fashion. Here, we focused on peptides, as they have properties that make them excellent drug starting points. Machine learning techniques may replace expensive in vitro laboratory experiments by learning an accurate model of them. However, computational models also suffer from the combinatorial explosion due to the enormous chemical diversity. Indeed, applying the model to every peptide would take an astronomical amount of computer time. Therefore, given a model, is it possible to determine, using reasonable computational time, the peptide that has the best properties and chance of success? This exact question is what motivated our work. We focused on recent advances in kernel methods and machine learning to learn a model that already had excellent results. We demonstrate that this class of model has mathematical properties that make it possible to rapidly identify and sort the best peptides. Finally, in vitro and in silico results are provided to support and validate this theoretical discovery.
Affiliation(s)
- Sébastien Giguère
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- François Laviolette
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Mario Marchand
- Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
- Denise Tremblay
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Sylvain Moineau
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, Canada
- Xinxia Liang
- Faculty of Pharmacy, Université Laval, Québec, Canada
- Éric Biron
- Faculty of Pharmacy, Université Laval, Québec, Canada
- Jacques Corbeil
- Department of Molecular Medicine, Université Laval, Québec, Canada
2
Wang L, Dai Z, Zhang H, Bai L, Yuan Z. Quantitative Sequence-Activity Model Analysis of Oligopeptides Coupling an Improved High-Dimension Feature Selection Method with Support Vector Regression. Chem Biol Drug Des 2014; 83:379-91. [DOI: 10.1111/cbdd.12242] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/18/2013] [Revised: 08/31/2013] [Accepted: 09/27/2013] [Indexed: 01/20/2023]
Affiliation(s)
- Lifeng Wang
- Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization; Hunan Agricultural University; Changsha 410128 China
- College of Plant Protection; Hunan Agricultural University; Changsha 410128 China
- Zhijun Dai
- Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization; Hunan Agricultural University; Changsha 410128 China
- College of Plant Protection; Hunan Agricultural University; Changsha 410128 China
- Hongyan Zhang
- Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization; Hunan Agricultural University; Changsha 410128 China
- College of Plant Protection; Hunan Agricultural University; Changsha 410128 China
- Lianyang Bai
- College of Plant Protection; Hunan Agricultural University; Changsha 410128 China
- Hunan Academy of Agricultural Sciences; Changsha 410125 China
- Zheming Yuan
- Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization; Hunan Agricultural University; Changsha 410128 China
- College of Plant Protection; Hunan Agricultural University; Changsha 410128 China
3
Abstract
INTRODUCTION: A frightening increase in the number of isolated multidrug-resistant bacterial strains, linked to the decline in novel antimicrobial drugs entering the market, is a great cause for concern. Cationic antimicrobial peptides (AMPs) have lately been introduced as a potential new class of antimicrobial drugs, and computational methods utilizing molecular descriptors can significantly accelerate the development of new peptide drug candidates. AREAS COVERED: This paper gives a broad overview of peptide and amino-acid scale descriptors available for AMP modeling and highlights which of these are currently being used in quantitative structure-activity relationship (QSAR) studies for AMP optimization. Additionally, some key commercial computational tools are discussed, and both successful and less successful studies are referenced, illustrating some of the challenges facing AMP scientists. Through examples of different peptide QSAR studies, this review highlights some of the missing links and illuminates some of the questions that would be interesting to challenge in a more systematic fashion. EXPERT OPINION: Computer-aided peptide QSAR using molecular descriptors may provide the necessary edge to peptide drug discovery, enabling successful design of a new generation of anti-infective drug molecules. However, if this wonderful scenario is to play out, computational chemists and peptide microbiologists would need to start playing together and not just side by side.
Affiliation(s)
- Håvard Jenssen
- Roskilde University, Institute of Science, Systems and Models, Universitetsvej 1, Building 17.1, DK-4000 Roskilde, Denmark; +45 4674 2877; +45 4674 3010
4
Tyunina EY, Badelin VG. Molecular descriptors of amino acids for the evaluation of the physicochemical parameters and biological activity of peptides. Russian Journal of Bioorganic Chemistry 2009. [DOI: 10.1134/s1068162009040062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
5
Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 2009; 38:199-212. [DOI: 10.1007/s00726-008-0228-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 08/01/2008] [Accepted: 12/18/2008] [Indexed: 10/21/2022]
6
Austel V, Sjöström M, Eriksson L, Hall LH, van de Waterbeemd H, Costantino G, Clementi S, Cruciani G, Valigi R. Experimental Design in Synthesis Planning and Structure-Property Correlations. 2008. [DOI: 10.1002/9783527615452.ch3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
7
8
Lu Y, Bulka B, desJardins M, Freeland SJ. Amino acid quantitative structure property relationship database: a web-based platform for quantitative investigations of amino acids. Protein Eng Des Sel 2007; 20:347-51. [PMID: 17557765 DOI: 10.1093/protein/gzm027] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Here, we present the AA-QSPR Db (Amino Acid Quantitative Structure Property Relationship Database): a novel, freely available web-resource of data pertaining to amino acids, both engineered and naturally occurring. In addition to presenting fundamental molecular descriptors of size, charge and hydrophobicity, it also includes online visualization tools for users to perform instant, interactive analyses of amino acid sub-sets in which they are interested. The database has been designed with extensible markup language technology to provide a flexible structure, suitable for future development. In addition to providing easy access for queries by external computers, it also offers a user-friendly web-based interface that facilitates human interactions (submission, storage and retrieval of amino acid data) and an associated e-forum that encourages users to question and discuss current and future database contents.
Affiliation(s)
- Yi Lu
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA
9
A new descriptor of amino acids based on the three-dimensional vector of atomic interaction field. 2006. [DOI: 10.1007/s11434-006-0524-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 10/24/2022]
10
Mei H, Liao ZH, Zhou Y, Li SZ. A new set of amino acid descriptors and its application in peptide QSARs. Biopolymers 2006; 80:775-86. [PMID: 15895431 DOI: 10.1002/bip.20296] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.9] [Indexed: 11/10/2022]
Abstract
In this work, a new set of amino acid descriptors, i.e., VHSE (principal components score Vectors of Hydrophobic, Steric, and Electronic properties), is derived from principal components analysis (PCA) on independent families of 18 hydrophobic properties, 17 steric properties, and 15 electronic properties, respectively, which together comprise 50 physicochemical variables of the 20 coded amino acids. Using the stepwise multiple regression (SMR) method combined with partial least squares (PLS), the VHSE scales are then applied to QSAR studies of bitter-tasting dipeptides (BTD), angiotensin-converting enzyme (ACE) inhibitors, and bradykinin-potentiating pentapeptides (BPP). To validate the predictive power of the resulting models, external validation is also performed. A comparison of the results to those obtained with z scores and other two-dimensional (2D) or three-dimensional (3D) descriptors shows that the VHSE scales are comparable for parameterizing the structural variability of the peptide series.
Affiliation(s)
- Hu Mei
- College of Chemistry and Chemical Engineering, Chongqing University, Chongqing, 400044 People's Republic of China
11
Theoretical Study on Hydrophobicity of Amino Acids by the Solvation Free Energy Density Model. B Korean Chem Soc 2003. [DOI: 10.5012/bkcs.2003.24.12.1742] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
12
Gustafsson C, Govindarajan S, Emig R. Exploration of sequence space for protein engineering. J Mol Recognit 2001; 14:308-14. [PMID: 11746951 DOI: 10.1002/jmr.543] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
The process of protein engineering is currently evolving towards a heuristic understanding of the sequence-function relationship. Improved DNA sequencing capacity, efficient protein function characterization and improved quality of data points in conjunction with well-established statistical tools from other industries are changing the protein engineering field. Algorithms capturing the heuristic sequence-function relationships will have a drastic impact on the field of protein engineering. In this review, several alternative approaches to quantitatively assess sequence space are discussed and the relatively few examples of wet-lab validation of statistical sequence-function characterization/correlation are described.
Affiliation(s)
- C Gustafsson
- Maxygen Inc., Galveston Drive 515, Redwood City, CA 94063, USA.
13
Matter H. A validation study of molecular descriptors for the rational design of peptide libraries. The Journal of Peptide Research: Official Journal of the American Peptide Society 1998; 52:305-14. [PMID: 9832309 DOI: 10.1111/j.1399-3011.1998.tb01245.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Important molecular descriptors used for establishing quantitative structure-activity relationships are investigated to classify similar versus dissimilar peptides. When searching new lead structures, synthesizing and testing compounds which are too similar wastes time and resources. In contrast, any lead optimization program requires the investigation of similar compounds to that lead. Thus, it is important to maximize or minimize the structural diversity of peptides to design useful compound libraries for lead finding or lead refinement projects. If a molecular descriptor is a useful measure of similarity for the design of peptide libraries, small differences in this descriptor for a pair of molecules should only translate into small biological differences. Using this paradigm as a basis for descriptor validation, it was possible to rank different molecular descriptors. Those physicochemical descriptors are 2D fingerprints and five experimentally or theoretically derived principal property scales. Some theoretically derived metrics are obtained by computing interaction energies or similarity indices on predefined 3D grid points using canonical conformations for individual amino acids. The resulting 3D data matrices are analyzed using a principal component analysis leading to three principal properties for CoMFA (Comparative Molecular Field Analysis) or CoMSIA (Comparative Molecular Similarity Index Analysis) derived molecular fields. The descriptor validation results reveal the applicability of design tools on peptide data sets. Experimentally derived descriptors, in general, are more acceptable than computationally derived metrics, while the latter provide a statistically valid alternative to characterize novel building blocks. The CoMSIA metrics perform slightly better than the CoMFA-based principal properties, while GRID-based descriptors are always less acceptable.
Affiliation(s)
- H Matter
- Hoechst Marion Roussel AG, Computational Chemistry, Core Research Functions, Frankfurt am Main, Germany.
14
Sotomatsu-Niwa T, Ogino A. Evaluation of the hydrophobic parameters of the amino acid side chains of peptides and their application in QSAR and conformational studies. 1997. [DOI: 10.1016/s0166-1280(97)90371-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
15
Meki AR, Nassar AY, Rochat H. A bradykinin-potentiating peptide (peptide K12) isolated from the venom of Egyptian scorpion Buthus occitanus. Peptides 1995; 16:1359-65. [PMID: 8745044 DOI: 10.1016/0196-9781(95)02036-5] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
A nontoxic peptide with bradykinin-potentiating activity was isolated from the dialyzed venom of the scorpion Buthus occitanus by reverse-phase high performance liquid chromatography (RP-HPLC). The pharmacological activity of the peptide was bioassayed by its ability to potentiate the contraction induced by added bradykinin (BK) in the isolated guinea pig ileum as well as in the isolated rat uterus. Moreover, the peptide potentiates in vivo the depressor effect of BK on arterial blood pressure in the normotensive anesthetized rat. Chemical characterization of the peptide was also performed. The amino acid composition of the peptide showed 21 amino acid residues per molecule including three proline residues. The amino acid sequence of the purified peptide was confirmed by mass spectrometry. Either N- or C-terminal ends were free. The sequence shows no homology with bradykinin-potentiating peptides isolated from either scorpion or snake venoms. Furthermore, we did not find significant sequence homology between the isolated peptide and any of the proteins or peptides in the GenPro or NBRF data banks. The peptide also inhibited angiotensin-converting enzyme (ACE), and could not serve as a substrate for the enzyme. It could be concluded that the mechanism of bradykinin-potentiating peptide (BPP) activity may be due to ACE inhibition.
Affiliation(s)
- A R Meki
- Department of Biochemistry, Faculty of Medicine, University of Assiut, Egypt
16
Abstract
A simple and computationally nonintensive technique based on principal component analysis of 3-dimensional fields to derive theoretical descriptors is presented. The descriptors are then applied to a quantitative structure-activity relationship study on bradykinin potentiating pentapeptides.
Affiliation(s)
- U Norinder
- Astra Research Centre, Södertälje, Sweden
17
Jonsson J, Eriksson L, Hellberg S, Sjöström M, Wold S. Multivariate Parametrization of 55 Coded and Non-Coded Amino Acids. 1989. [DOI: 10.1002/qsar.19890080303] [Citation(s) in RCA: 100] [Impact Index Per Article: 2.9] [Indexed: 11/08/2022]
18
Ståhle L, Wold S. Multivariate data analysis and experimental design in biomedical research. Progress in Medicinal Chemistry 1988; 25:291-338. [PMID: 3076969 DOI: 10.1016/s0079-6468(08)70281-9] [Citation(s) in RCA: 108] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
19
Wold S, Sjöström M, Carlson R, Lundstedt T, Hellberg S, Skagerberg B, Wikström C, Öhman J. Multivariate design. Anal Chim Acta 1986. [DOI: 10.1016/s0003-2670(00)86294-7] [Citation(s) in RCA: 80] [Impact Index Per Article: 2.1] [Indexed: 10/18/2022]