1
ABP-Finder: A Tool to Identify Antibacterial Peptides and the Gram-Staining Type of Targeted Bacteria. Antibiotics (Basel) 2022; 11:antibiotics11121708. [PMID: 36551365 PMCID: PMC9774453 DOI: 10.3390/antibiotics11121708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 11/16/2022] [Accepted: 11/17/2022] [Indexed: 11/29/2022]
Abstract
Multi-drug resistance in bacteria is a major health problem worldwide. To overcome this issue, new approaches allowing for the identification and development of antibacterial agents are urgently needed. Peptides, due to their binding specificity and low expected side effects, are promising candidates for a new generation of antibiotics. For over two decades, a large diversity of antimicrobial peptides (AMPs) has been discovered and annotated in public databases. The AMP family encompasses nearly 20 biological functions, thus representing a potentially valuable resource for data mining analyses. Nonetheless, although machine learning-based approaches focused on AMPs are available, these tools lack evidence of successful application in AMP discovery, and many are not designed to predict a specific function for putative AMPs, such as antibacterial activity. Consequently, among the apparent variety of data mining methods to screen peptide sequences for antibacterial activity, only a few tools can handle this task consistently, and even these offer limited precision and generally no information about the possible targets. Here, we addressed this gap by introducing a tool specifically designed to identify antibacterial peptides (ABPs) and to estimate which type of bacteria, according to their response to the Gram-staining assay, is susceptible to the action of these peptides. Our tool is freely available via a web server named ABP-Finder. This new method ranks among the top state-of-the-art ABP predictors, particularly in terms of precision. Importantly, we showed the successful application of ABP-Finder to the screening of a large peptide library from the human urine peptidome and the identification of an antibacterial peptide.
2
Romero-Molina S, Ruiz-Blanco YB, Mieres-Perez J, Harms M, Münch J, Ehrmann M, Sanchez-Garcia E. PPI-Affinity: A Web Tool for the Prediction and Optimization of Protein-Peptide and Protein-Protein Binding Affinity. J Proteome Res 2022; 21:1829-1841. [PMID: 35654412 PMCID: PMC9361347 DOI: 10.1021/acs.jproteome.2c00020] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/30/2022]
Abstract
Virtual screening of protein–protein and protein–peptide interactions is a challenging task that directly impacts the processes of hit identification and hit-to-lead optimization in drug design projects involving peptide-based pharmaceuticals. Although several screening tools designed to predict the binding affinity of protein–protein complexes have been proposed, methods specifically developed to predict protein–peptide binding affinity are comparatively scarce. Predictors trained to score the affinity of small molecules are frequently applied to peptides indiscriminately, despite the greater complexity and heterogeneity of the interactions formed by peptide binders. To address this issue, we introduce PPI-Affinity, a tool that leverages support vector machine (SVM) predictors of binding affinity to screen datasets of protein–protein and protein–peptide complexes, as well as to generate and rank mutants of a given structure. The performance of the SVM models was assessed on four benchmark datasets, which include protein–protein and protein–peptide binding affinity data. In addition, we evaluated our model on a set of mutants of EPI-X4, an endogenous peptide inhibitor of the chemokine receptor CXCR4, and on complexes of the serine proteases HTRA1 and HTRA3 with peptides. PPI-Affinity is freely accessible at https://protdcal.zmb.uni-due.de/PPIAffinity.
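PPI-Affinity can generate and rank mutants of a given structure. The sketch below illustrates only the generic enumerate-then-rank idea, on a sequence rather than a 3D structure: it lists every single-point substitution of a peptide and sorts the mutants by a scoring function. The scorer `toy_score` is a hypothetical placeholder, not PPI-Affinity's SVM affinity model, and the peptide `ACDK` is an arbitrary example.

```python
from typing import Callable, List, Tuple

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutant) pairs for every single-point substitution.

    Labels follow the usual <wild type><1-based position><mutant> notation,
    e.g. 'D3K' means the D at position 3 is replaced by K.
    """
    mutants = []
    for i, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutants.append((f"{wild_type}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]))
    return mutants


def rank_mutants(
    seq: str, score: Callable[[str], float], top: int = 5
) -> List[Tuple[str, str, float]]:
    """Score every single-point mutant and return the `top` highest-scoring ones."""
    scored = [(label, mutant, score(mutant)) for label, mutant in single_point_mutants(seq)]
    # Assumption for this sketch: a higher score means a predicted better binder.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for an affinity predictor: rewards K/R, penalizes D."""
    return sum(seq.count(aa) for aa in "KR") - 0.5 * seq.count("D")


if __name__ == "__main__":
    for label, mutant, value in rank_mutants("ACDK", toy_score, top=3):
        print(label, mutant, value)
```

In a real pipeline, `toy_score` would be replaced by a trained affinity model; exhaustive single-point enumeration grows as 19 × sequence length, which stays small for peptides.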
Affiliation(s)
- Sandra Romero-Molina
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Yasser B Ruiz-Blanco
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Joel Mieres-Perez
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Mirja Harms
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany
- Jan Münch
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm 89081, Germany
  - Core Facility Functional Peptidomics, Ulm University Medical Center, Ulm 89081, Germany
- Michael Ehrmann
  - Faculty of Biology, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
- Elsa Sanchez-Garcia
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen 45141, Germany
3
Zhao H, Su Y, Wang M, Lyu Z, Xu P, Jiao Y, Zhang L, Han W, Tian L, Fu P. The Machine Learning Model for Distinguishing Pathological Subtypes of Non-Small Cell Lung Cancer. Front Oncol 2022; 12:875761. [PMID: 35692759 PMCID: PMC9177952 DOI: 10.3389/fonc.2022.875761] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/14/2022] [Accepted: 04/26/2022] [Indexed: 12/15/2022]
Abstract
Purpose: Machine learning models were developed and validated to distinguish lung adenocarcinoma (LUAD) from lung squamous cell carcinoma (LUSC) using clinical factors, laboratory metrics, and 2-deoxy-2-[18F]fluoro-D-glucose ([18F]F-FDG) positron emission tomography (PET)/computed tomography (CT) radiomic features. Methods: One hundred and twenty non-small cell lung cancer (NSCLC) patients (62 LUAD and 58 LUSC) were analyzed retrospectively and randomized into a training group (n = 85) and a validation group (n = 35). A total of 99 feature parameters (four clinical factors, four laboratory indicators, and 91 [18F]F-FDG PET/CT radiomic features) were used for data analysis and model construction. The Boruta algorithm was used to screen the features. The retained minimum optimal feature subset was input into ten machine learning algorithms to construct classifiers for distinguishing between LUAD and LUSC. Univariate and multivariate analyses were used to identify the independent risk factors for the NSCLC subtype and to construct the Clinical model. Finally, the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy (ACC) were used to evaluate the best-performing machine learning models and the Clinical model in the validation group, and the DeLong test was used to compare model performance. Results: The Boruta algorithm selected an optimal subset of 13 features, comprising two clinical features, two laboratory indicators, and nine PET/CT radiomic features. The Random Forest (RF) and Support Vector Machine (SVM) models showed the best performance in the training group. Gender (P = 0.018) and smoking status (P = 0.011) were used to construct the Clinical model. In the validation group, the SVM model (AUC: 0.876, ACC: 0.800) and the RF model (AUC: 0.863, ACC: 0.800) performed well, while the Clinical model (AUC: 0.712, ACC: 0.686) performed moderately. There was no significant difference between the RF and Clinical models, but the SVM model was significantly better than the Clinical model. Conclusions: The proposed SVM and RF models successfully distinguished LUAD from LUSC. The results indicate that the proposed models are accurate, noninvasive predictive tools that can assist clinical decision-making, especially for patients who cannot undergo biopsy or whose biopsy fails.
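The validation metrics named in this abstract can be computed as sketched below. This is a minimal illustration, not the authors' code: it assumes binary labels with LUSC coded as 1 (an arbitrary choice), computes sensitivity, specificity and accuracy at a fixed threshold, and computes the AUC as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The DeLong test used to compare AUCs is not implemented. As a consistency check on the reported figures, an accuracy of 0.800 on the 35 validation patients corresponds to 28 correct classifications, and 0.686 corresponds to 24 (24/35 ≈ 0.686).

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for binary labels (1 = positive class).

    Assumes both classes are present, otherwise a denominator is zero.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    probs = [0.9, 0.4, 0.6, 0.1]
    print(threshold_metrics(labels, probs), auc_mann_whitney(labels, probs))
```

The pairwise AUC is quadratic in the number of cases, which is negligible for a validation group of 35 patients.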
Affiliation(s)
- Hongyue Zhao
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yexin Su
  - Department of Magnetic Resonance, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Mengjiao Wang
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Zhehao Lyu
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Peng Xu
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Yuying Jiao
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Linhan Zhang
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Wei Han
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Lin Tian
  - Department of Pathology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Peng Fu
  - Department of Nuclear Medicine, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Correspondence: Peng Fu
4
Yang L, Jiao X. Distinguishing Enzymes and Non-enzymes Based on Structural Information with an Alignment Free Approach. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200324134037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Background:
Knowledge of protein functions is crucial for understanding biological processes. Experimental methods for determining protein function cannot keep pace with the growing amount of protein sequence and structure data.
Objective:
To develop computational techniques for protein function prediction.
Method:
An SVM model based on residue interaction network features and motion mode information was constructed and used as the predictor. The role of these features was analyzed, and some interesting results were obtained.
Results:
An alignment-free method for classifying proteins as enzymes or non-enzymes is developed in this work. No single feature dominates the prediction process. The topological and information-theoretic residue interaction network features perform better than the others. Combining the fast and slow motion modes gives a better explanation of the classification result.
Conclusion:
The method proposed in this paper can act as a classifier for enzymes and non-enzymes.
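The abstract does not specify how its residue interaction network is built. A common construction, assumed here, treats residues as nodes and links two residues when their C-alpha atoms lie within a distance cutoff (7 Å in this sketch). The code below builds such a network from coordinates and derives three simple features: mean degree and density (topological) and the Shannon entropy of the degree distribution (information-theoretic). These are illustrative features, not necessarily the ones used by the authors, and the motion-mode (fast/slow mode) information is not modeled.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_network(coords: Sequence[Coord], cutoff: float = 7.0) -> Dict[int, Set[int]]:
    """Adjacency sets: residues i and j are linked if their C-alpha atoms are within `cutoff` angstroms."""
    n = len(coords)
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def network_features(adjacency: Dict[int, Set[int]]) -> Dict[str, float]:
    """Simple topological and information-theoretic features (assumes at least one residue)."""
    n = len(adjacency)
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    n_edges = sum(degrees) // 2
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    # Shannon entropy (bits) of the degree distribution.
    degree_counts = Counter(degrees)
    degree_entropy = -sum((c / n) * math.log2(c / n) for c in degree_counts.values())
    return {
        "mean_degree": sum(degrees) / n,
        "density": density,
        "degree_entropy": degree_entropy,
    }


if __name__ == "__main__":
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(network_features(contact_network(toy_coords)))
```

Vectors of such features, one per protein, would then be the input to an SVM classifier.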
Affiliation(s)
- Lifeng Yang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030600, China
- Xiong Jiao
  - College of Biomedical Engineering, Taiyuan University of Technology, Taiyuan, 030600, China
5
Romero-Molina S, Ruiz-Blanco YB, Green JR, Sanchez-Garcia E. ProtDCal-Suite: A web server for the numerical codification and functional analysis of proteins. Protein Sci 2020; 28:1734-1743. [PMID: 31271472 DOI: 10.1002/pro.3673] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/11/2019] [Revised: 06/21/2019] [Accepted: 06/24/2019] [Indexed: 12/24/2022]
Abstract
Computational tools for the analysis of protein data and the prediction of biological properties are essential in life sciences and biomedical research. Here, we introduce ProtDCal-Suite, a web server comprising a set of machine learning-based methods for studying proteins. The main module of ProtDCal-Suite is the ProtDCal software, which translates the structural information of proteins into numerical descriptors that serve as input to machine learning techniques. The ProtDCal-Suite server also incorporates an optional post-processing stage that ranks and filters the obtained descriptors by computing their Shannon entropy values across the input set of proteins. ProtDCal's codification has been used to develop models for the prediction of specific protein properties; accordingly, the other modules of ProtDCal-Suite are protein analysis tools implemented with ProtDCal's descriptors. Among them are PPI-Detect, for predicting the interaction likelihood of protein-protein and protein-peptide pairs; Enzyme Identifier, for identifying enzymes from amino acid sequences or 3D structures; and Pred-NGlyco, for predicting N-glycosylation sites. ProtDCal-Suite is freely accessible at https://protdcal.zmb.uni-due.de.
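The optional post-processing stage ranks descriptors by their Shannon entropy across the input proteins, so that near-constant (uninformative) descriptors can be filtered out. The sketch below assumes an equal-width histogram with a fixed number of bins; this discretization is an assumption, since the abstract does not state the one ProtDCal-Suite uses. The entropy is H = -sum over bins b of p_b log2(p_b), where p_b is the fraction of proteins whose descriptor value falls into bin b. The descriptor names in the usage example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def shannon_entropy(values: Sequence[float], n_bins: int = 10) -> float:
    """Shannon entropy (bits) of one descriptor's values, using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land at index n_bins, so clamp it into the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def rank_by_entropy(
    table: Dict[str, Sequence[float]], n_bins: int = 10, min_entropy: float = 0.0
) -> List[Tuple[str, float]]:
    """Rank descriptors by entropy (highest first), dropping those at or below min_entropy."""
    scores = [(name, shannon_entropy(vals, n_bins)) for name, vals in table.items()]
    kept = [item for item in scores if item[1] > min_entropy]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    descriptors = {
        "desc_constant": [1.0, 1.0, 1.0, 1.0],  # hypothetical, uninformative
        "desc_spread": [0.0, 1.0, 2.0, 3.0],    # hypothetical, informative
    }
    print(rank_by_entropy(descriptors, n_bins=4))
```

With n_bins bins the entropy is at most log2(n_bins) bits, reached when the proteins are spread evenly across the bins.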
Affiliation(s)
- Sandra Romero-Molina
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
- Yasser B Ruiz-Blanco
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
- James R Green
  - Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
- Elsa Sanchez-Garcia
  - Computational Biochemistry, Center of Medical Biotechnology, University of Duisburg-Essen, Essen, Germany
6
Contreras-Torres E, Marrero-Ponce Y, Terán JE, García-Jacas CR, Brizuela CA, Sánchez-Rodríguez JC. MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors. J Chem Inf Model 2020; 60:1042-1059. [PMID: 31663741 DOI: 10.1021/acs.jcim.9b00629] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
This report introduces the MuLiMs-MCoMPAs software (acronym for Multi-Linear Maps based on N-Metric and Contact Matrices of 3D Protein and Amino-acid weightings), designed to compute tensor-based 3D protein structural descriptors by applying two- and three-linear algebraic forms. These descriptors incorporate generalizing components such as novel 3D protein structural representations; (dis)similarity metrics and multimetrics to extract geometrical information relating two and three amino acids; weighting schemes based on amino acid properties; matrix normalization procedures that consider simple-stochastic and mutual probability transformations; topological and geometrical cutoffs; amino acid- and group-based molecular descriptor (MD) calculations; and aggregation operators for merging amino acid and group MDs. The MuLiMs-MCoMPAs software, which belongs to the ToMoCoMD-CAMPS suite, was developed in Java (version 1.8) using the Chemistry Development Kit (CDK) (version 1.4.19) and the Jmol libraries. The software implements a divide-and-conquer strategy to parallelize the computation of the indices, as well as modules for data preprocessing and batch computing. It consists of two components: (i) a desktop graphical user interface (GUI) and (ii) an API library. The relevance of this novel approach is demonstrated through two analyses: Shannon's entropy-based variability and principal component analysis. These studies showed that the MuLiMs-MCoMPAs three-linear descriptor family contains higher informational entropy than several other descriptors generated with available computation tools. Moreover, the MuLiMs-MCoMPAs indices capture additional information orthogonal to that codified by the available calculation approaches. Two sets of suggested theoretical configurations, containing 13648 two-linear indices and 20263 three-linear indices, are available for download at tomocomd.com.
Furthermore, as a demonstration of the applicability and easy integration of the MuLiMs library into a QSAR-based expert system, a software application (ProStAF) was developed to predict SCOP protein structural classes and folding rate. The MuLiMs-MCoMPAs framework is therefore anticipated to become a valuable contribution to the chem- and bioinformatics research fields.
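To make the two-linear (bilinear) idea concrete: given an n × n matrix M describing pairwise relations between the n residues of a protein, and amino acid weight vectors x and y, a bilinear index is b = sum over i and j of x_i M_ij y_j. The sketch below uses plain Euclidean distances between residue coordinates as M and applies a simple-stochastic normalization (each row divided by its row sum). The choice of metric, the absence of cutoffs, and the weight values are assumptions for illustration; the sketch does not reproduce MuLiMs-MCoMPAs, whose Java API is not described in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Matrix = List[List[float]]


def distance_matrix(coords: Sequence[Coord]) -> Matrix:
    """Pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def simple_stochastic(m: Matrix) -> Matrix:
    """Divide each row by its sum so that every non-zero row sums to 1."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def bilinear_index(m: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """b = sum_i sum_j x_i * m[i][j] * y_j."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


if __name__ == "__main__":
    residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    weights = [1.8, -3.5, 2.5]  # hypothetical per-residue property weights
    p = simple_stochastic(distance_matrix(residues))
    print(bilinear_index(p, weights, weights))
```

Because every row of the normalized matrix sums to 1, setting x and y to all-ones vectors gives b = n, which is a quick sanity check on the normalization.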
Affiliation(s)
- Ernesto Contreras-Torres
  - Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
  - Grupo GINUMED, Facultad de Salud, Programa de Medicina, Corporación Universitaria Rafael Núñez, Cartagena, Colombia
  - Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, 46010 València, Spain
- Julio E Terán
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
  - Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
- César R García-Jacas
  - Cátedras Conacyt-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
- Carlos A Brizuela
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
7
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019] [Indexed: 12/23/2022]
Abstract
Alignment-free (AF) methodologies have increased in popularity over the last decades as alternatives to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular AF methodologies, as well as their applications to classification problems, have been described in previous reviews. However, although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we provide our own experience using the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a user-friendly graphical user interface (GUI) for delimiting the twilight zone using several similarity criteria.
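One of the simplest alignment-free approaches represents each sequence by its k-mer composition (here, dipeptides) and compares sequences with a vector similarity, so no alignment is ever computed. The sketch below uses cosine similarity. It illustrates the general AF idea only; it is not MARCH-INSIDE, TI2BioP, ProtDCal, or SeqDivA, and the example sequences are arbitrary.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers (k = 2 gives dipeptide composition)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine of the angle between two k-mer count vectors (0 means no shared k-mers)."""
    # Counter returns 0 for missing keys, so k-mers absent from q contribute nothing.
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def af_similarity(a: str, b: str, k: int = 2) -> float:
    """Alignment-free similarity of two sequences from their k-mer profiles."""
    return cosine_similarity(kmer_profile(a, k), kmer_profile(b, k))


if __name__ == "__main__":
    print(af_similarity("ACACGT", "ACACTT"))  # 5/7, about 0.714
```

Unlike an alignment score, this measure ignores the order of k-mers beyond length k, which is what makes it fast, and also why AF and AB measures are often combined, as the abstract notes.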
Affiliation(s)
- Guillermin Agüero-Chapin
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
  - Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
  - Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
  - Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
  - EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
  - Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
  - Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
8
Terán JE, Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Vivas-Reyes R, Terán E, Torres FJ. Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods. Sci Rep 2019; 9:11391. [PMID: 31388082 PMCID: PMC6684663 DOI: 10.1038/s41598-019-47858-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/22/2019] [Accepted: 07/22/2019] [Indexed: 11/16/2022]
Abstract
In this report, a new type of three-dimensional (3D) biomacro-molecular descriptor for proteins is proposed. These descriptors make use of multi-linear algebra concepts based on the application of 3-linear forms (e.g., Canonical Trilinear (Tr), Trilinear Cubic (TrC), Trilinear-Quadratic-Bilinear (TrQB), and so on) as a specific case of the N-linear algebraic forms. The kth 3-tuple similarity-dissimilarity spatial matrices (tensor form) are used to transform and represent the chemical information available in the relationships among three amino acids of a protein. Several metrics (Minkowski-type, wave-edge, etc.) and multi-metrics (triangle area, bond angle, etc.) are proposed for extracting interaction information, as well as probabilistic transformations (e.g., simple stochastic and mutual probability) to achieve matrix normalization. A generalized procedure is proposed in which amino acid-level indices are fused by aggregation operators to compute the descriptors. The results demonstrated that the proposed 3D biomacro-molecular indices perform better than other approaches in SCOP-based discrimination and in the prediction of protein folding rates using simple linear parametric models. It can be concluded that the proposed method allows the definition of 3D biomacro-molecular descriptors that contain orthogonal information capable of providing better models for applications in protein science.
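As an illustration of a 3-tuple representation, the triangle-area multi-metric assigns to each triple of residues (i, j, k) the area of the triangle formed by their coordinates, giving an n × n × n tensor T. A trilinear index can then be formed as the sum over i, j, k of T_ijk x_i y_j z_k for amino acid weight vectors x, y, z (using the same vector three times gives a cubic form). This sketch assumes one representative point per residue and applies no normalization, cutoff, or aggregation operator; it is a simplified instance of the idea, not the authors' full procedure.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Tensor3 = List[List[List[float]]]


def triangle_area(a: Coord, b: Coord, c: Coord) -> float:
    """Area of triangle abc: half the norm of the cross product (b - a) x (c - a)."""
    u = [b[d] - a[d] for d in range(3)]
    v = [c[d] - a[d] for d in range(3)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def triangle_area_tensor(coords: Sequence[Coord]) -> Tensor3:
    """T[i][j][k] = area of the triangle spanned by residues i, j, k (0 if any repeat)."""
    n = len(coords)
    return [
        [[triangle_area(coords[i], coords[j], coords[k]) for k in range(n)] for j in range(n)]
        for i in range(n)
    ]


def trilinear_index(
    t: Tensor3, x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> float:
    """sum_{i,j,k} T[i][j][k] * x_i * y_j * z_k."""
    n = len(x)
    return sum(
        t[i][j][k] * x[i] * y[j] * z[k]
        for i, j, k in itertools.product(range(n), repeat=3)
    )


if __name__ == "__main__":
    residues = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    tensor = triangle_area_tensor(residues)
    ones = [1.0, 1.0, 1.0]
    print(trilinear_index(tensor, ones, ones, ones))
```

The tensor has n cubed entries, so for real proteins a distance cutoff or restriction to neighboring triples would be needed to keep the computation tractable.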
Affiliation(s)
- Julio E Terán
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Universidad de San Buenaventura, Cartagena, Facultad de Ciencias de la Salud, Grupo de Investigación Microbiología & Ambiente (GIMA), Calle Real de Ternera, Diagonal 32, No. 30-966, Cartagena, Código postal: 1300 10, Colombia
- Ernesto Contreras-Torres
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- César R García-Jacas
  - Cátedras CONACYT, Departamento de Ciencia de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Ricardo Vivas-Reyes
  - Grupo de Química Cuántica y Teórica de la Universidad de Cartagena, Facultad de Ciencias Exactas y Naturales, Programa de Química, Campus de San Pablo, Cartagena, Colombia
  - Grupo GINUMED, Corporación Universitaria Rafael Núñez, Facultad de Salud, Programa de Medicina, Cartagena, Colombia
  - Grupo CipTec, Facultad de Ingenierías, Fundación Universitaria Tecnológico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- Enrique Terán
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- F Javier Torres
  - Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
9
Romero-Molina S, Ruiz-Blanco YB, Harms M, Münch J, Sanchez-Garcia E. PPI-Detect: A support vector machine model for sequence-based prediction of protein-protein interactions. J Comput Chem 2019; 40:1233-1242. [PMID: 30768790 DOI: 10.1002/jcc.25780] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 09/24/2018] [Revised: 11/29/2018] [Accepted: 12/29/2018] [Indexed: 12/18/2022]
Abstract
The prediction of peptide-protein or protein-protein interactions (PPI) is a challenging task, especially if amino acid sequences are the only information available. Machine learning methods allow us to exploit the information content in PPI datasets. However, the numerical codification of these datasets often influences the performance of data mining approaches. Here, we introduce a procedure for the general-purpose numerical codification of polypeptides. This procedure transforms pairs of amino acid sequences into a machine learning-friendly vector whose elements represent numerical descriptors of residues in proteins. We used this numerical encoding procedure to develop a support vector machine model (PPI-Detect) that predicts whether two proteins will interact. PPI-Detect (https://ppi-detect.zmb.uni-due.de/) outperforms state-of-the-art sequence-based predictors of PPI. We employed PPI-Detect to analyze derivatives of EPI-X4, an endogenous peptide inhibitor of the G-protein-coupled receptor CXCR4, and identified with high accuracy those peptides that bind to the receptor better than EPI-X4. Also using PPI-Detect, we designed a novel peptide and then experimentally established its anti-CXCR4 activity. © 2019 Wiley Periodicals, Inc.
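PPI-Detect encodes a pair of sequences as a single fixed-length vector. The abstract does not give the exact encoding, so the sketch below shows one generic way to do it under stated assumptions: each sequence is summarized by the mean, minimum, and maximum of a per-residue property (the Kyte-Doolittle hydropathy scale, used here only as a stand-in for ProtDCal's descriptors), and the two summaries are combined with element-wise sums and products so that the pair vector does not depend on which protein is listed first. The functions and example sequences are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Kyte-Doolittle hydropathy values, a stand-in for richer residue descriptors.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_summary(seq: str) -> List[float]:
    """Aggregate per-residue values into a fixed-length vector: mean, min, max."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [mean(values), min(values), max(values)]


def pair_vector(seq_a: str, seq_b: str) -> List[float]:
    """Order-independent encoding of a sequence pair: element-wise sums, then products."""
    a, b = sequence_summary(seq_a), sequence_summary(seq_b)
    return [x + y for x, y in zip(a, b)] + [x * y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(pair_vector("ACDK", "LVRY"))
```

Symmetric combination matters because an interaction predictor should give the same answer for (A, B) and (B, A); the resulting vectors could be passed to any SVM implementation.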
Affiliation(s)
- Sandra Romero-Molina
  - Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany
- Yasser B Ruiz-Blanco
  - Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany
- Mirja Harms
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany
- Jan Münch
  - Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany
  - Core Facility Functional Peptidomics, Ulm University Medical Center, Ulm, Germany
- Elsa Sanchez-Garcia
  - Center of Medical Biotechnology, University of Duisburg-Essen, Duisburg, Germany