Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Perez-Riverol Y, Audain E, Millan A, Ramos Y, Sanchez A, Vizcaíno JA, Wang R, Müller M, Machado YJ, Betancourt LH, González LJ, Padrón G, Besada V. Isoelectric point optimization using peptide descriptors and support vector machines. J Proteomics 2012;75:2269-74. [PMID: 22326964 DOI: 10.1016/j.jprot.2012.01.029] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 11/22/2011] [Revised: 01/23/2012] [Accepted: 01/25/2012] [Indexed: 11/24/2022]

Cited by Other Article(s)

Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023;28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open

Moloney NM, Barylyuk K, Tromer E, Crook OM, Breckels LM, Lilley KS, Waller RF, MacGregor P. Mapping diversity in African trypanosomes using high resolution spatial proteomics. Nat Commun 2023;14:4401. [PMID: 37479728 PMCID: PMC10361982 DOI: 10.1038/s41467-023-40125-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 07/06/2023] [Indexed: 07/23/2023] Open

Chiriac MC, Haber M, Salcher MM. Adaptive genetic traits in pelagic freshwater microbes. Environ Microbiol 2023;25:606-641. [PMID: 36513610 DOI: 10.1111/1462-2920.16313] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/24/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022]

Kozlowski LP. IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Res 2021;49:W285-W292. [PMID: 33905510 PMCID: PMC8262712 DOI: 10.1093/nar/gkab295] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/15/2021] [Revised: 04/03/2021] [Accepted: 04/12/2021] [Indexed: 01/05/2023] Open

Ricardo F, Pradilla D, Cruz JC, Alvarez O. Emerging Emulsifiers: Conceptual Basis for the Identification and Rational Design of Peptides with Surface Activity. Int J Mol Sci 2021;22:4615. [PMID: 33924804 PMCID: PMC8124350 DOI: 10.3390/ijms22094615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/16/2021] [Revised: 04/24/2021] [Accepted: 04/26/2021] [Indexed: 01/06/2023] Open

Sampaio-Dias IE, Rodríguez-Borges JE, Yáñez-Pérez V, Arrasate S, Llorente J, Brea JM, Bediaga H, Viña D, Loza MI, Caamaño O, García-Mera X, González-Díaz H. Synthesis, Pharmacological, and Biological Evaluation of 2-Furoyl-Based MIF-1 Peptidomimetics and the Development of a General-Purpose Model for Allosteric Modulators (ALLOPTML). ACS Chem Neurosci 2021;12:203-215. [PMID: 33347281 DOI: 10.1021/acschemneuro.0c00687] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022] Open

Abstract

This work describes the synthesis and pharmacological evaluation of 2-furoyl-based Melanostatin (MIF-1) peptidomimetics as dopamine D₂ modulating agents. Eight novel peptidomimetics were tested for their ability to enhance the maximal effect of tritiated N-propylapomorphine ([³H]-NPA) at D₂ receptors (D₂R). In this series, 2-furoyl-l-leucylglycinamide (6a) produced a statistically significant increase in the maximal [³H]-NPA response at 10 pM (11 ± 1%), comparable to the effect of MIF-1 (18 ± 9%) at the same concentration. This result supports previous evidence that the replacement of the proline residue by heteroaromatic scaffolds is tolerated at the allosteric binding site of MIF-1. Biological assays performed for peptidomimetic 6a using cortex neurons from 19-day-old Wistar-Kyoto rat embryos suggest that 6a displays no neurotoxicity up to 100 μM. Overall, the pharmacological and toxicological profile and the structural simplicity of 6a make this peptidomimetic a potential lead compound for further development and optimization, paving the way for the development of novel modulating agents of D₂R suitable for the treatment of CNS-related diseases. Additionally, the pharmacological and biological data herein reported, along with >20 000 outcomes of preclinical assays, were used to seek a general model to predict the allosteric modulatory potential of molecular candidates for a myriad of target receptors, organisms, cell lines, and biological activity parameters based on perturbation theory (PT) ideas and machine learning (ML) techniques, abbreviated as ALLOPTML. By doing so, ALLOPTML shows high specificity Sp = 89.2/89.4%, sensitivity Sn = 71.3/72.2%, and accuracy Ac = 86.1/86.4% in training/validation series, respectively. To the best of our knowledge, ALLOPTML is the first general-purpose chemoinformatic tool using a PTML-based model for the multioutput and multicondition prediction of allosteric compounds, which is expected to save both time and resources during the early drug discovery of allosteric modulators.

Affiliation(s)

Ivo E. Sampaio-Dias: LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
José E. Rodríguez-Borges: LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
Víctor Yáñez-Pérez: Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
Sonia Arrasate: Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
Javier Llorente: Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain; Dept. of Pharmacology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
José M. Brea: Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Harbil Bediaga: Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain; Dept. of Physical Chemistry, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
Dolores Viña: Dept. of Pharmacology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain; Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
María Isabel Loza: Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Olga Caamaño: Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Xerardo García-Mera: Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Humberto González-Díaz: Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain; Basque Center for Biophysics (CSIC UPV/EHU), University of Basque Country (UPV-EHU), 48940 Leioa, Spain; IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain

Gómez SA, Rojas-Valencia N, Gómez S, Egidi F, Cappelli C, Restrepo A. Binding of SARS-CoV-2 to Cell Receptors: A Tale of Molecular Evolution. Chembiochem 2020;22:724-732. [PMID: 32986926 PMCID: PMC7537057 DOI: 10.1002/cbic.202000618] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/01/2020] [Revised: 09/26/2020] [Indexed: 12/31/2022]

Sandoval-Lira J, Mondragón-Solórzano G, Lugo-Fuentes LI, Barroso-Flores J. Accurate Estimation of pK_b Values for Amino Groups from Surface Electrostatic Potential (V_S,min) Calculations: The Isoelectric Points of Amino Acids as a Case Study. J Chem Inf Model 2020;60:1445-1452. [PMID: 32108480 DOI: 10.1021/acs.jcim.9b01173] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]

Ramos Y, González A, Sosa-Acosta P, Perez-Riverol Y, García Y, Castellanos-Serra L, Gil J, Sánchez A, González LJ, Besada V. Sodium dodecyl sulfate free gel electrophoresis/electroelution sorting for peptide fractionation. J Sep Sci 2019;42:3712-3717. [DOI: 10.1002/jssc.201900495] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/16/2019] [Revised: 10/08/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022]

Gomez N, Barkhordarian H, Lull J, Huh J, GhattyVenkataKrishna P, Zhang X. Perfusion CHO cell culture applied to lower aggregation and increase volumetric productivity for a bispecific recombinant protein. J Biotechnol 2019;304:70-77. [DOI: 10.1016/j.jbiotec.2019.08.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/24/2019] [Revised: 06/14/2019] [Accepted: 08/01/2019] [Indexed: 11/29/2022]

De Las Rivas J, Bonavides-Martínez C, Campos-Laborie FJ. Bioinformatics in Latin America and SoIBio impact, a tale of spin-off and expansion around genomes and protein structures. Brief Bioinform 2019;20:390-397. [PMID: 28981567 PMCID: PMC6433739 DOI: 10.1093/bib/bbx064] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/20/2017] [Revised: 04/18/2017] [Indexed: 11/30/2022] Open

Vásquez-Domínguez E, Armijos-Jaramillo VD, Tejera E, González-Díaz H. Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds. Mol Pharm 2019;16:4200-4212. [PMID: 31426639 DOI: 10.1021/acs.molpharmaceut.9b00538] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/22/2022]

Abstract

Retroviral infections, such as HIV, remain diseases with no cure. Defining the target proteins of new antiretroviral compounds is a major goal for medicine and pharmaceutical chemistry. ChEMBL manages Big Data features with a complex data set, which is hard to organize. The large number of characteristics that must be described in order to predict new drug candidates for retroviral infections makes this information difficult to analyze. For this reason, we propose to develop a new predictive model combining perturbation theory (PT) bases and machine learning (ML) modeling to create a new tool that can take advantage of all the available information. The PTML model proposed in this work for the ChEMBL data set of preclinical experimental assays for antiretroviral compounds consists of a linear equation with four variables. The PT operators used are founded on multicondition moving averages, combining different features and making the data easier to manage. More than 140 000 preclinical assays for 56 105 compounds with different characteristics or experimental conditions have been carried out and can be found in the ChEMBL database, covering combinations with 359 biological activity parameters (c₀), 55 protein accessions (c₁), 83 cell lines (c₂), 64 organisms of assay (c₃), and 773 subtypes or strains. We have included 150 148 preclinical experimental assays for HIV virus, 1188 for HTLV virus, 84 for simian immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous sarcoma virus, 1581 for MMTV, etc. We also included 5277 assays for hepatitis B virus. The developed PTML model reached considerable values in sensitivity (73.05% for training and 73.10% for validation), specificity (86.61% for training and 87.17% for validation), and accuracy (75.84% for training and 75.98% for validation). We also compared alternative PTML models with different PT operators such as covariance, moments, and exponential terms. Finally, we compared our PTML model with ML models from the literature and with nonlinear artificial neural network (ANN) models. We conclude that this PTML model is the first one to consider multiple characteristics of preclinical experimental antiretroviral assays combined, generating a simple, useful, and adaptable instrument, which could reduce time and costs in antiretroviral drug research.

Ferreira da Costa J, Silva D, Caamaño O, Brea JM, Loza MI, Munteanu CR, Pazos A, García-Mera X, González-Díaz H. Perturbation Theory/Machine Learning Model of ChEMBL Data for Dopamine Targets: Docking, Synthesis, and Assay of New l-Prolyl-l-leucyl-glycinamide Peptidomimetics. ACS Chem Neurosci 2018;9:2572-2587. [PMID: 29791132 DOI: 10.1021/acschemneuro.8b00083] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/14/2022] Open

Abstract

Predicting drug-protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for large and complex big data sets of preclinical assays reported in public databases. This includes multiple conditions of assays, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known case of reference. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50 000 cases with accuracy of 70-91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model but at the cost of a notable increase in the complexity of the model. We illustrated the practical use of the new model with a proof-of-concept theoretical-experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the software AutoDock Vina. The work ends with a PTML model predictive study of the outcomes of the new compounds in a large number of assays. Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method focuses on the prediction with a simple linear model of multiple pharmacological parameters (IC₅₀, EC₅₀, K_i, etc.) for compounds in assays involving different cell lines used, organisms of the protein target, or organism of assay for proteins in the dopamine pathway.

Bediaga H, Arrasate S, González-Díaz H. PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer. ACS Comb Sci 2018;20:621-632. [PMID: 30240186 DOI: 10.1021/acscombsci.8b00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]

Abstract

Determining the target proteins of new anticancer compounds is a very important task in Medicinal Chemistry. In this sense, chemists carry out preclinical assays with a high number of combinations of experimental conditions (c_j). In fact, the ChEMBL database contains outcomes of 65 534 different anticancer activity preclinical assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of c_j formed from >70 different biological activity parameters (c₀), >300 different drug targets (c₁), >230 cell lines (c₂), and 5 organisms of assay (c₃) or organisms of the target (c₄). It includes a total of 45 833 assays in leukemia, 6227 assays in breast cancer, 2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features. These data are hard for researchers to rationalize in order to extract useful relationships and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. The model presented values of area under the receiver operating characteristic curve (AUROC) = 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7 in the training series. The model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8 in the external validation series. The model uses PT operators based on multicondition moving averages to capture all the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, obtaining similar results. This confirms the hypothesis of a linear relationship between the PT operators and the classification as anticancer compounds in different combinations of assay conditions. Last, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for the prediction of the targets of anticancer compounds taking into consideration multiple combinations of experimental conditions in preclinical assays.
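
The specificity, sensitivity, and accuracy figures quoted in the PTML abstracts in this listing follow the usual confusion-matrix definitions. The sketch below is only an illustration of those definitions: the function names and the counts passed to them are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy as percentages.

    tp/fn: positive (active) cases predicted correctly / incorrectly.
    tn/fp: negative (inactive) cases predicted correctly / incorrectly.
    """
    sn = 100.0 * tp / (tp + fn)                    # sensitivity, Sn(%)
    sp = 100.0 * tn / (tn + fp)                    # specificity, Sp(%)
    ac = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # overall accuracy, Ac(%)
    return {"Sn": sn, "Sp": sp, "Ac": ac}


def implied_positive_fraction(sn, sp, ac):
    """Fraction of positive cases p implied by Ac = p*Sn + (1 - p)*Sp.

    Only meaningful if all three figures were computed on the same cases.
    """
    return (sp - ac) / (sp - sn)


# Hypothetical counts, chosen only to show the arithmetic.
metrics = classification_metrics(tp=70, fn=30, tn=900, fp=100)
print(metrics)

# Applying the identity to the training figures quoted in the abstract above.
print(implied_positive_fraction(sn=70.6, sp=90.2, ac=87.7))
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, an accuracy close to the specificity (as in the figures above) would be consistent with a data set in which negative cases dominate. That reading is an inference from the identity, not a statement made in the abstract.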

Tovar G. Design of a software for calculating isoelectric point of a polypeptide according to their net charge using the graphical programming language LabVIEW. Biochem Mol Biol Educ 2018;46:39-46. [PMID: 29105959 DOI: 10.1002/bmb.21088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/14/2016] [Revised: 06/23/2017] [Accepted: 10/06/2017] [Indexed: 06/07/2023]

Accurate and fast feature selection workflow for high-dimensional omics data. PLoS One 2017;12:e0189875. [PMID: 29261781 PMCID: PMC5738110 DOI: 10.1371/journal.pone.0189875] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 06/27/2017] [Accepted: 12/04/2017] [Indexed: 02/04/2023] Open

Bjerrum EJ, Jensen JH, Tolborg JL. pICalculax: Improved Prediction of Isoelectric Point for Modified Peptides. J Chem Inf Model 2017;57:1723-1727. [PMID: 28671456 DOI: 10.1021/acs.jcim.7b00030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]

Audain E, Uszkoreit J, Sachsenberg T, Pfeuffer J, Liang X, Hermjakob H, Sanchez A, Eisenacher M, Reinert K, Tabb DL, Kohlbacher O, Perez-Riverol Y. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics. J Proteomics 2017;150:170-182. [DOI: 10.1016/j.jprot.2016.08.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/20/2016] [Revised: 07/30/2016] [Accepted: 08/02/2016] [Indexed: 12/24/2022]

Mao J, Moore LR, Blank CE, Wu EHH, Ackerman M, Ranade S, Cui H. Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources. BMC Bioinformatics 2016;17:528. [PMID: 27955641 PMCID: PMC5153691 DOI: 10.1186/s12859-016-1396-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/05/2016] [Accepted: 11/29/2016] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages.

RESULTS

We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students.

CONCLUSION

MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.

Kozlowski LP. IPC - Isoelectric Point Calculator. Biol Direct 2016;11:55. [PMID: 27769290 PMCID: PMC5075173 DOI: 10.1186/s13062-016-0159-9] [Citation(s) in RCA: 239] [Impact Index Per Article: 29.9] [Received: 08/13/2016] [Accepted: 10/10/2016] [Indexed: 01/19/2023] Open

Gatto L, Breckels LM, Naake T, Gibb S. Visualization of proteomics data using R and bioconductor. Proteomics 2016;15:1375-89. [PMID: 25690415 PMCID: PMC4510819 DOI: 10.1002/pmic.201400392] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/25/2014] [Revised: 02/05/2015] [Accepted: 02/09/2015] [Indexed: 12/30/2022]

Audain E, Ramos Y, Hermjakob H, Flower DR, Perez-Riverol Y. Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences. Bioinformatics 2015;32:821-7. [PMID: 26568629 PMCID: PMC5939969 DOI: 10.1093/bioinformatics/btv674] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 06/30/2015] [Accepted: 11/10/2015] [Indexed: 12/02/2022]

Abstract

Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods.

Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed a superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.

Contact: yperez@ebi.ac.uk

Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR.

Supplementary information: Supplementary data are available at Bioinformatics online.
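
The definition of pI given in the abstract above (the pH at which the net charge sums to zero) can be illustrated with a minimal iterative calculation. The sketch below is not the SVM-based method of the indexed article, nor any of the benchmarked tools: it sums Henderson-Hasselbalch charges and bisects on pH. The pKa values and the function names are assumptions chosen for illustration only; published pKa scales differ, which is consistent with the abstract's remark that results are sensitive to the choice of basis set.

```python
# Illustrative pKa values (assumed for this sketch; published scales differ).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups are always present.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    # Add the ionizable side chains; other residues contribute no charge.
    for residue in sequence:
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisect on pH until the net charge crosses zero.

    Works because the net charge decreases monotonically with pH.
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: pI lies at higher pH
        else:
            high = mid   # negatively charged (or zero): pI lies at lower pH
    return (low + high) / 2.0


# Hypothetical example peptides.
for peptide in ("G", "DDDD", "KKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

With only the two terminal groups present (as for "G"), the charge balance is symmetric and the result is the midpoint of the two assumed terminal pKa values. Acidic peptides fall below that midpoint and basic peptides above it.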

Perez-Riverol Y, Xu QW, Wang R, Uszkoreit J, Griss J, Sanchez A, Reisinger F, Csordas A, Ternent T, Del-Toro N, Dianes JA, Eisenacher M, Hermjakob H, Vizcaíno JA. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets. Mol Cell Proteomics 2015;15:305-17. [PMID: 26545397 PMCID: PMC4762524 DOI: 10.1074/mcp.o115.050229] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 04/20/2015] [Indexed: 12/25/2022] Open

Affiliation(s)

Yasset Perez-Riverol: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Qing-Wei Xu: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Rui Wang: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Julian Uszkoreit: Ruhr-Universität Bochum, Medizinisches Proteom-Zenter, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany
Johannes Griss: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
Aniel Sanchez: Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
Florian Reisinger: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Attila Csordas: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tobias Ternent: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Noemi Del-Toro: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Jose A Dianes: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Martin Eisenacher: Ruhr-Universität Bochum, Medizinisches Proteom-Zenter, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany
Henning Hermjakob: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Juan Antonio Vizcaíno: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Skvortsov VS, Alekseytchuk NN, Khudyakov DV, Romero Reyes IV. pIPredict: A computer tool for prediction of isoelectric points of peptides and proteins. Biochemistry (Moscow) Supplement Series B: Biomedical Chemistry 2015. [DOI: 10.1134/s1990750815030099] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]

Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. ms-data-core-api: an open-source, metadata-oriented library for computational proteomics. Bioinformatics 2015;31:2903-5. [PMID: 25910694 PMCID: PMC4547611 DOI: 10.1093/bioinformatics/btv250] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/17/2015] [Accepted: 04/20/2015] [Indexed: 11/20/2022] Open

Wang R, Perez-Riverol Y, Hermjakob H, Vizcaíno JA. Open source libraries and frameworks for biological data visualisation: a guide for developers. Proteomics 2015;15:1356-74. [PMID: 25475079 PMCID: PMC4409855 DOI: 10.1002/pmic.201400377] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/07/2014] [Revised: 10/21/2014] [Accepted: 11/26/2014] [Indexed: 12/21/2022]

Pirmoradian M, Zhang B, Chingin K, Astorga-Wells J, Zubarev RA. Membrane-Assisted Isoelectric Focusing Device As a Micropreparative Fractionator for Two-Dimensional Shotgun Proteomics. Anal Chem 2014;86:5728-32. [DOI: 10.1021/ac404180e] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]

Perez-Riverol Y, Hermjakob H, Kohlbacher O, Martens L, Creasy D, Cox J, Leprevost F, Shan BP, Pérez-Nueno VI, Blazejczyk M, Punta M, Vierlinger K, Valiente PA, Leon K, Chinea G, Guirola O, Bringas R, Cabrera G, Guillen G, Padron G, Gonzalez LJ, Besada V. Computational proteomics pitfalls and challenges: HavanaBioinfo 2012 workshop report. J Proteomics 2013;87:134-8. [PMID: 23376229 DOI: 10.1016/j.jprot.2013.01.019] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/14/2013] [Accepted: 01/22/2013] [Indexed: 10/27/2022]

Silvestre DD, Zoppis I, Brambilla F, Bellettato V, Mauri G, Mauri P. Availability of MudPIT data for classification of biological samples. J Clin Bioinforma 2013;3:1. [PMID: 23317455 PMCID: PMC3563498 DOI: 10.1186/2043-9113-3-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/02/2012] [Accepted: 01/07/2013] [Indexed: 01/18/2023] Open