1
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure and function prediction and in de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in intelligent protein design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this will help researchers better understand the limitations and application scenarios of these methods and provide a valuable reference for choosing appropriate protein characterization techniques for related research in the field.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
2
Moloney NM, Barylyuk K, Tromer E, Crook OM, Breckels LM, Lilley KS, Waller RF, MacGregor P. Mapping diversity in African trypanosomes using high resolution spatial proteomics. Nat Commun 2023; 14:4401. [PMID: 37479728 PMCID: PMC10361982 DOI: 10.1038/s41467-023-40125-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 07/06/2023] [Indexed: 07/23/2023]
Abstract
African trypanosomes are dixenous eukaryotic parasites that impose a significant human and veterinary disease burden on sub-Saharan Africa. Diversity between species and life-cycle stages is concomitant with distinct host and tissue tropisms within this group. Here, the spatial proteomes of two African trypanosome species, Trypanosoma brucei and Trypanosoma congolense, are mapped across two life-stages. The four resulting datasets provide evidence of expression of approximately 5500 proteins per cell-type. Over 2500 proteins per cell-type are classified to specific subcellular compartments, providing four comprehensive spatial proteomes. Comparative analysis reveals key routes of parasitic adaptation to different biological niches and provides insight into the molecular basis for diversity within and between these pathogen species.
Affiliation(s)
- Nicola M Moloney
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- Eelco Tromer
- Cell Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, 9747 AG, Groningen, Netherlands
- Oliver M Crook
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Lisa M Breckels
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- Kathryn S Lilley
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- Ross F Waller
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- Paula MacGregor
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1QW, UK
- School of Biological Sciences, University of Bristol, Bristol, BS8 1TQ, UK
3
Chiriac MC, Haber M, Salcher MM. Adaptive genetic traits in pelagic freshwater microbes. Environ Microbiol 2023; 25:606-641. [PMID: 36513610 DOI: 10.1111/1462-2920.16313] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/24/2022] [Accepted: 12/12/2022] [Indexed: 12/15/2022]
Abstract
Pelagic microbes have adopted distinct strategies to inhabit the pelagial of lakes and oceans and can be broadly categorized into two groups: free-living, specialized oligotrophs and patch-associated generalists or copiotrophs. In this review, we aim to identify genomic traits that enable pelagic freshwater microbes to thrive in their habitat. To do so, we discuss the main genetic differences between pelagic marine and freshwater microbes, which are both dominated by specialized oligotrophs, and how they differ from freshwater sediment microbes, where copiotrophs are more prevalent. We phylogenomically analysed a collection of >7700 metagenome-assembled genomes, classified habitat preferences on different taxonomic levels, and compared the metabolic traits of pelagic freshwater, marine, and freshwater sediment microbes. Metabolic differences are mainly associated with transport functions, environmental information processing, components of the electron transport chain, osmoregulation and the isoelectric point of proteins. Several lineages with known habitat transitions (Nitrososphaeria, SAR11, Methylophilaceae, Synechococcales, Flavobacteriaceae, Planctomycetota) and the underlying mechanisms in this process are discussed in this review. Additionally, the distribution, ecology and genomic make-up of the most abundant freshwater prokaryotes are described in detail in separate chapters for Actinobacteriota, Bacteroidota, Burkholderiales, Verrucomicrobiota, Chloroflexota, and 'Ca. Patescibacteria'.
Affiliation(s)
- Markus Haber
- Institute of Hydrobiology, Biology Centre CAS, Ceske Budejovice, Czechia
- Michaela M Salcher
- Institute of Hydrobiology, Biology Centre CAS, Ceske Budejovice, Czechia
4
Kozlowski LP. IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Res 2021; 49:W285-W292. [PMID: 33905510 PMCID: PMC8262712 DOI: 10.1093/nar/gkab295] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/15/2021] [Revised: 04/03/2021] [Accepted: 04/12/2021] [Indexed: 01/05/2023]
Abstract
The isoelectric point is the pH at which a particular molecule is electrically neutral due to the equilibrium of positive and negative charges. In proteins and peptides, this depends on the dissociation constant (pKa) of charged groups of seven amino acids and NH+ and COO− groups at polypeptide termini. Information regarding isoelectric point and pKa is extensively used in two-dimensional gel electrophoresis (2D-PAGE), capillary isoelectric focusing (cIEF), crystallisation, and mass spectrometry. Therefore, there is a strong need for the in silico prediction of isoelectric point and pKa values. In this paper, I present Isoelectric Point Calculator 2.0 (IPC 2.0), a web server for the prediction of isoelectric points and pKa values using a mixture of deep learning and support vector regression models. The prediction accuracy (RMSD) of IPC 2.0 for proteins and peptides outperforms previous algorithms: 0.848 versus 0.868 and 0.222 versus 0.405, respectively. Moreover, the IPC 2.0 prediction of pKa using sequence information alone was better than the prediction from structure-based methods (0.576 versus 0.826) and a few folds faster. The IPC 2.0 webserver is freely available at www.ipc2-isoelectric-point.org
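The definition in this abstract (the pH at which positive and negative charges balance, governed by the pKa of seven ionizable residues plus the two termini) can be illustrated with a minimal sketch. It uses the Henderson-Hasselbalch relation and a bisection search; the pKa values below are assumed, textbook-style placeholders and are not the parameters or models used by IPC 2.0.

```python
# Minimal sketch of a classical pI estimate. The pKa values are illustrative
# assumptions (textbook-style), NOT the IPC 2.0 parameters or its ML models.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}           # groups that carry +1 when protonated
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # groups that carry -1 when deprotonated
PKA_N_TERM = 9.0   # assumed pKa of the N-terminal amino group
PKA_C_TERM = 2.3   # assumed pKa of the C-terminal carboxyl group


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Fraction protonated (positive) for basic groups: 1 / (1 + 10^(pH - pKa))
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    # Fraction deprotonated (negative) for acidic groups: 1 / (1 + 10^(pKa - pH))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at a higher pH
        else:
            high = mid   # zero or negative: the pI lies at a lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    # Hypothetical example sequences, chosen only for illustration.
    for peptide in ("G", "KKKK", "DDDD"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

With no ionizable side chains, the estimate reduces to the midpoint of the two terminal pKa values, which is a convenient sanity check for the chosen constants.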
Affiliation(s)
- Lukasz Pawel Kozlowski
- Institute of Informatics, Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Mazovian Voivodeship 02-097, Poland
5
Ricardo F, Pradilla D, Cruz JC, Alvarez O. Emerging Emulsifiers: Conceptual Basis for the Identification and Rational Design of Peptides with Surface Activity. Int J Mol Sci 2021; 22:4615. [PMID: 33924804 PMCID: PMC8124350 DOI: 10.3390/ijms22094615] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/16/2021] [Revised: 04/24/2021] [Accepted: 04/26/2021] [Indexed: 01/06/2023]
Abstract
Emulsifiers are gradually evolving from synthetic molecules of petrochemical origin to biomolecules mainly due to health and environmental concerns. Peptides represent a type of biomolecules whose molecular structure is composed of a sequence of amino acids that can be easily tailored to have specific properties. However, the lack of knowledge about emulsifier behavior, structure-performance relationships, and the implementation of different design routes have limited the application of these peptides. Some computational and experimental approaches have tried to close this knowledge gap, but restrictions in understanding the fundamental phenomena and the limited property data availability have made the performance prediction for emulsifier peptides an area of intensive research. This study provides the concepts necessary to understand the emulsifying behavior of peptides. Additionally, a straightforward description is given of how the molecular structure and conditions of the system directly impact the peptides' ability to stabilize emulsion droplets. Moreover, the routes to design and discover novel peptides with interfacial and emulsifying activity are also discussed, along with the strategies to address some of their major pitfalls and challenges. Finally, this contribution reviews methodologies to build and use data sets containing standard properties of emulsifying peptides by looking at successful applications in different fields.
Affiliation(s)
- Fabian Ricardo
- Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia; (F.R.); (D.P.)
- Diego Pradilla
- Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia; (F.R.); (D.P.)
- Juan C. Cruz
- Department of Biomedical Engineering, Universidad de los Andes, Bogotá 111711, Colombia
- Oscar Alvarez
- Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia; (F.R.); (D.P.)
6
Sampaio-Dias IE, Rodríguez-Borges JE, Yáñez-Pérez V, Arrasate S, Llorente J, Brea JM, Bediaga H, Viña D, Loza MI, Caamaño O, García-Mera X, González-Díaz H. Synthesis, Pharmacological, and Biological Evaluation of 2-Furoyl-Based MIF-1 Peptidomimetics and the Development of a General-Purpose Model for Allosteric Modulators (ALLOPTML). ACS Chem Neurosci 2021; 12:203-215. [PMID: 33347281 DOI: 10.1021/acschemneuro.0c00687] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022]
Abstract
This work describes the synthesis and pharmacological evaluation of 2-furoyl-based Melanostatin (MIF-1) peptidomimetics as dopamine D2 modulating agents. Eight novel peptidomimetics were tested for their ability to enhance the maximal effect of tritiated N-propylapomorphine ([3H]-NPA) at D2 receptors (D2R). In this series, 2-furoyl-l-leucylglycinamide (6a) produced a statistically significant increase in the maximal [3H]-NPA response at 10 pM (11 ± 1%), comparable to the effect of MIF-1 (18 ± 9%) at the same concentration. This result supports previous evidence that the replacement of the proline residue by heteroaromatic scaffolds is tolerated at the allosteric binding site of MIF-1. Biological assays performed for peptidomimetic 6a using cortex neurons from 19-day-old Wistar-Kyoto rat embryos suggest that 6a displays no neurotoxicity up to 100 μM. Overall, the pharmacological and toxicological profile and the structural simplicity of 6a make this peptidomimetic a potential lead compound for further development and optimization, paving the way for the development of novel modulating agents of D2R suitable for the treatment of CNS-related diseases. Additionally, the pharmacological and biological data herein reported, along with >20 000 outcomes of preclinical assays, were used to seek a general model to predict the allosteric modulatory potential of molecular candidates for a myriad of target receptors, organisms, cell lines, and biological activity parameters based on perturbation theory (PT) ideas and machine learning (ML) techniques, abbreviated as ALLOPTML. By doing so, ALLOPTML shows high specificity Sp = 89.2/89.4%, sensitivity Sn = 71.3/72.2%, and accuracy Ac = 86.1/86.4% in training/validation series, respectively. To the best of our knowledge, ALLOPTML is the first general-purpose chemoinformatic tool using a PTML-based model for the multioutput and multicondition prediction of allosteric compounds, which is expected to save both time and resources during the early drug discovery of allosteric modulators.
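The specificity (Sp), sensitivity (Sn), and accuracy (Ac) figures quoted in this abstract are standard confusion-matrix ratios. The sketch below shows how such values are typically computed; the function name and the counts in the usage example are hypothetical and are not taken from the ALLOPTML data.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "Sn": tp / positives,                        # sensitivity (true positive rate)
        "Sp": tn / negatives,                        # specificity (true negative rate)
        "Ac": (tp + tn) / (positives + negatives),   # overall accuracy
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    metrics = classification_metrics(tp=70, tn=180, fp=20, fn=30)
    for name, value in metrics.items():
        print(f"{name} = {100 * value:.1f}%")
```

Reporting the three values separately matters when classes are imbalanced, because accuracy alone can be dominated by the majority class.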
Affiliation(s)
- Ivo E. Sampaio-Dias
- LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- José E. Rodríguez-Borges
- LAQV/REQUIMTE, Dept. of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Víctor Yáñez-Pérez
- Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Sonia Arrasate
- Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Javier Llorente
- Dept. of Pharmacology, Faculty of Medicine and Nursing, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Dept. of Pharmacology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- José M. Brea
- Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Harbil Bediaga
- Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Dept. of Physical Chemistry, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Dolores Viña
- Dept. of Pharmacology, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- María Isabel Loza
- Innopharma Screening Platform, Biofarma Research group, Centre of Research in Molecular Medicine and Chronic Diseases CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Olga Caamaño
- Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Xerardo García-Mera
- Dept. of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Humberto González-Díaz
- Dept. of Organic Chemistry II, University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- Basque Center for Biophysics (CSIC UPV/EHU), University of Basque Country (UPV-EHU), 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
7
Gómez SA, Rojas‐Valencia N, Gómez S, Egidi F, Cappelli C, Restrepo A. Binding of SARS-CoV-2 to Cell Receptors: A Tale of Molecular Evolution. Chembiochem 2020; 22:724-732. [PMID: 32986926 PMCID: PMC7537057 DOI: 10.1002/cbic.202000618] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/01/2020] [Revised: 09/26/2020] [Indexed: 12/31/2022]
Abstract
The magnified infectious power of the SARS‐CoV‐2 virus compared to its precursor SARS‐CoV is intimately linked to an enhanced ability in the mutated virus to find available hydrogen‐bond sites in the host cells. This characteristic is acquired during virus evolution because of the selective pressure exerted at the molecular level. We pinpoint the specific residue (in the virus) to residue (in the cell) contacts during the initial recognition and binding and show that the virus⋅⋅⋅cell interaction is mainly due to an extensive network of hydrogen bonds and to a large surface of noncovalent interactions. In addition to the formal quantum characterization of bonding interactions, computation of absorption spectra for the specific virus⋅⋅⋅cell interacting residues yields significant shifts of Δλmax=47 and 66 nm in the wavelength for maximum absorption in the complex with respect to the isolated host and virus, respectively.
Affiliation(s)
- Santiago A. Gómez
- Instituto de Química, Universidad de Antioquia UdeA, Calle 70 No. 52-21, Medellín, Colombia
- Natalia Rojas‐Valencia
- Instituto de Química, Universidad de Antioquia UdeA, Calle 70 No. 52-21, Medellín, Colombia
- Escuela de Ciencias y Humanidades, Departamento de Ciencias Básicas, Universidad Eafit, AA 3300, Medellín, Colombia
- Sara Gómez
- Scuola Normale Superiore, Classe di Scienze, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Franco Egidi
- Scuola Normale Superiore, Classe di Scienze, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Chiara Cappelli
- Scuola Normale Superiore, Classe di Scienze, Piazza dei Cavalieri 7, 56126 Pisa, Italy
- Albeiro Restrepo
- Instituto de Química, Universidad de Antioquia UdeA, Calle 70 No. 52-21, Medellín, Colombia
8
Sandoval-Lira J, Mondragón-Solórzano G, Lugo-Fuentes LI, Barroso-Flores J. Accurate Estimation of pKb Values for Amino Groups from Surface Electrostatic Potential (VS,min) Calculations: The Isoelectric Points of Amino Acids as a Case Study. J Chem Inf Model 2020; 60:1445-1452. [PMID: 32108480 DOI: 10.1021/acs.jcim.9b01173] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]
Abstract
Theoretical calculation of equilibrium dissociation constants is a very computationally demanding and time-consuming process since it requires an extremely accurate computation of the solvation free energy changes for each of the species involved. By correlating the minimum surface electrostatic potential (VS,min) on the nitrogen atom of several aliphatic amino groups, calculated at the density functional theory (DFT) ωB97X-D/cc-pVDZ level of theory, we obtained regression models for each kind of substitution pattern from which we interpolate their corresponding pKb values with remarkable accuracy: primary R² = 0.9519; secondary R² = 0.9112; and tertiary R² = 0.8172 (N = 20 for each family). These models were validated with test sets (N = 5) with mean absolute error (MAE) values of 0.1213 (primary), 0.4407 (secondary), and 0.3057 (tertiary). Combining this ansatz with another previously reported by our group to estimate pKa values [Caballero-García, G.; et al. Molecules 2019, 24(1), 79], we are able to reproduce the isoelectric points of 13 amino acids with no titratable side chains with MAE = 0.4636 pI units.
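The workflow summarized in this abstract (fit a regression on a training family, report R², then report MAE on a held-out test set) can be sketched as follows. This is a generic least-squares illustration under the assumption of a simple linear model; the function names and all numbers in the usage example are hypothetical and do not reproduce the authors' data or their exact regression form.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("all y values are identical")
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(xs, ys, slope, intercept):
    """Mean absolute deviation between predictions and reference values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Hypothetical descriptor/property pairs, for illustration only.
    train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
    test_x, test_y = [4.0, 5.0], [9.3, 10.8]
    slope, intercept = fit_line(train_x, train_y)
    print("R2 (training):", r_squared(train_x, train_y, slope, intercept))
    print("MAE (test):", mean_absolute_error(test_x, test_y, slope, intercept))
```

Evaluating MAE on points that were not used in the fit, as the abstract does with its N = 5 test sets, gives a less optimistic error estimate than the training R² alone.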
Affiliation(s)
- Jacinto Sandoval-Lira
- Centro Conjunto de Investigación en Química Sustentable UAEM-UNAM, Carretera Toluca-Atlacomulco Km. 14.5, Unidad San Cayetano, Toluca de Lerdo 50200, México; Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, CDMX 04510, México
- Gustavo Mondragón-Solórzano
- Centro Conjunto de Investigación en Química Sustentable UAEM-UNAM, Carretera Toluca-Atlacomulco Km. 14.5, Unidad San Cayetano, Toluca de Lerdo 50200, México; Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, CDMX 04510, México
- Leonardo I Lugo-Fuentes
- Departamento de Química, División de Ciencias Naturales y Exactas, Universidad de Guanajuato, Noria Alta S/N, Guanajuato 36050, México
- Joaquín Barroso-Flores
- Centro Conjunto de Investigación en Química Sustentable UAEM-UNAM, Carretera Toluca-Atlacomulco Km. 14.5, Unidad San Cayetano, Toluca de Lerdo 50200, México; Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, CDMX 04510, México
9
Ramos Y, González A, Sosa‐Acosta P, Perez‐Riverol Y, García Y, Castellanos‐Serra L, Gil J, Sánchez A, González LJ, Besada V. Sodium dodecyl sulfate free gel electrophoresis/electroelution sorting for peptide fractionation. J Sep Sci 2019; 42:3712-3717. [DOI: 10.1002/jssc.201900495] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/16/2019] [Revised: 10/08/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022]
Affiliation(s)
- Yassel Ramos
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
- Annia González
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
- Patricia Sosa‐Acosta
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Yairet García
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
- Jeovanis Gil
- Clinical Protein Science & Imaging, Biomedical Centre, Department of Biomedical Engineering, Lund University, Lund, Sweden
- Aniel Sánchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, Malmö, Sweden
- Luis J. González
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
- Vladimir Besada
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, La Habana, Cuba
10
Gomez N, Barkhordarian H, Lull J, Huh J, GhattyVenkataKrishna P, Zhang X. Perfusion CHO cell culture applied to lower aggregation and increase volumetric productivity for a bispecific recombinant protein. J Biotechnol 2019; 304:70-77. [DOI: 10.1016/j.jbiotec.2019.08.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/24/2019] [Revised: 06/14/2019] [Accepted: 08/01/2019] [Indexed: 11/29/2022]
11
De Las Rivas J, Bonavides-Martínez C, Campos-Laborie FJ. Bioinformatics in Latin America and SoIBio impact, a tale of spin-off and expansion around genomes and protein structures. Brief Bioinform 2019; 20:390-397. [PMID: 28981567 PMCID: PMC6433739 DOI: 10.1093/bib/bbx064] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/20/2017] [Revised: 04/18/2017] [Indexed: 11/30/2022]
Abstract
Owing to the emerging impact of bioinformatics and computational biology, in this article, we present an overview of the history and current state of the research in this field in Latin America (LA). It is difficult to cover evenly all the efforts, initiatives and works that have happened over the past two decades in this vast region (which includes >19 million km² and >600 million people). Despite the difficulty, we have done an analytical search looking for publications in the field made by researchers from 19 LA countries in the past 25 years. In this way, we find that research in bioinformatics in this region would need to double to approach the average world scientific production in the field. We also found some of the pioneering scientists who initiated and led bioinformatics in the region and were promoters of this new scientific field. Our analysis also reveals that spin-off began around some specific areas within the biomolecular sciences: studies on genomes (anchored in the new generation of deep sequencing technologies, followed by developments in proteomics) and studies on protein structures (supported by three-dimensional structural determination technologies and their computational advancement). Finally, we show that the contribution to this endeavour of the Iberoamerican Society for Bioinformatics, founded in Mexico in 2009, has been significant, as it is a leading forum to join efforts of many scientists from LA interested in promoting research, training and education in bioinformatics.
Affiliation(s)
- Javier De Las Rivas
- CSIC and Universidad de Salamanca, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Salamanca, Spain
- Corresponding author. Javier De Las Rivas, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas (CSIC) and Universidad de Salamanca (USAL), Campus Miguel de Unamuno s/n, Salamanca 37007, Spain. Tel.: +34 923294819; Fax: +34 923294743
- Cesar Bonavides-Martínez
- Universidad Nacional Autonoma de Mexico, Computational Genomics, Centro de Ciencias Genómicas, Cuernavaca, Morelos, Mexico
- Francisco Jose Campos-Laborie
- CSIC and Universidad de Salamanca, Bioinformatics and Functional Genomics Group, Cancer Research Center (IMBCC, CSIC/USAL/IBSAL), Salamanca, Spain
12
Vásquez-Domínguez E, Armijos-Jaramillo VD, Tejera E, González-Díaz H. Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds. Mol Pharm 2019; 16:4200-4212. [PMID: 31426639 DOI: 10.1021/acs.molpharmaceut.9b00538] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/22/2022]
Abstract
Retroviral infections, such as HIV, are, until now, diseases with no cure. Medicine and pharmaceutical chemistry need and consider it a huge goal to define target proteins of new antiretroviral compounds. ChEMBL manages Big Data features with a complex data set, which is hard to organize. This makes information difficult to analyze due to the large number of characteristics described in order to predict new drug candidates for retroviral infections. For this reason, we propose to develop a new predictive model combining perturbation theory (PT) bases and machine learning (ML) modeling to create a new tool that can take advantage of all the available information. The PTML model proposed in this work for the ChEMBL data set of preclinical experimental assays for antiretroviral compounds consists of a linear equation with four variables. The PT operators used are founded on multicondition moving averages, combining different features and simplifying the difficulty of managing all data. More than 140 000 preclinical assays for 56 105 compounds with different characteristics or experimental conditions have been carried out and can be found in the ChEMBL database, covering combinations with 359 biological activity parameters (c0), 55 protein accessions (c1), 83 cell lines (c2), 64 organisms of assay (c3), and 773 subtypes or strains. We have included 150 148 preclinical experimental assays for HIV virus, 1188 for HTLV virus, 84 for simian immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous sarcoma virus, 1581 for MMTV, etc. We also included 5277 assays for hepatitis B virus. The developed PTML model reached considerable values in sensitivity (73.05% for training and 73.10% for validation), specificity (86.61% for training and 87.17% for validation), and accuracy (75.84% for training and 75.98% for validation). We also compared alternative PTML models with different PT operators such as covariance, moments, and exponential terms. Finally, we compared ML models from the literature with our PTML model and also with artificial neural network (ANN) nonlinear models. We conclude that this PTML model is the first one to consider multiple characteristics of preclinical experimental antiretroviral assays combined, generating a simple, useful, and adaptable instrument, which could reduce time and costs in antiretroviral drug research.
Affiliation(s)
- Emilia Vásquez-Domínguez
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain; Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Vinicio Danilo Armijos-Jaramillo
- Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador; Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Eduardo Tejera
- Faculty of Engineering and Applied Sciences-Biotechnology, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador; Bio-chemioinformatics group, Universidad de Las Américas (UDLA), 170125 Quito, Ecuador
- Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain; IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
13
Ferreira da Costa J, Silva D, Caamaño O, Brea JM, Loza MI, Munteanu CR, Pazos A, García-Mera X, González-Díaz H. Perturbation Theory/Machine Learning Model of ChEMBL Data for Dopamine Targets: Docking, Synthesis, and Assay of New l-Prolyl-l-leucyl-glycinamide Peptidomimetics. ACS Chem Neurosci 2018; 9:2572-2587. [PMID: 29791132 DOI: 10.1021/acschemneuro.8b00083] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/14/2022]
Abstract
Predicting drug-protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for large and complex big data sets of preclinical assays reported in public databases. This includes multiple conditions of assays, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known case of reference. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50000 cases with accuracy of 70-91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model but at the cost of a notable increase in the complexity of the model. We illustrated the practical use of the new model with a proof-of-concept theoretical-experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the software AutoDock Vina. The work ends with a PTML model predictive study of the outcomes of the new compounds in a large number of assays. Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method focuses on the prediction with a simple linear model of multiple pharmacological parameters (IC50, EC50, Ki, etc.) for compounds in assays involving different cell lines used, organisms of the protein target, or organism of assay for proteins in the dopamine pathway.
Affiliation(s)
- Joana Ferreira da Costa
- Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - David Silva
- Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - Olga Caamaño
- Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - José M. Brea
- CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Department of Pharmacology, Pharmacy and Pharmaceutical Technology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - Maria Isabel Loza
- CIMUS, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Department of Pharmacology, Pharmacy and Pharmaceutical Technology, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - Cristian R. Munteanu
- Instituto de Investigacion Biomedica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), A Coruña, 15006, Spain
| | - Alejandro Pazos
- Instituto de Investigacion Biomedica de A Coruña (INIBIC), Complexo Hospitalario Universitario de A Coruña (CHUAC), A Coruña, 15006, Spain
- Computer Science Department, Faculty of Computer Science, University of A Coruna, 15071 A Coruña, Spain
| | - Xerardo García-Mera
- Department of Organic Chemistry, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
| | - Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940 Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
|
14
|
Bediaga H, Arrasate S, González-Díaz H. PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer. ACS Comb Sci 2018; 20:621-632. [PMID: 30240186 DOI: 10.1021/acscombsci.8b00090] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
Determining the target proteins of new anticancer compounds is a very important task in medicinal chemistry. To this end, chemists carry out preclinical assays with a high number of combinations of experimental conditions (c_j). In fact, the ChEMBL database contains outcomes of 65 534 different anticancer activity preclinical assays for 35 565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of c_j formed from >70 different biological activity parameters (c0), >300 different drug targets (c1), >230 cell lines (c2), and 5 organisms of assay (c3) or organisms of the target (c4). It includes a total of 45 833 assays in leukemia, 6227 assays in breast cancer, 2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very complex data set with multiple Big Data features, and it is hard for researchers to rationalize the data to extract useful relationships and predict new compounds. In this context, we propose to combine perturbation theory (PT) ideas and machine learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for a ChEMBL data set of preclinical assays of anticancer compounds. This is a simple linear model with only three variables. The model presented values of area under the receiver operating characteristic curve (AUROC) = 0.872, specificity Sp(%) = 90.2, sensitivity Sn(%) = 70.6, and overall accuracy Ac(%) = 87.7 in the training series. The model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8 in the external validation series. The model uses PT operators based on multicondition moving averages to capture all the complexity of the data set. We also compared the model with nonlinear artificial neural network (ANN) models, obtaining similar results.
This confirms the hypothesis of a linear relationship between the PT operators and the classification as anticancer compounds in different combinations of assay conditions. Last, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for the prediction of the targets of anticancer compounds, taking into consideration multiple combinations of experimental conditions in preclinical assays.
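The specificity, sensitivity, and accuracy figures quoted in this abstract are standard confusion-matrix ratios. As a minimal sketch of how such percentages are obtained, the following Python function computes them from raw counts. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, Ac) as percentages from confusion-matrix counts.

    tp/fn: active cases predicted as active / inactive
    tn/fp: inactive cases predicted as inactive / active
    """
    sn = 100.0 * tp / (tp + fn)                   # sensitivity: recovered positives
    sp = 100.0 * tn / (tn + fp)                   # specificity: recovered negatives
    ac = 100.0 * (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    return sn, sp, ac


# Hypothetical counts, chosen only to illustrate the arithmetic.
sn, sp, ac = classification_metrics(tp=70, fn=30, tn=180, fp=20)
print(f"Sn = {sn:.1f}%, Sp = {sp:.1f}%, Ac = {ac:.1f}%")
```

When negatives outnumber positives, as in this made-up example, the overall accuracy sits closer to the specificity than to the sensitivity, which is the same pattern seen in the reported training and validation values.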
Affiliation(s)
- Harbil Bediaga
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
| | - Sonia Arrasate
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
| | - Humbert González-Díaz
- Department of Organic Chemistry II, University of Basque Country UPV/EHU, 48940, Leioa, Spain
- IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Spain
|
15
|
Tovar G. Design of a software for calculating isoelectric point of a polypeptide according to their net charge using the graphical programming language LabVIEW. Biochem Mol Biol Educ 2018; 46:39-46. [PMID: 29105959 DOI: 10.1002/bmb.21088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/14/2016] [Revised: 06/23/2017] [Accepted: 10/06/2017] [Indexed: 06/07/2023]
Abstract
This work presents software, developed in the graphical programming language LabVIEW, that calculates the net charge and predicts the isoelectric point (pI) of a polypeptide. The program calculates the net charges of the ionizable residues of the polypeptide chains at different pH values, tabulates them, predicts the pI, and generates an Excel (.xls) file. The experimental pI values of different proteins are compared with the pI values calculated graphically, giving a correlation coefficient (R) of 0.934746, which represents good reliability at p < 0.01. The program can therefore serve as a laboratory tool that simplifies the calculation for graduate students and junior researchers. © 2017 by The International Union of Biochemistry and Molecular Biology, 46(1):39-46, 2018.
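The net-charge approach described in this abstract can be sketched in a few lines of Python. This is not the LabVIEW program from the paper: the pKa values below are illustrative assumptions, the function names are hypothetical, and the pI is located by bisection rather than graphically. The charge of each ionizable group follows the Henderson-Hasselbalch relation.

```python
# Illustrative pKa values (an assumption for this sketch, not from the paper).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # basic side chains
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # acidic side chains
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide (one-letter codes) at a given pH."""
    # Basic groups carry +1 when protonated; acidic groups carry -1 when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH where the net charge crosses zero, by bisection.

    The net charge decreases monotonically with pH, so the sign of the
    charge at the midpoint tells us which half contains the pI.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Tabulate the charge at a few pH values, then estimate the pI.
peptide = "ACDEFGHIK"  # arbitrary example sequence
for ph in (2.0, 7.0, 12.0):
    print(ph, round(net_charge(peptide, ph), 3))
print("estimated pI:", round(isoelectric_point(peptide), 2))
```

The estimate depends entirely on the chosen pKa set, which is why several entries in this list benchmark or optimize those values.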
Affiliation(s)
- Glomen Tovar
- Instituto de Investigaciones Biomédicas (BIOMED), Universidad de Carabobo, Núcleo Aragua, Las Delicias Maracay, Venezuela 2101
|
16
|
Accurate and fast feature selection workflow for high-dimensional omics data. PLoS One 2017; 12:e0189875. [PMID: 29261781 PMCID: PMC5738110 DOI: 10.1371/journal.pone.0189875] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 06/27/2017] [Accepted: 12/04/2017] [Indexed: 02/04/2023]
Abstract
We are moving into the age of ‘Big Data’ in biomedical research and bioinformatics. This trend could be encapsulated in this simple formula: D = S * F, where the volume of data generated (D) increases in both dimensions: the number of samples (S) and the number of sample features (F). Frequently, a typical omics classification includes redundant and irrelevant features (e.g. genes or proteins) that can result in long computation times, a decrease in model performance, and the selection of suboptimal features (genes and proteins) after the classification/regression step. Multiple algorithms and reviews have been published to describe the existing methods for feature selection (FS), their strengths and weaknesses. However, the selection of the correct FS algorithm and strategy constitutes an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for facing a specific problem often falls in a ‘grey zone’. In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithm solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.
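To make the idea of discarding irrelevant features concrete, here is a minimal sketch of one simple filter-type step: dropping features whose variance across samples is close to zero. It is not the workflow or the R package described in the abstract. The function name, the threshold, and the toy matrix are assumptions chosen for illustration.

```python
from statistics import pvariance


def low_variance_filter(rows, threshold):
    """Return indices of features whose population variance exceeds threshold.

    rows: list of S samples, each a list of F numeric feature values,
    so the data volume is D = S * F values.
    """
    n_features = len(rows[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in rows]  # values of feature j across samples
        if pvariance(column) > threshold:
            keep.append(j)
    return keep


# Toy matrix with S = 3 samples and F = 3 features; feature 0 is constant.
samples = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.9],
    [1.0, 6.0, 0.4],
]
kept = low_variance_filter(samples, threshold=0.01)
print("kept feature indices:", kept)
```

A variance filter ignores the class labels, so in practice it would only be a first pass before a supervised selection method.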
|
17
|
Bjerrum EJ, Jensen JH, Tolborg JL. pICalculax: Improved Prediction of Isoelectric Point for Modified Peptides. J Chem Inf Model 2017; 57:1723-1727. [PMID: 28671456 DOI: 10.1021/acs.jcim.7b00030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
The isoelectric point of a peptide is a physicochemical property that can be accurately predicted from the sequence of the peptide when the peptide is built from natural amino acids. Peptides can however have chemical modifications, such as phosphorylations, amidations, and unnatural amino acids, which can result in erroneous predictions if not accounted for. Here we report on an open source program, pICalculax, which in an extensible way can handle pI calculations of modified peptides. Tests on a database of modified peptides and experimentally determined pI values show an improvement in pI predictions when taking the modifications into account. The correlation coefficient improves from 0.45 to 0.91, and the root-mean-square deviation likewise improves from 3.3 to 0.9. The program is available at https://github.com/EBjerrum/pICalculax.
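The two agreement measures quoted in this abstract, the correlation coefficient and the root-mean-square deviation, can be computed as in the sketch below. The function names and the pI values are hypothetical and are not data from the paper or code from pICalculax.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmsd(x, y):
    """Root-mean-square deviation between predicted and observed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical experimental and predicted pI values, for illustration only.
experimental = [4.2, 5.8, 6.9, 9.1]
predicted = [4.5, 5.5, 7.3, 8.8]
print("r    =", round(pearson_r(experimental, predicted), 3))
print("RMSD =", round(rmsd(experimental, predicted), 3))
```

The two measures answer different questions: the correlation reports whether predictions rank peptides correctly, while the RMSD reports how far off the predictions are in pH units.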
Affiliation(s)
- Esben J Bjerrum
- Wildcard Pharmaceutical Consulting, Frødings Alle 41, 2860 Søborg, Denmark
- Biochemfusion Aps, Løvspringsvej 4C, 1.tv, 2920 Charlottenlund, Denmark
| | - Jan H Jensen
- Biochemfusion Aps, Løvspringsvej 4C, 1.tv, 2920 Charlottenlund, Denmark
|
18
|
Audain E, Uszkoreit J, Sachsenberg T, Pfeuffer J, Liang X, Hermjakob H, Sanchez A, Eisenacher M, Reinert K, Tabb DL, Kohlbacher O, Perez-Riverol Y. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics. J Proteomics 2017; 150:170-182. [DOI: 10.1016/j.jprot.2016.08.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 05/20/2016] [Revised: 07/30/2016] [Accepted: 08/02/2016] [Indexed: 12/24/2022]
|
19
|
Mao J, Moore LR, Blank CE, Wu EHH, Ackerman M, Ranade S, Cui H. Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources. BMC Bioinformatics 2016; 17:528. [PMID: 27955641 PMCID: PMC5153691 DOI: 10.1186/s12859-016-1396-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/05/2016] [Accepted: 11/29/2016] [Indexed: 12/24/2022]
Abstract
BACKGROUND The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. RESULTS We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix-with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores ranging between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. 
CONCLUSION MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including incorporation of ontologies, will be necessary to improve the performance of the extraction for some character types.
Affiliation(s)
- Jin Mao
- School of Information, University of Arizona, Tucson, 85721 AZ USA
| | - Lisa R. Moore
- Department of Biological Sciences, University of Southern Maine, Portland, 04103 ME USA
| | - Carrine E. Blank
- Department of Geosciences, University of Montana, Missoula, 59812 MT USA
| | | | - Marcia Ackerman
- Department of Biological Sciences, University of Southern Maine, Portland, 04103 ME USA
| | - Sonali Ranade
- School of Information, University of Arizona, Tucson, 85721 AZ USA
| | - Hong Cui
- School of Information, University of Arizona, Tucson, 85721 AZ USA
|
20
|
Abstract
BACKGROUND Accurate estimation of the isoelectric point (pI) based on the amino acid sequence is useful for many analytical biochemistry and proteomics techniques such as 2-D polyacrylamide gel electrophoresis, or capillary isoelectric focusing used in combination with high-throughput mass spectrometry. Additionally, pI estimation can be helpful during protein crystallization trials. RESULTS Here, I present the Isoelectric Point Calculator (IPC), a web service and a standalone program for the accurate estimation of protein and peptide pI using different sets of dissociation constant (pKa) values, including two new computationally optimized pKa sets. According to the presented benchmarks, the newly developed IPC pKa sets outperform previous algorithms by at least 14.9% for proteins and 0.9% for peptides (on average, 22.1% and 59.6%, respectively), which corresponds to an average error of the pI estimation equal to 0.87 and 0.25 pH units for proteins and peptides, respectively. Moreover, the prediction of pI using the IPC pKa's leads to fewer outliers, i.e., predictions affected by errors greater than a given threshold. CONCLUSIONS The IPC service is freely available at http://isoelectric.ovh.org. Peptide and protein datasets used in the study and the precalculated pI for the PDB and some of the most frequently used proteomes are available for large-scale analysis and future development. REVIEWERS This article was reviewed by Frank Eisenhaber and Zoltán Gáspári.
|
21
|
Gatto L, Breckels LM, Naake T, Gibb S. Visualization of proteomics data using R and bioconductor. Proteomics 2015; 15:1375-89. [PMID: 25690415 PMCID: PMC4510819 DOI: 10.1002/pmic.201400392] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/25/2014] [Revised: 02/05/2015] [Accepted: 02/09/2015] [Indexed: 12/30/2022]
Abstract
Data visualization plays a key role in high-throughput biology. It is an essential tool for data exploration that sheds light on data structure and patterns of interest. Visualization is also of paramount importance as a form of communicating data to a broad audience. Here, we provide a short overview of the application of the R software to the visualization of proteomics data. We present a summary of R's plotting systems and how they are used to visualize and understand raw and processed MS-based proteomics data.
Affiliation(s)
- Laurent Gatto
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Department of Biochemistry, Computational Proteomics Unit, University of Cambridge, Cambridge, UK
|
22
|
Audain E, Ramos Y, Hermjakob H, Flower DR, Perez-Riverol Y. Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences. Bioinformatics 2015; 32:821-7. [PMID: 26568629 PMCID: PMC5939969 DOI: 10.1093/bioinformatics/btv674] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 06/30/2015] [Accepted: 11/10/2015] [Indexed: 12/02/2022]
Abstract
Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Different modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel, and proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Therefore, accurate theoretical prediction of pI would expedite such analysis. While such pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed a superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
Contact: yperez@ebi.ac.uk Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Enrique Audain
- Department of Proteomics, Center of Molecular Immunology
| | - Yassel Ramos
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
| | - Henning Hermjakob
- Department European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Darren R Flower
- School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET, UK
| | - Yasset Perez-Riverol
- Department European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
23
|
Perez-Riverol Y, Xu QW, Wang R, Uszkoreit J, Griss J, Sanchez A, Reisinger F, Csordas A, Ternent T, Del-Toro N, Dianes JA, Eisenacher M, Hermjakob H, Vizcaíno JA. PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets. Mol Cell Proteomics 2015; 15:305-17. [PMID: 26545397 PMCID: PMC4762524 DOI: 10.1074/mcp.o115.050229] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 04/20/2015] [Indexed: 12/25/2022]
Abstract
The original PRIDE Inspector tool was developed as an open source standalone tool to enable the visualization and validation of mass-spectrometry (MS)-based proteomics data before data submission or already publicly available in the Proteomics Identifications (PRIDE) database. The initial implementation of the tool focused on visualizing PRIDE data by supporting the PRIDE XML format and a direct access to private (password protected) and public experiments in PRIDE. The ProteomeXchange (PX) Consortium has been set up to enable a better integration of existing public proteomics repositories, maximizing its benefit to the scientific community through the implementation of standard submission and dissemination pipelines. Within the Consortium, PRIDE is focused on supporting submissions of tandem MS data. The increasing use and popularity of the new Proteomics Standards Initiative (PSI) data standards such as mzIdentML and mzTab, and the diversity of workflows supported by the PX resources, prompted us to design and implement a new suite of algorithms and libraries that would build upon the success of the original PRIDE Inspector and would enable users to visualize and validate PX “complete” submissions. The PRIDE Inspector Toolsuite supports the handling and visualization of different experimental output files, ranging from spectra (mzML, mzXML, and the most popular peak lists formats) and peptide and protein identification results (mzIdentML, PRIDE XML, mzTab) to quantification data (mzTab, PRIDE XML), using a modular and extensible set of open-source, cross-platform libraries. We believe that the PRIDE Inspector Toolsuite represents a milestone in the visualization and quality assessment of proteomics data. It is freely available at http://github.com/PRIDE-Toolsuite/.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Qing-Wei Xu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Julian Uszkoreit
- Ruhr-Universität Bochum, Medizinisches Proteom-Zenter, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
| | - Aniel Sanchez
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
| | - Florian Reisinger
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jose A Dianes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Martin Eisenacher
- Ruhr-Universität Bochum, Medizinisches Proteom-Zenter, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
24
|
Skvortsov VS, Alekseytchuk NN, Khudyakov DV, Romero Reyes IV. pIPredict: A computer tool for prediction of isoelectric points of peptides and proteins. Biochemistry (Moscow), Supplement Series B: Biomedical Chemistry 2015. [DOI: 10.1134/s1990750815030099] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
|
25
|
Perez-Riverol Y, Uszkoreit J, Sanchez A, Ternent T, Del Toro N, Hermjakob H, Vizcaíno JA, Wang R. ms-data-core-api: an open-source, metadata-oriented library for computational proteomics. Bioinformatics 2015; 31:2903-5. [PMID: 25910694 PMCID: PMC4547611 DOI: 10.1093/bioinformatics/btv250] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/17/2015] [Accepted: 04/20/2015] [Indexed: 11/20/2022]
Abstract
Summary: The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library. Availability and implementation: The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: juan@ebi.ac.uk
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Julian Uszkoreit
- Ruhr-Universität Bochum, Medizinisches Proteom-Zenter, Medical Bioinformatics, ZKF, E.142, Universitätsstr. 150, D-44801 Bochum, Germany
| | - Aniel Sanchez
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Noemi Del Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
26
|
Wang R, Perez-Riverol Y, Hermjakob H, Vizcaíno JA. Open source libraries and frameworks for biological data visualisation: a guide for developers. Proteomics 2015; 15:1356-74. [PMID: 25475079 PMCID: PMC4409855 DOI: 10.1002/pmic.201400377] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/07/2014] [Revised: 10/21/2014] [Accepted: 11/26/2014] [Indexed: 12/21/2022]
Abstract
Recent advances in high-throughput experimental techniques have led to an exponential increase in both the size and the complexity of the data sets commonly studied in biology. Data visualisation is increasingly used as the key to unlock this data, going from hypothesis generation to model evaluation and tool implementation. It is becoming more and more the heart of bioinformatics workflows, enabling scientists to reason and communicate more effectively. In parallel, there has been a corresponding trend towards the development of related software, which has triggered the maturation of different visualisation libraries and frameworks. For bioinformaticians, scientific programmers and software developers, the main challenge is to pick out the most fitting one(s) to create clear, meaningful and integrated data visualisation for their particular use cases. In this review, we introduce a collection of open source or free to use libraries and frameworks for creating data visualisation, covering the generation of a wide variety of charts and graphs. We will focus on software written in Java, JavaScript or Python. We truly believe this software offers the potential to turn tedious data into exciting visual stories.
Affiliation(s)
- Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
27
|
Pirmoradian M, Zhang B, Chingin K, Astorga-Wells J, Zubarev RA. Membrane-Assisted Isoelectric Focusing Device As a Micropreparative Fractionator for Two-Dimensional Shotgun Proteomics. Anal Chem 2014; 86:5728-32. [DOI: 10.1021/ac404180e] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Mohammad Pirmoradian
- Department
of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheelesväg 2, SE-17177 Stockholm, Sweden
- Biomotif AB, Stockholm, Sweden
| | - Bo Zhang
- Department
of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheelesväg 2, SE-17177 Stockholm, Sweden
| | - Konstantin Chingin
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheelesväg 2, SE-17177 Stockholm, Sweden
| | - Juan Astorga-Wells
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheelesväg 2, SE-17177 Stockholm, Sweden
- Biomotif AB, Stockholm, Sweden
| | - Roman A. Zubarev
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheelesväg 2, SE-17177 Stockholm, Sweden
|
28
|
Perez-Riverol Y, Hermjakob H, Kohlbacher O, Martens L, Creasy D, Cox J, Leprevost F, Shan BP, Pérez-Nueno VI, Blazejczyk M, Punta M, Vierlinger K, Valiente PA, Leon K, Chinea G, Guirola O, Bringas R, Cabrera G, Guillen G, Padron G, Gonzalez LJ, Besada V. Computational proteomics pitfalls and challenges: HavanaBioinfo 2012 workshop report. J Proteomics 2013; 87:134-8. [PMID: 23376229 DOI: 10.1016/j.jprot.2013.01.019] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/14/2013] [Accepted: 01/22/2013] [Indexed: 10/27/2022]
Abstract
The workshop "Bioinformatics for Biotechnology Applications (HavanaBioinfo 2012)", held December 8-11, 2012 in Havana, aimed at exploring new bioinformatics tools and approaches for large-scale proteomics, genomics and chemoinformatics. Major conclusions of the workshop include the following: (i) development of new applications and bioinformatics tools for proteomic repository analysis is crucial; current proteomic repositories contain enough data (spectra/identifications) that can be used to increase the annotations in protein databases and to generate new tools for protein identification; (ii) spectral libraries, de novo sequencing and database search tools should be combined to increase the number of protein identifications; (iii) protein probabilities and FDR are not yet sufficiently mature; (iv) computational proteomics software needs to become more intuitive; and at the same time appropriate education and training should be provided to help in the efficient exchange of knowledge between mass spectrometrists and experimental biologists and bioinformaticians in order to increase their bioinformatics background, especially statistics knowledge.
|
29
|
Silvestre DD, Zoppis I, Brambilla F, Bellettato V, Mauri G, Mauri P. Availability of MudPIT data for classification of biological samples. J Clin Bioinforma 2013; 3:1. [PMID: 23317455 PMCID: PMC3563498 DOI: 10.1186/2043-9113-3-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/02/2012] [Accepted: 01/07/2013] [Indexed: 01/18/2023] Open
Abstract
Background: Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help to provide unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained by the MudPIT approach. In particular, we compared the performance of the SVM using two independent collections of complex samples and different data types, such as mass spectra (m/z), peptides and proteins. Results: Overall, protein and peptide data provided better discriminant information than experimental mass spectra (overall accuracy higher than 87% in both collection 1 and 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra, and allows the extraction of more informative features for the effective classification of samples. In addition, the protein and peptide features selected by the SVM showed an 80% match with the differentially expressed proteins identified by the MAProMa software. Conclusions: These findings confirm the availability of the most label-free quantitative methods based on processing of spectral count and SEQUEST-based SCORE values. On the other hand, they stress the usefulness of MudPIT data for the correct grouping of sample phenotypes by applying both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
Affiliation(s)
- Dario Di Silvestre
- Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy
| | - Italo Zoppis
- Department of Informatics, Systems and Communication, Viale Sarca 336, University of Milano-Bicocca, Milan, Italy
| | - Francesca Brambilla
- Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy
| | - Valeria Bellettato
- Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy
| | - Giancarlo Mauri
- Department of Informatics, Systems and Communication, Viale Sarca 336, University of Milano-Bicocca, Milan, Italy
| | - Pierluigi Mauri
- Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy
|