1
Kim MS, Tcho IW, Choi YK. Analyses of unpredictable properties of a wind-driven triboelectric random number generator. Sci Rep 2023; 13:16610. [PMID: 37789198 PMCID: PMC10547768 DOI: 10.1038/s41598-023-43894-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 09/29/2023] [Indexed: 10/05/2023] Open
Abstract
Wind-driven triboelectric nanogenerators (W-TENGs) are promising energy harvesters because wind is inexhaustible, ubiquitous, and clean. W-TENGs have also been used as random number generators (RNGs) because the inherently chaotic behavior of wind makes it an entropy source. Thus, a W-TENG that simultaneously generates both power and true random numbers in a two-in-one structure is a Janus-like wind-driven RNG (W-RNG). However, the root cause of W-RNG unpredictability has not been elucidated. In this work, the unpredictability, which is essential for an RNG, is statistically and mathematically analyzed by auto-correlation, cross-correlation, joint entropy, and mutual information. Even though the overall shape of the analog output signal from the W-RNG resembles a sinusoidal wave, which is not obviously unpredictable, the digital signals discretized from the continuous analog output become unpredictable. Furthermore, partial adoption of 4-bit data from 8-bit raw data, with the aid of analog-to-digital converter hardware, further boosts the unpredictability. The W-RNG, which functions as a W-TENG, can contribute to self-powering and self-securing outdoor electrical systems, such as drones, by harvesting energy and generating true random numbers.
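The joint entropy and mutual information named in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names are hypothetical, probabilities are plain empirical frequencies, and keeping the four least significant bits of each 8-bit sample is an assumption, since the abstract does not say which four bits are adopted.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of a sequence, using empirical frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def joint_entropy(xs, ys):
    """Joint entropy H(X, Y) of two equally long sequences."""
    return entropy(list(zip(xs, ys)))


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y); zero when X tells nothing about Y."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)


def low_nibble(samples):
    """ASSUMPTION: keep the 4 least significant bits of each 8-bit sample."""
    return [s & 0x0F for s in samples]


def lagged_mutual_information(samples, lag=1):
    """Mutual information between a sequence and a delayed copy of itself."""
    return mutual_information(samples[:-lag], samples[lag:])
```

A low lagged mutual information on `low_nibble(raw_samples)` would indicate that one sample says little about the next, which is the sense of unpredictability the abstract discusses.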
Affiliation(s)
- Moon-Seok Kim
  - School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
  - Department of Semiconductor System Engineering, Hanbat National University, 125 Dongseo-daero, Yuseong-gu, Daejeon, 31538, Republic of Korea
- Il-Woong Tcho
  - School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Yang-Kyu Choi
  - School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
2
Quevedo-Tumailli V, Ortega-Tenezaca B, González-Díaz H. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021; 22:ijms222313066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022] Open
Abstract
Parasite species of the genus Plasmodium cause Malaria, which remains a major global health problem due to parasite resistance to available Antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new Antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of the pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (Deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data in different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential Antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a huge amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes, proteins, and their functions. In addition, we also created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj). These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) including categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information (from three databases) and train a predictive model. Shannon's entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes in the same information scale. Perturbation Theory Operators (PTOs) with the form of Moving Average (MA) operators have been used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC presented the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new Antimalarial compounds vs. different proteins in the proteome of Plasmodium.
Affiliation(s)
- Viviana Quevedo-Tumailli
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Research Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Bernabe Ortega-Tenezaca
  - Grupo RNASA-IMEDIR, Department of Computer Science, University of A Coruña, 15071 A Coruña, Spain
  - Information and Communications Technology Management Department, Puyo Campus, Universidad Estatal Amazónica, Puyo 160150, Ecuador
- Humberto González-Díaz
  - Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - BIOFISIKA, Basque Centre for Biophysics, CSIC-UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
  - Correspondence: Tel.: +34-94-601-3547
3
Čmelo I, Voršilák M, Svozil D. Profiling and analysis of chemical compounds using pointwise mutual information. J Cheminform 2021; 13:3. [PMID: 33423694 PMCID: PMC7798221 DOI: 10.1186/s13321-020-00483-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Accepted: 12/24/2020] [Indexed: 12/21/2022] Open
Abstract
Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of association strength between compound structural features, resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by coding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well the given compound's feature combinations fit those in a particular compound set, is applied for the analysis of compound synthetic accessibility (SA), as well as for the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (AccZRFT = 94.5%, AccSYBA = 98.8%, AccSAScore = 99.0%, AccRF = 97.3%). However, the ability of ZRFT to distinguish between ES and HS compounds is surprisingly high considering that while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measurement that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to physico-chemical properties of organic compounds.
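For readers unfamiliar with PMI, the standard definition is PMI(a, b) = log2(p(a, b) / (p(a) p(b))), where the probabilities are the fractions of compounds containing feature a, feature b, or both. The sketch below applies that definition to feature pairs. It is an illustration only: `pmi_table` and the feature labels are hypothetical, each compound is assumed to be given as a set of feature identifiers (for example, the "on" bits of a fingerprint), and the ZRFT measure from the paper is not implemented here.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(compounds):
    """PMI (in bits) for every feature pair that co-occurs at least once.

    compounds: iterable of collections of hashable feature identifiers.
    Returns a dict mapping a sorted (feature_a, feature_b) pair to its PMI.
    """
    compounds = [sorted(set(features)) for features in compounds]
    n = len(compounds)
    single = Counter()  # number of compounds containing each feature
    pair = Counter()    # number of compounds containing each feature pair
    for features in compounds:
        single.update(features)
        pair.update(combinations(features, 2))
    return {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }
```

Positive values mark feature pairs that co-occur more often than independence would predict, and negative values mark pairs that co-occur less often.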
Affiliation(s)
- I. Čmelo
  - CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- M. Voršilák
  - CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
  - CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic
- D. Svozil
  - CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
  - CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic
4
Barigye SJ, Gómez-Ganau S, Serrano-Candelas E, Gozalbes R. PeptiDesCalculator: Software for computation of peptide descriptors. Definition, implementation and case studies for 9 bioactivity endpoints. Proteins 2020; 89:174-184. [PMID: 32881068 DOI: 10.1002/prot.26003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/12/2020] [Revised: 08/05/2020] [Accepted: 08/27/2020] [Indexed: 11/09/2022]
Abstract
We present a novel Java-based program named PeptiDesCalculator for computing peptide descriptors. These descriptors include redefinitions of known protein parameters to suit the peptide domain, generalization schemes for the global description of peptide characteristics, and empirical descriptors based on experimental evidence on peptide stability and interaction propensity. The PeptiDesCalculator software provides a user-friendly Graphical User Interface (GUI) and is parallelized to maximize the use of computational resources available in current workstations. The PeptiDesCalculator indices are employed in modeling 8 peptide bioactivity endpoints, demonstrating satisfactory behavior. Moreover, we compare the performance of a support vector machine (SVM) classifier built using 15 PeptiDesCalculator indices with that of a recently reported deep neural network (DNN) antimicrobial activity classifier, demonstrating comparable test set performance notwithstanding the remarkably lower degree of freedom for the former. This software will facilitate the development of in silico models for the prediction of peptide properties.
Affiliation(s)
- Stephen J Barigye
  - ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain
  - MolDrug AI Systems SL, Valencia, Spain
- Sergi Gómez-Ganau
  - ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain
  - Eurofins Agroscience Services Regulatory Spain SL, Valencia, Spain
- Eva Serrano-Candelas
  - ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain
- Rafael Gozalbes
  - ProtoQSAR SL, Centro Europeo de Empresas Innovadoras (CEEI), Parque Tecnológico de Valencia, Valencia, Spain
  - MolDrug AI Systems SL, Valencia, Spain
5
Barigye SJ, García de la Vega JM, Perez-Castillo Y. Generative Adversarial Networks (GANs) Based Synthetic Sampling for Predictive Modeling. Mol Inform 2020; 39:e2000086. [PMID: 32558335 DOI: 10.1002/minf.202000086] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/13/2020] [Accepted: 06/19/2020] [Indexed: 12/30/2022]
Abstract
In the present report we evaluate the possible utility of Generative Adversarial Networks (GANs) in mapping the chemical structural space for molecular property profiles, with the goal of subsequently yielding synthetic (artificial) samples for ligand-based molecular modeling. Two case studies are considered: BACE-1 (β-Secretase 1) and DENV (Dengue Virus) inhibitory activities, with the former focused on data populating and the latter on data balancing tasks. We train GANs using subsamples extracted from datasets for each bioactivity endpoint, and apply the trained networks in generating synthetic examples from the respective bioactivity chemical spaces. Original and synthetic samples are pooled together and employed to build BACE-1 and DENV inhibitory activity classifiers, and their performance is evaluated over tenfold external validation sets. In both case studies, the obtained classifiers demonstrate satisfactory predictivity, with the former yielding accuracy (ACC) and Matthews correlation coefficient (MCC) values of 0.80 and 0.59, while the latter produces balanced accuracy (BACC) and MCC values of 0.81 and 0.70, respectively. Moreover, the statistics of these classifiers are compared with those of other models in the literature, demonstrating comparable to better performance. These results suggest that GANs may be useful in mapping the chemical space for molecular property profiles of interest, and thus allow for the extraction of synthetic examples for computational modeling.
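The three statistics quoted in this abstract (ACC, BACC and MCC) all derive from the binary confusion matrix. The sketch below uses the textbook definitions and is not taken from the paper; the function names are hypothetical, labels are assumed to be 0/1 with 1 as the active class, and returning 0.0 for MCC when the denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary 0/1 labels, with 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (both classes must be present)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Balanced accuracy and MCC are less flattering than plain accuracy on imbalanced data, which is presumably why they are the statistics reported for the data balancing case study.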
Affiliation(s)
- Stephen J Barigye
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), 28049, Madrid, Spain
- José M García de la Vega
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), 28049, Madrid, Spain
- Yunierkis Perez-Castillo
  - Bio-Chemoinformatics Research Group and Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador
6
Contreras-Torres E, Marrero-Ponce Y, Terán JE, García-Jacas CR, Brizuela CA, Sánchez-Rodríguez JC. MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors. J Chem Inf Model 2020; 60:1042-1059. [PMID: 31663741 DOI: 10.1021/acs.jcim.9b00629] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
This report introduces the MuLiMs-MCoMPAs software (acronym for Multi-Linear Maps based on N-Metric and Contact Matrices of 3D Protein and Amino-acid weightings), designed to compute tensor-based 3D protein structural descriptors by applying two- and three-linear algebraic forms. Moreover, these descriptors contemplate generalizing components such as novel 3D protein structural representations, (dis)similarity metrics and multimetrics to extract geometry-related information between two and three amino acids, weighting schemes based on amino acid properties, matrix normalization procedures that consider simple-stochastic and mutual probability transformations, topological and geometrical cutoffs, amino acid and group-based MD calculations, and aggregation operators for merging amino acidic and group MDs. The MuLiMs-MCoMPAs software, which belongs to the ToMoCoMD-CAMPS suite, was developed in Java (version 1.8) using the Chemistry Development Kit (CDK) (version 1.4.19) and the Jmol libraries. This software implements a divide-and-conquer strategy to parallelize the computation of the indices, as well as modules for data preprocessing and batch computing functionalities. Furthermore, it consists of two components: (i) a desktop graphical user interface (GUI) and (ii) an API library. The relevance of this novel approach is demonstrated through two analyses that considered Shannon's entropy-based variability and a principal component analysis. These studies showed that the MuLiMs-MCoMPAs' three-linear descriptor family contains higher informational entropy than several other descriptors generated with available computation tools. Moreover, the MuLiMs-MCoMPAs indices capture additional orthogonal information to the one codified by the available calculation approaches. As a result, two sets of suggested theoretical configurations that contain 13648 two-linear indices and 20263 three-linear indices are available for download at tomocomd.com. Furthermore, as a demonstration of the applicability and easy integration of the MuLiMs library into a QSAR-based expert system, a software application (ProStAF) was generated to predict SCOP protein structural classes and folding rate. It can thus be anticipated that the MuLiMs-MCoMPAs framework will turn into a valuable contribution to the chem- and bioinformatics research fields.
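A Shannon's entropy-based variability analysis of the kind mentioned here scores each descriptor by the entropy of its value distribution over a dataset, so a descriptor that takes nearly the same value for every structure scores close to zero. The sketch below is one simple way to compute such a score and is not the procedure used by the authors: `binned_entropy` is a hypothetical name, and equal-width binning with a caller-chosen number of bins is an assumption.

```python
import math
from collections import Counter


def binned_entropy(values, n_bins):
    """Shannon entropy (bits) of a descriptor after equal-width binning.

    ASSUMPTION: equal-width bins between the observed minimum and maximum;
    the discretization scheme used in the cited work may differ.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())
```

With `n_bins` bins the score is bounded above by `log2(n_bins)`, so descriptors from different tools are comparable only when the same number of bins and the same dataset are used.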
Affiliation(s)
- Ernesto Contreras-Torres
  - Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
  - Grupo GINUMED, Facultad de Salud, Programa de Medicina, Corporacion Universitaria Rafal Nuñez, Cartagena, Colombia
  - Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, 46010 Valéncia, Spain
- Julio E Terán
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas; and Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
  - Grupo de Química Computacional y Teórica, Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Pichincha, Ecuador
- César R García-Jacas
  - Cátedras Conacyt-Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
- Carlos A Brizuela
  - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México
7
Marrero-Ponce Y, Teran JE, Contreras-Torres E, García-Jacas CR, Perez-Castillo Y, Cubillan N, Peréz-Giménez F, Valdés-Martini JR. LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs. J Theor Biol 2020; 485:110039. [DOI: 10.1016/j.jtbi.2019.110039] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/08/2019] [Revised: 09/11/2019] [Accepted: 10/02/2019] [Indexed: 11/28/2022]
8
Undersampling: case studies of flaviviral inhibitory activities. J Comput Aided Mol Des 2019; 33:997-1008. [DOI: 10.1007/s10822-019-00255-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/17/2019] [Accepted: 11/19/2019] [Indexed: 12/22/2022]
9
Terán JE, Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Vivas-Reyes R, Terán E, Torres FJ. Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods. Sci Rep 2019; 9:11391. [PMID: 31388082 PMCID: PMC6684663 DOI: 10.1038/s41598-019-47858-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/22/2019] [Accepted: 07/22/2019] [Indexed: 11/16/2022] Open
Abstract
In this report, a new type of tridimensional (3D) biomacro-molecular descriptor for proteins is proposed. These descriptors make use of multi-linear algebra concepts based on the application of 3-linear forms (i.e., Canonical Trilinear (Tr), Trilinear Cubic (TrC), Trilinear-Quadratic-Bilinear (TrQB) and so on) as a specific case of the N-linear algebraic forms. The kth 3-tuple similarity-dissimilarity spatial matrices (Tensor's Form) are used for the transformation and representation of the chemical information available in the relationships between three amino acids of a protein. Several metrics (Minkowski-type, wave-edge, etc.) and multi-metrics (Triangle area, Bond-angle, etc.) are proposed for extracting the interaction information, as well as probabilistic transformations (e.g., simple stochastic and mutual probability) to achieve matrix normalization. A generalized procedure for descriptor calculation is proposed, in which amino acid level-based indices are fused together by using aggregation operators. The obtained results demonstrate that the newly proposed 3D biomacro-molecular indices perform better than other approaches in SCOP-based discrimination and in the prediction of the folding rate of proteins by using simple linear parametrical models. It can be concluded that the proposed method allows the definition of 3D biomacro-molecular descriptors that contain orthogonal information capable of providing better models for applications in protein science.
Affiliation(s)
- Julio E Terán
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
  - Universidad de San Buenaventura - Cartagena - Facultad de Ciencias de la Salud - Grupo de Investigación Microbiología & Ambiente (GIMA) - Calle Real de Ternera, Diagonal 32, No. 30-966, Cartagena, Código postal: 1300 10, Colombia
- Ernesto Contreras-Torres
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- César R García-Jacas
  - Cátedras CONACYT - Departamento de Ciencia de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
- Ricardo Vivas-Reyes
  - Grupo de Química Cuántica y Teórica de la Universidad de Cartagena-Facultad de Ciencias Exactas y Naturales. Programa de Química. Campus de San Pablo and Grupo GINUMED Corporacion Universitaria Rafal Nuñez. Facultad de Salud. Programa de Medicina, Cartagena, Colombia
  - Grupo CipTec, Facultad de Ingenierias. Fundacion Universitaria Tecnologico Comfenalco - Cartagena, Cartagena, Bolívar, Colombia
- Enrique Terán
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador
- F Javier Torres
  - Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, and Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
10
Martínez-López Y, Barigye SJ, Martínez-Santiago O, Marrero-Ponce Y, Green J, Castillo-Garit JA. Prediction of aquatic toxicity of benzene derivatives using molecular descriptor from atomic weighted vectors. ENVIRONMENTAL TOXICOLOGY AND PHARMACOLOGY 2017; 56:314-321. [PMID: 29091819 DOI: 10.1016/j.etap.2017.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/26/2017] [Revised: 10/09/2017] [Accepted: 10/11/2017] [Indexed: 06/07/2023]
Abstract
Several descriptors from atom weighted vectors are used in the prediction of the aquatic toxicity of a set of 392 benzene derivatives to the ciliate protozoan Tetrahymena pyriformis (log(IGC50)-1). These descriptors are calculated using the MD-LOVIs software, and various Aggregation Operators are examined with the aim of comparing their performance in predicting aquatic toxicity. Variability analysis is used to quantify the information content of these molecular descriptors by means of an information theory-based algorithm. Multiple Linear Regression with Genetic Algorithms is used to obtain models of the structure-toxicity relationships; the best model shows values of Q² = 0.830 and R² = 0.837 using six variables. Our models compare favorably with other previously published models that use the same data set. The obtained results suggest that these descriptors provide an effective alternative for determining the aquatic toxicity of benzene derivatives.
Affiliation(s)
- Yoan Martínez-López
  - Department of Computer Sciences, Faculty of Informatics, Camaguey University, Camaguey City, 74650, Camaguey, Cuba
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central "Martha Abreu" de Las Villas, Santa Clara, 54830, Villa Clara, Cuba
- Stephen J Barigye
  - Departamento de Química, Universidade Federal de Lavras, CP 3037, 37200-000, Lavras, MG, Brazil
- Oscar Martínez-Santiago
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central "Martha Abreu" de Las Villas, Santa Clara, 54830, Villa Clara, Cuba
- Yovani Marrero-Ponce
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Av. Interoceánica Km 12 ½, Cumbayá, Ecuador
- James Green
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
- Juan A Castillo-Garit
  - Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy. Universidad Central "Martha Abreu" de Las Villas, Santa Clara, 54830, Villa Clara, Cuba
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
  - Unidad de Toxicologia Experimental, Universidad de Ciencias Médicas de Villa Clara Santa Clara, 50200, Villa Clara, Cuba
11
Valdés-Martiní JR, Marrero-Ponce Y, García-Jacas CR, Martinez-Mayorga K, Barigye SJ, Vaz d'Almeida YS, Pham-The H, Pérez-Giménez F, Morell CA. QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations. J Cheminform 2017; 9:35. [PMID: 29086120 PMCID: PMC5462671 DOI: 10.1186/s13321-017-0211-5] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 11/08/2016] [Accepted: 04/07/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors (MDs). These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds for different therapeutic applications, including antimalarials, antibacterials, tyrosinase inhibitors and so on. To compute these MDs, a computational program with the same name was initially developed. However, this in-house software barely offered the functionalities required in contemporary molecular modeling tasks, in addition to inherent limitations that made it impractical to use. Therefore, the present manuscript introduces the QuBiLS-MAS (acronym for Quadratic, Bilinear and N-Linear mapS based on graph-theoretic electronic-density Matrices and Atomic weightingS) software designed to compute topological (0-2.5D) molecular descriptors based on bilinear, quadratic and linear algebraic forms for atom- and bond-based relations.
RESULTS The QuBiLS-MAS module was designed as standalone software, in which extensions and generalizations of the former ToMoCoMD-CARDD 2D-algebraic indices are implemented, considering the following aspects: (a) two new matrix normalization approaches based on double-stochastic and mutual probability formalisms; (b) topological constraints (cut-offs) to take into account particular inter-atomic relations; (c) six additional atomic properties to be used as weighting schemes in the calculation of the molecular vectors; (d) four new local-fragments to consider molecular regions of interest; (e) number of lone-pair electrons in chemical structure defined by diagonal coefficients in matrix representations; and (f) several aggregation operators (invariants) applied over atom/bond-level descriptors in order to compute global indices. This software permits the parallel computation of the indices, contains a batch processing module and data curation functionalities. This program was developed in Java v1.7 using the Chemistry Development Kit library (version 1.4.19). The QuBiLS-MAS software consists of two components: a desktop interface (GUI) and an API library allowing for the easy integration of the latter in chemoinformatics applications. The relevance of the novel extensions and generalizations implemented in this software is demonstrated through three studies. Firstly, a comparative Shannon's entropy based variability study for the proposed QuBiLS-MAS and the DRAGON indices demonstrates superior performance for the former. A principal component analysis reveals that the QuBiLS-MAS approach captures chemical information orthogonal to that codified by the DRAGON descriptors. Lastly, a QSAR study for the binding affinity to the corticosteroid-binding globulin using Cramer's steroid dataset is carried out. 
CONCLUSIONS From these analyses, it is revealed that the QuBiLS-MAS approach for atom-pair relations yields similar-to-superior performance with regard to other QSAR methodologies reported in the literature. Therefore, the QuBiLS-MAS approach constitutes a useful tool for the diversity analysis of chemical compound datasets and high-throughput screening of structure-activity data.
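The Shannon's entropy based variability study mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each descriptor column is discretized into equal-width bins and that entropy is reported in bits; the function name `shannon_entropy` and the choice of `n_bins` are hypothetical and are not taken from the QuBiLS-MAS software or the paper.

```python
import numpy as np

def shannon_entropy(values, n_bins):
    """Entropy (in bits) of one descriptor after equal-width binning.

    Assumption: equal-width bins over the observed range; the cited
    study may use a different discretization scheme.
    """
    counts, _ = np.histogram(values, bins=n_bins)
    # Empty bins contribute nothing to the sum, so drop them before the log.
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A descriptor that takes a different value in every bin reaches the
# maximum entropy log2(n_bins); a constant descriptor has zero entropy.
spread = shannon_entropy(np.array([0.0, 1.0, 2.0, 3.0]), n_bins=4)
constant = shannon_entropy(np.array([5.0, 5.0, 5.0, 5.0]), n_bins=4)
```

Under this reading, a descriptor family with higher entropy values across a dataset distinguishes more compounds from each other, which is the sense in which one family can "outperform" another in a variability study.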
Affiliation(s)
- José R Valdés-Martiní
- StreelBridge Laboratories, SteelBridge Consulting Technology Solutions, Miami, FL, USA
| | - Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Ecuador; Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, 170157, Quito, Pichincha, Ecuador; Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador; Grupo de Investigación Ambiental (GIA), Fundación Universitaria Tecnológico de Comfenalco, Facultad de Ingenierías, Programa de Ingeniería de Procesos, Cartagena de Indias, Bolívar, Colombia; Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain
| | - César R García-Jacas
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México; Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador; Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas (UCI), Havana, Cuba
| | - Karina Martinez-Mayorga
- Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México
| | - Stephen J Barigye
- Facultad de Medicina, Universidad de Las Américas, Quito, Pichincha, Ecuador
| | | | - Hai Pham-The
- Department of Pharmaceutical Chemistry, Hanoi University of Pharmacy, 13-15 Le Thanh Tong, Hoan Kiem, Hanoi, Vietnam
| | - Facundo Pérez-Giménez
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Valencia, Spain
| | - Carlos A Morell
- Laboratorio de Inteligencia Artificial, Centro de Estudios de Informática (CEI), Facultad de Matemática, Física y Computación, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Villa Clara, Cuba
|
12
|
Martínez-Santiago O, Marrero-Ponce Y, Vivas-Reyes R, Rivera-Borroto OM, Hurtado E, Treto-Suarez MA, Ramos Y, Vergara-Murillo F, Orozco-Ugarriza ME, Martínez-López Y. Exploring the QSAR's predictive truthfulness of the novel N-tuple discrete derivative indices on benchmark datasets. SAR and QSAR in Environmental Research 2017; 28:367-389. [PMID: 28590848 DOI: 10.1080/1062936x.2017.1326403] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/06/2017] [Accepted: 04/27/2017] [Indexed: 06/07/2023]
Abstract
Graph derivative indices (GDIs) have recently been defined over N-atoms (N = 2, 3 and 4) simultaneously, which are based on the concept of derivatives in discrete mathematics (finite difference), metaphorical to the derivative concept in classical mathematical analysis. These molecular descriptors (MDs) codify topo-chemical and topo-structural information based on the concept of the derivative of a molecular graph with respect to a given event (S) over duplex, triplex and quadruplex relations of atoms (vertices). These GDIs have been successfully applied in the description of physicochemical properties like reactivity, solubility and chemical shift, among others, and in several comparative quantitative structure activity/property relationship (QSAR/QSPR) studies. Although satisfactory results have been obtained in previous modelling studies with the aforementioned indices, it is necessary to develop new, more rigorous analysis to assess the true predictive performance of the novel structure codification. So, in the present paper, an assessment and statistical validation of the performance of these novel approaches in QSAR studies are executed, as well as a comparison with those of other QSAR procedures reported in the literature. To achieve the main aim of this research, QSARs were developed on eight chemical datasets widely used as benchmarks in the evaluation/validation of several QSAR methods and/or many different MDs (fundamentally 3D MDs). Three to seven variable QSAR models were built for each chemical dataset, according to the original dissection into training/test sets. The models were developed by using multiple linear regression (MLR) coupled with a genetic algorithm as the feature wrapper selection technique in the MobyDigs software. Each family of GDIs (for duplex, triplex and quadruplex) behaves similarly in all modelling, although there were some exceptions. 
However, when all families were used in combination, the results achieved were quantitatively higher than those reported by other authors in similar experiments. Comparisons with respect to external correlation coefficients (q2ext) revealed that the models based on GDIs possess superior predictive ability in seven of the eight datasets analysed, outperforming methodologies based on similar or more complex techniques and confirming the good predictive power of the obtained models. For the q2ext values, the non-parametric comparison revealed significantly different results to those reported so far, which demonstrated that the models based on DIVATI's indices presented the best global performance and yielded significantly better predictions than the 12 0-3D QSAR procedures used in the comparison. Therefore, GDIs are suitable for structure codification of the molecules and constitute a good alternative to build QSARs for the prediction of physicochemical, biological and environmental endpoints.
Affiliation(s)
- O Martínez-Santiago
- a Department of Chemical Sciences , Central University 'Martha Abreu' of Las Villas , Santa Clara , Cuba
- b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN) , Quito , Ecuador
- c Group of Quantum and Theoretical Chemistry , University of Cartagena , Cartagena de Indias , Colombia
- d Facultad de Ingeniería , Grupo CipTec, Fundación Universitaria Tecnológico Comfenalco , Cartagena de Indias , Colombia
| | - Y Marrero-Ponce
- b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN) , Quito , Ecuador
- e Escuela de Medicina, Edificio de Especialidades Médicas , Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA) , Av. Interoceánica Km 12 ½, Cumbayá , Ecuador
- f Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica , Quito , Ecuador
- g Grupo de Investigación Ambiental (GIA) , Fundación Universitaria Tecnológico de Comfenalco , Cartagena de Indias , Colombia
| | - R Vivas-Reyes
- c Group of Quantum and Theoretical Chemistry , University of Cartagena , Cartagena de Indias , Colombia
- d Facultad de Ingeniería , Grupo CipTec, Fundación Universitaria Tecnológico Comfenalco , Cartagena de Indias , Colombia
| | - O M Rivera-Borroto
- b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN) , Quito , Ecuador
- h Departamento de Química Física Aplicada , Universidad Autónoma de Madrid (UAM) , Madrid , España
| | - E Hurtado
- b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN) , Quito , Ecuador
| | - M A Treto-Suarez
- i Center of Applied Nanosciences (CENAP), Andres Bello University , Chile
| | - Y Ramos
- j Department of Economic Sciences , University of Camagüey , Camagüey , Cuba
| | - F Vergara-Murillo
- c Group of Quantum and Theoretical Chemistry , University of Cartagena , Cartagena de Indias , Colombia
- d Facultad de Ingeniería , Grupo CipTec, Fundación Universitaria Tecnológico Comfenalco , Cartagena de Indias , Colombia
| | - M E Orozco-Ugarriza
- k Seccional Cartagena y Grupo de Investigación Traslacional en Biomedicina & Biotecnología - GITB&B , Universidad del Sinú - Elías Bechara Zainúm , Cartagena de Indias , Colombia
| | - Y Martínez-López
- b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN) , Quito , Ecuador
- l Grupo de Investigación de Inteligencia Artificial (AIRES) , Universidad de Camagüey , Camagüey , Cuba
|
13
|
Guimarães MC, Duarte MH, Silla JM, Freitas MP. Is conformation a fundamental descriptor in QSAR? A case for halogenated anesthetics. Beilstein J Org Chem 2016; 12:760-8. [PMID: 27340468 PMCID: PMC4902069 DOI: 10.3762/bjoc.12.76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/22/2016] [Accepted: 04/07/2016] [Indexed: 01/08/2023] Open
Abstract
An intriguing question in 3D-QSAR is which conformation(s) to use when generating molecular descriptors (MDs) for correlation with bioactivity values. This is not a simple task because the bioactive conformation in molecule data sets is usually unknown and, therefore, optimized structures in a receptor-free environment are often used to generate the MDs. In this case, a wrong conformational choice can cause misinterpretation of the QSAR model. The present computational work reports the conformational analysis of the volatile anesthetic isoflurane (2-chloro-2-(difluoromethoxy)-1,1,1-trifluoroethane) in the gas phase and also in polar and nonpolar implicit and explicit solvents to show that stable minima (ruled by intramolecular interactions) do not necessarily coincide with the bioconformation (ruled by enzyme induced fit). Consequently, a QSAR model based on two-dimensional chemical structures was built and exhibited satisfactory modeling/prediction capability and interpretability, suggesting that these 2D MDs can be advantageous over some three-dimensional descriptors.
Affiliation(s)
- Maria C Guimarães
- Department of Chemistry, Federal University of Lavras, P. O. Box 3037, 37200-000, Lavras, MG, Brazil
| | - Mariene H Duarte
- Department of Chemistry, Federal University of Lavras, P. O. Box 3037, 37200-000, Lavras, MG, Brazil
| | - Josué M Silla
- Department of Chemistry, Federal University of Lavras, P. O. Box 3037, 37200-000, Lavras, MG, Brazil
| | - Matheus P Freitas
- Department of Chemistry, Federal University of Lavras, P. O. Box 3037, 37200-000, Lavras, MG, Brazil
|
14
|
Martínez-Santiago O, Marrero-Ponce Y, Barigye SJ, Le Thi Thu H, Torres FJ, Zambrano CH, Muñiz Olite JL, Cruz-Monteagudo M, Vivas-Reyes R, Vázquez Infante L, Artiles Martínez LM. Physico-Chemical and Structural Interpretation of Discrete Derivative Indices on N-Tuples Atoms. Int J Mol Sci 2016; 17:ijms17060812. [PMID: 27240357 PMCID: PMC4926346 DOI: 10.3390/ijms17060812] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/31/2016] [Revised: 04/27/2016] [Accepted: 05/04/2016] [Indexed: 11/28/2022] Open
Abstract
This report examines the interpretation of the Graph Derivative Indices (GDIs) from three different perspectives (i.e., in structural, steric and electronic terms). It is found that the individual vertex frequencies may be expressed in terms of the geometrical and electronic reactivity of the atoms and bonds, respectively. On the other hand, it is demonstrated that the GDIs are sensitive to progressive structural modifications in terms of size, ramifications, electronic richness, conjugation effects and molecular symmetry. Moreover, it is observed that the GDIs quantify the interaction capacity among molecules and codify information on the activation entropy. A structure-property relationship study reveals that there exists a direct correspondence between the individual frequencies of atoms and Hückel's Free Valence, as well as between the atomic GDIs and the chemical shift in NMR, which collectively validates the theory that these indices codify steric and electronic information of the atoms in a molecule. Taking into consideration the regularity and coherence found in experiments performed with the GDIs, it is possible to say that GDIs possess a plausible interpretation in structural and physicochemical terms.
Affiliation(s)
- Oscar Martínez-Santiago
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research International Network (CAMD-BIR IN), Cumbayá-Tumbaco, Quito 170184, Ecuador.
- Department of Chemical Science, Faculty of Chemistry-Pharmacy, Universidad Central "Martha Abreu" de Las Villas, Santa Clara 54830, Villa Clara, Cuba.
| | - Yovani Marrero-Ponce
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research International Network (CAMD-BIR IN), Cumbayá-Tumbaco, Quito 170184, Ecuador.
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Hospital de los Valles, Av. Interoceánica Km 12 ½-Cumbayá, Quito 170157, Ecuador.
- Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador.
- Grupo de Investigación Microbiología y Ambiente (GIMA), Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, Cartagena de Indias, Bolívar 130010, Colombia.
| | - Stephen J Barigye
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research International Network (CAMD-BIR IN), Cumbayá-Tumbaco, Quito 170184, Ecuador.
- Departamento de Química, Universidade Federal de Lavras (UFLA), Caixa Postal 3037, Lavras 37200-000, MG, Brazil.
| | - Huong Le Thi Thu
- School of Medicine and Pharmacy, Vietnam National University, Hanoi (VNU) 144 Xuan Thuy, Cau Giay, Hanoi 100000, Vietnam.
| | - F Javier Torres
- Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador.
- Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Quito 170157, Ecuador.
| | - Cesar H Zambrano
- Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador.
- Universidad San Francisco de Quito (USFQ), Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Quito 170157, Ecuador.
| | - Jorge L Muñiz Olite
- Grupo de Investigación en Estudios Químicos y Biológicos, Facultad de Ciencias Básicas, Universidad Tecnológica de Bolívar (UTB), Parque Industrial y Tecnológico Carlos Vélez Pombo Km 1 vía Turbaco, Cartagena de Indias, Bolívar 130010, Colombia.
| | - Maykel Cruz-Monteagudo
- Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas (UDLA), Quito 170513, Ecuador.
| | - Ricardo Vivas-Reyes
- Grupo de Química Cuántica y Teórica, Facultad de Ciencias, Universidad de Cartagena, Cartagena de Indias, Bolívar 130001, Colombia.
- Grupo CipTec, Fundación Universitaria Tecnológico de Comfenalco, Facultad de Ingenierías, Programa de Ingeniería de Procesos, Cartagena de Indias, Bolívar 130001, Colombia.
| | - Liliana Vázquez Infante
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research International Network (CAMD-BIR IN), Cumbayá-Tumbaco, Quito 170184, Ecuador.
- Department of Chemical Science, Faculty of Chemistry-Pharmacy, Universidad Central "Martha Abreu" de Las Villas, Santa Clara 54830, Villa Clara, Cuba.
| | - Luis M Artiles Martínez
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research International Network (CAMD-BIR IN), Cumbayá-Tumbaco, Quito 170184, Ecuador.
|
15
|
García-Jacas CR, Contreras-Torres E, Marrero-Ponce Y, Pupo-Meriño M, Barigye SJ, Cabrera-Leyva L. Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets. J Cheminform 2016; 8:10. [PMID: 26925168 PMCID: PMC4768433 DOI: 10.1186/s13321-016-0122-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/14/2015] [Accepted: 02/09/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Recently, novel 3D alignment-free molecular descriptors (also known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, in the present manuscript an assessment and statistical validation of the performance of these novel descriptors in QSAR studies is performed. RESULTS To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. Three to nine variable QSAR models based on Multiple Linear Regression are built for each chemical dataset according to the original division into training/test sets. Comparisons with respect to leave-one-out cross-validation correlation coefficients ($Q^2_{\mathrm{loo}}$) reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as Partial Least Squares, Neural Networks, Support Vector Machines and others. On the other hand, superior external correlation coefficients ($Q^2_{\mathrm{ext}}$) are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models.
For the $Q^2_{\mathrm{ext}}$ values, non-parametric statistical tests were performed, which demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions than 11 of the 12 QSAR procedures used in the comparison. Lastly, a study of the performance of the indices under several conformer generation methods was performed. This demonstrated that the quality of the predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the results achieved do not differ significantly from one another. CONCLUSIONS It can be stated that the QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds. Graphical abstract: Comparative graphical representation of the performance of the novel QuBiLS-MIDAS 3D-MDs with respect to other methodologies in QSAR modeling of eight chemical datasets.
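The two validation statistics compared in this abstract, the leave-one-out cross-validated coefficient and the external coefficient, can be sketched for a Multiple Linear Regression model as follows. This is a minimal illustration under stated assumptions: the helper names (`fit_mlr`, `predict_mlr`, `q2_loo`, `q2_ext`) are hypothetical, and the external coefficient uses the training-set mean in the denominator, which is one of several definitions in the QSAR literature and may differ from the one used in the paper.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses for the rows of X using fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def q2_loo(X, y):
    """1 - PRESS / SS_tot, refitting the model with each compound left out."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_ext(X_train, y_train, X_test, y_test):
    """External coefficient; assumes the training-set mean as the reference."""
    coef = fit_mlr(X_train, y_train)
    resid = y_test - predict_mlr(coef, X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)

# Toy data with an exact linear relationship, so both statistics approach 1.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
loo = q2_loo(X, y)
ext = q2_ext(X[:4], y[:4], X[4:], y[4:])
```

Both values are bounded above by 1 and can become negative when the model predicts worse than the mean, which is why the two are usually reported side by side for each training/test division.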
Affiliation(s)
- César R García-Jacas
- Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador ; Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC), Universidad de las Ciencias Informáticas, La Habana, Cuba ; Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, 17-1200-841 Quito, Ecuador
| | - Ernesto Contreras-Torres
- Departamento de Técnicas de Programación, Facultad 6, Universidad de las Ciencias Informáticas, La Habana, Cuba
| | - Yovani Marrero-Ponce
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, 17-1200-841 Quito, Ecuador ; Escuela de Medicina, Colegio de Ciencias de la Salud, Edificio de Especialidades Médicas, Hospital de los Valles, Universidad San Francisco de Quito (USFQ), Av. Interoceánica Km 12 ½ - Cumbayá, Quito, Ecuador
| | - Mario Pupo-Meriño
- Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC), Universidad de las Ciencias Informáticas, La Habana, Cuba
| | - Stephen J Barigye
- Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Departamento de Química, Universidade Federal de Lavras, UFLA, Caixa Postal 3037, Lavras, MG 37200-000 Brazil
| | - Lisset Cabrera-Leyva
- Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador ; Grupo de Investigación de Inteligencia Artificial (AIRES), Facultad de Informática, Universidad de Camagüey, Camagüey, Cuba
|
16
|
Barigye SJ, Freitas MP. Is molecular alignment an indispensable requirement in the MIA-QSAR method? J Comput Chem 2015; 36:1748-55. [DOI: 10.1002/jcc.23992] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/06/2015] [Revised: 05/18/2015] [Accepted: 06/07/2015] [Indexed: 11/08/2022]
Affiliation(s)
- Stephen J. Barigye
- Department of Chemistry; Federal University of Lavras; P.O. Box 3037 Lavras, Minas Gerais 37200-000 Brazil
| | - Matheus P. Freitas
- Department of Chemistry; Federal University of Lavras; P.O. Box 3037 Lavras, Minas Gerais 37200-000 Brazil
|
17
|
Computational fishing of new DNA methyltransferase inhibitors from natural products. J Mol Graph Model 2015; 60:43-54. [PMID: 26099696 DOI: 10.1016/j.jmgm.2015.04.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/02/2015] [Revised: 03/28/2015] [Accepted: 04/22/2015] [Indexed: 12/31/2022]
Abstract
DNA methyltransferase inhibitors (DNMTis) have become an alternative for cancer therapies. However, only two DNMTis have been approved as anticancer drugs, although with some restrictions. Natural products (NPs) are a promising source of drugs. In order to find NPs with novel chemotypes as DNMTis, 47 compounds with known activity against these enzymes were used to build a LDA-based QSAR model for active/inactive molecules (93% accuracy) based on molecular descriptors. This classifier was employed to identify potential DNMTis on 800 NPs from NatProd Collection. 447 selected compounds were docked on two human DNA methyltransferase (DNMT) structures (PDB codes: 3SWR and 2QRV) using AutoDock Vina and Surflex-Dock, prioritizing according to their score values, contact patterns at 4 Å and molecular diversity. Six consensus NPs were identified as virtual hits against DNMTs, including 9,10-dihydro-12-hydroxygambogic, phloridzin, 2',4'-dihydroxychalcone 4'-glucoside, daunorubicin, pyrromycin and centaurein. This method is an innovative computational strategy for identifying DNMTis, useful in the identification of potent and selective anticancer drugs.
|
18
|
Ruiz-Blanco YB, Paz W, Green J, Marrero-Ponce Y. ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins. BMC Bioinformatics 2015; 16:162. [PMID: 25982853 PMCID: PMC4432771 DOI: 10.1186/s12859-015-0586-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/15/2015] [Accepted: 04/22/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure is a key issue that requires an efficient and widely available solution. RESULTS We here introduce ProtDCal, a new computational software suite capable of generating tens of thousands of features considering both sequence-based and 3D-structural descriptors. We demonstrate, by means of principal component analysis and Shannon entropy tests, how ProtDCal's sequence-based descriptors provide new and more relevant information not encoded by currently available servers for sequence-based protein feature generation. The wide diversity of the 3D-structure-based features generated by ProtDCal is shown to provide additional complementary information and effectively completes its general protein encoding capability. As a demonstration of the utility of ProtDCal's features, prediction models of N-linked glycosylation sites are trained and evaluated. Classification performance compares favourably with that of contemporary predictors of N-linked glycosylation sites, in spite of not using domain-specific features as input information. CONCLUSIONS ProtDCal provides a friendly and cross-platform graphical user interface, developed in the Java programming language, and is freely available at: http://bioinf.sce.carleton.ca/ProtDCal/ . ProtDCal introduces local and group-based encoding which enhances the diversity of the information captured by the computed features.
Furthermore, we have shown that adding structure-based descriptors contributes non-redundant additional information to the features-based characterization of polypeptide systems. This software is intended to provide a useful tool for general-purpose encoding of protein sequences and structures for applications in protein classification, similarity analyses and function prediction.
Affiliation(s)
- Yasser B Ruiz-Blanco
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba; Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
| | - Waldo Paz
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba; Centre of Informatics Studies (CEI), Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba.
| | - James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
| | - Yovani Marrero-Ponce
- Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba; Grupo de Investigación Microbiología y Ambiente (GIMA), Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, Cartagena (Bolivar), Colombia.
|
19
|
Barigye SJ, Marrero-Ponce Y, Zupan J, Pérez-Giménez F, Freitas MP. Structural and Physicochemical Interpretation of GT-STAF Information Theory-Based Indices. Bulletin of the Chemical Society of Japan 2015. [DOI: 10.1246/bcsj.20140037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Affiliation(s)
- Stephen J. Barigye
- Departamento de Química, Universidade Federal de Lavras, UFLA
- Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Universidad Central “Martha Abreu” de Las Villas
| | - Yovani Marrero-Ponce
- Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Universidad Central “Martha Abreu” de Las Villas
- Institut Universitari de Ciència Molecular, Universitat de València, Edifici d’Instituts de Paterna
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València
- Facultad de Química Farmacéutica, Universidad de Cartagena
| | - Jure Zupan
- Laboratory of Chemometrics, National Institute of Chemistry
| | - Facundo Pérez-Giménez
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València
|
20
|
Freitas MR, Barigye SJ, Freitas MP. Coloured chemical image-based models for the prediction of soil sorption of herbicides. RSC Adv 2015. [DOI: 10.1039/c4ra12070a] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022] Open
Abstract
Herbicides with high soil sorption profiles constitute important organic pollutants leading to detrimental environmental effects, particularly due to prolonged use.
|
21
|
Speck-Planche A, Cordeiro MNDS. A general ANN-based multitasking model for the discovery of potent and safer antibacterial agents. Methods Mol Biol 2015; 1260:45-64. [PMID: 25502375 DOI: 10.1007/978-1-4939-2239-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
Bacteria have been one of the world's most dangerous and deadliest pathogens for mankind, nowadays giving rise to significant public health concerns. Given the prevalence of these microbial pathogens and their increasing resistance to existing antibiotics, there is a pressing need for new antibacterial drugs. However, development of a successful drug is a complex, costly, and time-consuming process. Quantitative Structure-Activity Relationships (QSAR)-based approaches are valuable tools for shortening the time of lead compound identification but also for focusing and limiting time-costly synthetic activities and in vitro/vivo evaluations. QSAR-based approaches, supported by powerful statistical techniques such as artificial neural networks (ANNs), have evolved to the point of integrating dissimilar types of chemical and biological data. This chapter reports an overview of the current research and potential applications of QSAR modeling tools toward the rational design of more efficient antibacterial agents. Particular emphasis is given to the setup of multitasking models along with ANNs aimed at jointly predicting different antibacterial activities and safety profiles of drugs/chemicals under diverse experimental conditions.
Affiliation(s)
- A Speck-Planche
- Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007, Porto, Portugal
|
22
|
García-Jacas CR, Marrero-Ponce Y, Acevedo-Martínez L, Barigye SJ, Valdés-Martiní JR, Contreras-Torres E. QuBiLS-MIDAS: a parallel free-software for molecular descriptors computation based on multilinear algebraic maps. J Comput Chem 2014; 35:1395-409. [PMID: 24889018 DOI: 10.1002/jcc.23640] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 02/18/2014] [Revised: 04/22/2014] [Accepted: 04/23/2014] [Indexed: 11/12/2022]
Abstract
The present report introduces the QuBiLS-MIDAS software, which belongs to the ToMoCoMD-CARDD suite, for the calculation of three-dimensional molecular descriptors (MDs) based on the two-linear (bilinear), three-linear, and four-linear (multilinear or N-linear) algebraic forms. Thus, it is a unique software package that computes these tensor-based indices. These descriptors establish relations for two, three, and four atoms by using several (dis-)similarity metrics or multimetrics, matrix transformations, cutoffs, local calculations, and aggregation operators. The theoretical background of these N-linear indices is also presented. The QuBiLS-MIDAS software was developed in the Java programming language and employs the Chemistry Development Kit library for the manipulation of the chemical structures and the calculation of the atomic properties. The software is composed of a user-friendly desktop interface and an Application Programming Interface library. The former was created to simplify the configuration of the different options of the MDs, whereas the library was designed to allow its easy integration into other software for chemoinformatics applications. The program provides functionality for data cleaning tasks and for batch processing of the molecular indices. In addition, it offers parallel calculation of the MDs using all available processors in current computers. Complexity analyses of the main algorithms show that they are implemented efficiently relative to a naive implementation. Lastly, the performance tests reveal that the software scales suitably as the number of processors is increased. Therefore, the QuBiLS-MIDAS software constitutes a useful application for the computation of molecular indices based on N-linear algebraic maps, and it can be used freely to perform chemoinformatics studies.
Affiliation(s)
- César R García-Jacas
- Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional, Universidad de las Ciencias Informáticas, La Habana, Cuba; Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Universidad Central "Martha Abreu" de Las Villas, Santa Clara, 54830, Villa Clara, Cuba
|
23
|
Barigye SJ, Marrero-Ponce Y, Pérez-Giménez F, Bonchev D. Trends in information theory-based chemical structure codification. Mol Divers 2014; 18:673-86. [DOI: 10.1007/s11030-014-9517-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 12/07/2013] [Accepted: 03/07/2014] [Indexed: 11/25/2022]
|
24
|
|