1
Agüero-Chapin G, Galpert D, Molina-Ruiz R, Ancede-Gallardo E, Pérez-Machado G, De la Riva GA, Antunes A. Graph Theory-Based Sequence Descriptors as Remote Homology Predictors. Biomolecules 2019; 10:E26. [PMID: 31878100 PMCID: PMC7022958 DOI: 10.3390/biom10010026] [Received: 11/19/2019] [Revised: 12/16/2019] [Accepted: 12/18/2019]
Abstract
Alignment-free (AF) methodologies have grown in popularity over recent decades as alternatives to alignment-based (AB) algorithms for comparative sequence analysis. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular AF methodologies, as well as their applications to classification problems, have been described in previous reviews. However, a new set of graph theory-derived sequence/structural descriptors that has been gaining relevance in remote homology detection has been omitted from those reviews as a source of AF predictors. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools that encode graph theory-derived sequence/structure descriptors, together with their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by combining the predictions of different algorithms through voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less-known ones, we provide our own, obtained with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a user-friendly graphical user interface (GUI) for delimiting the twilight zone using several similarity criteria.
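To make the alignment-free (AF) idea above concrete, here is a minimal sketch, not taken from the paper, that compares two protein sequences through their dipeptide (k = 2) frequency profiles instead of an alignment. The choice of k, the Euclidean distance and the function names `kmer_profile` and `af_distance` are illustrative assumptions; MARCH-INSIDE, TI2BioP, ProtDCal and SeqDivA compute different descriptors.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_profile(seq, k=2):
    """Relative frequencies of all 20**k k-mers, in a fixed order.
    k-mers containing non-standard letters (e.g. X) are skipped."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if all(c in AMINO_ACIDS for c in seq[i:i + k])
    )
    total = sum(counts.values()) or 1  # avoid division by zero for short inputs
    return [counts["".join(p)] / total for p in product(AMINO_ACIDS, repeat=k)]


def af_distance(a, b, k=2):
    """Euclidean distance between k-mer profiles; no alignment is computed."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

Because the profiles are relative frequencies, sequences of different lengths can be compared directly; for example, `af_distance("AAAA", "CCCC")` equals sqrt(2), since the two profiles share no dipeptide.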
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara 54830, Cuba
- Evys Ancede-Gallardo
- Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile
- Gisselle Pérez-Machado
- EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Gustavo A. De la Riva
- Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico
- Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
2
Bielińska-Wąż D, Wąż P. Spectral-dynamic representation of DNA sequences. J Biomed Inform 2017; 72:1-7. [PMID: 28587890 DOI: 10.1016/j.jbi.2017.06.001] [Received: 02/13/2017] [Revised: 05/03/2017] [Accepted: 06/01/2017]
Abstract
A graphical representation of DNA sequences has been formulated in which the distribution of a particular base B ∈ {A, C, G, T} is represented by a set of discrete lines. The methodology is borrowed from two areas of physics: spectroscopy and dynamics. Accordingly, the set of discrete lines is referred to as the B-spectrum. The B-spectrum is then transformed into a rigid body composed of material points, which yields a dynamic representation of the DNA sequence. The centers of mass of these rigid bodies, divided by their moments of inertia, have been taken as the descriptors of the spectra and, thus, of the DNA sequences. The method performed very well on a standard data set commonly used by authors introducing new bioinformatics approaches (the first exons of β-globin genes of different species).
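The B-spectrum and rigid-body steps can be sketched as follows. This is a minimal illustration under stated assumptions: each occurrence of base B contributes a unit mass at its 1-based sequence position, and the moment of inertia is taken about the center of mass. The paper's exact normalization may differ, and the helper names are hypothetical.

```python
def b_spectrum(seq, base):
    """1-based positions of `base` in `seq`; each position is one spectral line."""
    return [i for i, b in enumerate(seq.upper(), start=1) if b == base]


def rigid_body(seq, base):
    """Treat the B-spectrum as unit point masses on a line and return
    (center of mass, moment of inertia about that center, center / inertia).
    The ratio is None when the inertia is zero (a single line)."""
    xs = b_spectrum(seq, base)
    if not xs:
        return None
    center = sum(xs) / len(xs)  # unit masses: total mass = number of lines
    inertia = sum((x - center) ** 2 for x in xs)
    ratio = center / inertia if inertia else None
    return center, inertia, ratio


def descriptors(seq):
    """One rigid-body descriptor tuple per base (None if the base is absent)."""
    return {base: rigid_body(seq, base) for base in "ACGT"}
```

For "ACGTA" the A-spectrum has lines at positions 1 and 5, giving a center of mass of 3.0 and a moment of inertia of 8.0.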
Affiliation(s)
- Dorota Bielińska-Wąż
- Department of Radiological Informatics and Statistics, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland.
- Piotr Wąż
- Department of Nuclear Medicine, Medical University of Gdańsk, Tuwima 15, 80-210 Gdańsk, Poland.
3
Nandy A, Basak SC. A Brief Review of Computer-Assisted Approaches to Rational Design of Peptide Vaccines. Int J Mol Sci 2016; 17:E666. [PMID: 27153063 PMCID: PMC4881492 DOI: 10.3390/ijms17050666] [Received: 04/01/2016] [Revised: 04/25/2016] [Accepted: 04/27/2016]
Abstract
The growing incidence of new viral diseases and increasingly frequent viral epidemics have strained therapeutic and preventive measures, and the high mutability of viral genes puts additional strain on development efforts. Given the high cost and long time required to develop new drugs, vaccines remain a viable alternative, but traditional live-attenuated or inactivated vaccines carry risks such as allergenic reactions. Over the last several years, peptide vaccines have come to be regarded as more appropriate alternatives: they are economically affordable, require less development time and hold the promise of multivalent dosages. Developments in bioinformatics, proteomics, immunogenomics, structural biology and other sciences have spurred the growth of vaccinomics, in which computer-assisted approaches serve to identify suitable peptide targets for the eventual development of vaccines. In this mini-review we give a brief overview of some recent trends in computer-assisted vaccine development, with emphasis on the primary selection procedures for probable peptide vaccine candidates.
Affiliation(s)
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, Jodhpur Park, Kolkata 700068, India.
- Subhash C Basak
- Natural Resources Research Institute and Department of Chemistry & Biochemistry, University of Minnesota Duluth, Duluth, MN 55811, USA.
4
20D-dynamic representation of protein sequences. Genomics 2016; 107:16-23. [DOI: 10.1016/j.ygeno.2015.12.003] [Received: 09/10/2015] [Revised: 12/10/2015] [Accepted: 12/14/2015]
5
Sarkar T, Das S, De A, Nandy P, Chattopadhyay S, Chawla-Sarkar M, Nandy A. H7N9 influenza outbreak in China 2013: In silico analyses of conserved segments of the hemagglutinin as a basis for the selection of peptide vaccine targets. Comput Biol Chem 2015; 59 Pt A:8-15. [PMID: 26364271 DOI: 10.1016/j.compbiolchem.2015.08.003] [Received: 10/14/2014] [Revised: 08/03/2015] [Accepted: 08/04/2015]
Abstract
The sudden emergence in China in 2013 of an H7N9 influenza strain capable of infecting humans, with fatalities in about 30% of cases, has caused wide concern that additional mutations enabling human-to-human transmission could lead to a deadly pandemic. This may happen within a short time span: H7N9 outbreaks are increasingly recurrent, which implies that H7N9 evolution is speeding up. Before this outbreak in China in February 2013, H7N9 strains were not known to infect humans and circulated solely as an avian strain. Although currently available drugs such as oseltamivir have been found to be largely effective against H7N9, cases of drug resistance have recently been reported, so alternatives are needed to combat the disease, especially if it assumes pandemic proportions. In this work, we investigated the genetic changes in the hemagglutinin (HA) protein sequence that allow an avian virus to infect humans and identified possible peptide targets for designing vaccines against this emerging risk. We identified three highly conserved regions in all H7 subtypes, one of which, an immunogenic surface-exposed region, was well conserved in all human-infecting H7N9 strains (sequences accessed up to 27 March 2014). Compared with avian H7N9 strains, all post-February 2013 human-infecting H7N9 China hemagglutinin sequences carry two mutations in this conserved region at the receptor binding site. One of these mutations lies very close (3.6 Å) to the hemagglutinin sialic acid binding pocket and may improve binding to the human host's sialic acid by changing the hydrophobicity of the binding-site microenvironment. We found that the peptide region carrying these mutations, which are specific to human-infecting H7N9 viruses, could potentially serve as a target for a peptide vaccine.
Affiliation(s)
- Tapati Sarkar
- Physics Department, Jadavpur University, Kolkata 700032, India.
- Sukhen Das
- Physics Department, Jadavpur University, Kolkata 700032, India
- Antara De
- Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India
- Papiya Nandy
- Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India
- Shiladitya Chattopadhyay
- Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Mamta Chawla-Sarkar
- Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India
6
Nandy A. The GRANCH Techniques for Analysis of DNA, RNA and Protein Sequences. Advances in Mathematical Chemistry and Applications 2015. [PMCID: PMC7151884 DOI: 10.1016/b978-1-68108-053-6.50005-3]
Abstract
The very rapid growth in molecular sequence data from the daily output of large gene and protein sequencing projects has raised the problem of how to view and analyze such massive amounts of data. Graphical representation and numerical characterization of DNA, RNA and protein sequences have shown great potential for addressing these concerns. Here we briefly review several formulations of these representations, with examples of applications to diverse problems, based on the author's presentation at the Second Mathematical Chemistry Workshop of the Americas in Bogotá, Colombia, in 2010. In particular, we note several insights gained from such representations and their applications to the biomedical field.
Affiliation(s)
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, 404B Jodhpur Park, Kolkata 700068, India; Tel: + 91 33 2473 0577;
7
Non-standard similarity/dissimilarity analysis of DNA sequences. Genomics 2014; 104:464-71. [DOI: 10.1016/j.ygeno.2014.08.010] [Received: 07/04/2014] [Accepted: 08/19/2014]
8
Yao Y, Yan S, Han J, Dai Q, He PA. A novel descriptor of protein sequences and its application. J Theor Biol 2014; 347:109-17. [PMID: 24412564 DOI: 10.1016/j.jtbi.2014.01.001] [Received: 04/06/2013] [Revised: 11/10/2013] [Accepted: 01/01/2014]
Abstract
In this paper, a dynamic 3D graphical representation of protein sequences is introduced based on three physicochemical properties of amino acids. The coordinates of the graph have direct biological significance and could reflect the innate structure of the proteins. The principal moments of inertia and the ranges of the axis coordinates are extracted as a novel mixed descriptor for comparing protein primary sequences. The Euclidean distance between normalized descriptor vectors, which removes the influence of differences in sequence length, is then used as a quantitative measure of protein similarity. Finally, we illustrate the effectiveness of our approach using nine ND5 (NADH dehydrogenase subunit 5) proteins.
Affiliation(s)
- Yuhua Yao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China.
- Shoujiang Yan
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Jianning Han
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Ping-an He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
9
Exploring the adenylation domain repertoire of nonribosomal peptide synthetases using an ensemble of sequence-search methods. PLoS One 2013; 8:e65926. [PMID: 23874386 PMCID: PMC3712989 DOI: 10.1371/journal.pone.0065926] [Received: 11/21/2012] [Accepted: 05/01/2013]
Abstract
The introduction of two-dimensional (2D) graphs and their numerical characterization for comparative analyses of DNA/RNA and protein sequences without the need for sequence alignments is an active, if recent, research topic in bioinformatics. Here, we used a 2D artificial representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologs of adenylation domains (A-domains) from the nonribosomal peptide synthetase (NRPS) class in the proteome of the cyanobacterium Microcystis aeruginosa. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPS. Several A-domains share amino acid identities below 20%, making them a likely source of remote homologs; therefore, A-domains cannot easily be retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps. TI2BioP (Topological Indices to BioPolymers) version 2.0, available at http://ti2biop.sourceforge.net/, was used to calculate simple TIs from the protein sequences (four-color maps). These TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We concluded that graphical/numerical approaches used in cooperation with other sequence-search methods, such as multi-template BLASTp and profile HMMs, give the most complete exploration of the repertoire of highly diverse protein families.
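A four-color map turns a sequence into a 2D grid of colored cells whose same-colored neighbours merge into regions; the region adjacency graph is then summarized by topological indices. The sketch below is hypothetical throughout: the four-class grouping of amino acids, the square-spiral layout and the choice of the Randić connectivity index are illustrative and are not TI2BioP's actual settings.

```python
from math import sqrt

# Hypothetical four-class reduction of the 20 amino acids (illustration only).
GROUPS = ("AVLIMFWPG", "STCYNQ", "DE", "KRH")
CLASS_OF = {aa: cls for cls, members in enumerate(GROUPS) for aa in members}


def square_spiral(n):
    """First n cells of a square spiral starting at (0, 0)."""
    cells, x, y, dx, dy, step = [(0, 0)], 0, 0, 1, 0, 1
    while len(cells) < n:
        for _ in range(2):  # two legs per step length
            for _ in range(step):
                x, y = x + dx, y + dy
                cells.append((x, y))
            dx, dy = -dy, dx  # turn 90 degrees counter-clockwise
        step += 1
    return cells[:n]


def region_graph(seq):
    """Color each cell by residue class, merge edge-adjacent cells of the same
    color into regions (union-find) and link regions that touch."""
    cells = square_spiral(len(seq))
    color = {cell: CLASS_OF[aa] for cell, aa in zip(cells, seq.upper())}
    parent = {cell: cell for cell in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # each pair of edge-adjacent occupied cells, listed once
    pairs = [((x, y), nb) for (x, y) in cells
             for nb in ((x + 1, y), (x, y + 1)) if nb in color]
    for a, b in pairs:
        if color[a] == color[b]:
            parent[find(a)] = find(b)
    regions = {find(c) for c in cells}
    edges = {frozenset((find(a), find(b))) for a, b in pairs if find(a) != find(b)}
    return regions, edges


def randic_index(regions, edges):
    """Randic connectivity index: sum over edges of 1/sqrt(deg(u) * deg(v))."""
    deg = dict.fromkeys(regions, 0)
    for e in edges:
        for r in e:
            deg[r] += 1
    return sum(1 / sqrt(deg[u] * deg[v]) for u, v in map(tuple, edges))
```

For example, `region_graph("AKA")` gives three regions joined in a path, whose Randić index is 2/sqrt(2), about 1.414. Residues outside the 20 standard letters raise a `KeyError`.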
10
Ghosh A, Chattopadhyay S, Chawla-Sarkar M, Nandy P, Nandy A. In silico study of rotavirus VP7 surface accessible conserved regions for antiviral drug/vaccine design. PLoS One 2012; 7:e40749. [PMID: 22844409 PMCID: PMC3406019 DOI: 10.1371/journal.pone.0040749] [Received: 03/26/2012] [Accepted: 06/12/2012]
Abstract
Background Rotaviral diarrhoea kills about half a million children annually in developing countries and accounts for one third of diarrhoea-related hospitalizations. As in all viral diseases, drugs and vaccines against rotavirus are handicapped by rapid mutational changes in the viral nucleotide and protein sequences, which render most of them ineffective. At present only two vaccines are licensed and approved by the WHO (World Health Organization), and both display reduced efficiency in underdeveloped countries where the disease is more prevalent. We approached this issue by trying to identify conserved, surface-exposed segments of the virion's surface glycoproteins, which may then be targeted by specific peptide vaccines. We had previously developed a bioinformatics protocol for this kind of problem with reference to the influenza neuraminidase protein, which we have refined and expanded to analyze the rotavirus issue. Results Our analysis of 433 VP7 (Viral Protein 7 from rotavirus) surface protein sequences across 17 subtypes encompassing mammalian hosts, using a 20D Graphical Representation and Numerical Characterization method, identified four possible highly conserved peptide segments. Solvent accessibility prediction servers indicated that these segments are predominantly surface-situated. When analyzed with selected epitope prediction servers for their potential to activate T cells and B cells, these regions showed good results as epitope candidates (confirmed computationally only). Conclusions The main reasons for developing alternative vaccine strategies for rotavirus are the failure of current vaccines and the high production costs that inhibit their application in developing countries. We expect that the surface-exposed protein regions identified in our study could be used as targets for peptide vaccines and drug design to achieve stable immunity against divergent rotavirus strains.
Though this study is fully dependent on computational prediction algorithms, it provides a platform for wet-lab experiments.
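The 20D graphical representation can be sketched as a walk in which each amino acid moves one unit along its own axis, summarized by the length of the walk's mean position (a "graph radius"). This definition is modeled on GRANCH-style walks and is an assumption, as are the window size, the use of the population standard deviation, and the premise that windows at the same offset are comparable across sequences; the function names are hypothetical.

```python
from math import sqrt
from statistics import pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AXIS = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def graph_radius_20d(seq):
    """20D walk: each residue is a unit step along its own axis. Returns the
    norm of the mean position over all walk points (the 'graph radius')."""
    pos = [0] * 20
    sums = [0.0] * 20
    n = 0
    for aa in seq.upper():
        if aa not in AXIS:  # skip gaps and ambiguous residues
            continue
        pos[AXIS[aa]] += 1
        for k in range(20):
            sums[k] += pos[k]
        n += 1
    if n == 0:
        return 0.0
    return sqrt(sum((s / n) ** 2 for s in sums))


def window_spread(seqs, start, width):
    """Population std of the graph radius of one window across sequences;
    a low value flags a candidate conserved segment."""
    return pstdev([graph_radius_20d(s[start:start + width]) for s in seqs])
```

Scanning `window_spread` over all start positions and keeping the windows with the smallest spread gives a simple candidate list of conserved segments, which would still need the solvent-accessibility and epitope checks described in the abstract.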
Affiliation(s)
- Ambarnil Ghosh
- Physics Department, Jadavpur University, Kolkata, West Bengal, India
- Shiladitya Chattopadhyay
- Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Mamta Chawla-Sarkar
- Division of Virology, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Papiya Nandy
- Physics Department, Jadavpur University, Kolkata, West Bengal, India
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, Kolkata, West Bengal, India
11
He PA, Li D, Zhang Y, Wang X, Yao Y. A 3D graphical representation of protein sequences based on the Gray code. J Theor Biol 2012; 304:81-7. [PMID: 22554947 DOI: 10.1016/j.jtbi.2012.03.023] [Received: 07/20/2011] [Revised: 02/17/2012] [Accepted: 03/17/2012]
Abstract
Based on the order of the 6-bit binary Gray code, a cyclic order of the 20 amino acids is introduced. A novel 3D graphical representation of protein sequences is then proposed by analogy with the chaos game representation (CGR) of DNA sequences, and a mathematical descriptor is suggested to characterize the resulting curve. The efficiency of the approach is illustrated by comparing similarities/dissimilarities among the ND5 protein sequences of nine species. Correlation and significance analyses, comparing both our results and those of other graphical representations with the ClustalW results, show the utility of the approach.
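Two ingredients mentioned above can be sketched: generating the reflected binary Gray code, and a CGR-style walk in which each residue pulls the current point halfway toward its vertex on a circle. The alphabetical residue order is a placeholder (the paper derives its own cyclic order from the 6-bit Gray code), and using the residue index as the third coordinate is an assumption about how the representation becomes 3D.

```python
from math import cos, sin, pi


def gray_code(bits):
    """Reflected binary Gray code: successive codes differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(1 << bits)]


# Placeholder cyclic order (alphabetical); the paper derives its own order.
ORDER = "ACDEFGHIKLMNPQRSTVWY"
VERTEX = {aa: (cos(2 * pi * k / 20), sin(2 * pi * k / 20)) for k, aa in enumerate(ORDER)}


def cgr_3d(seq):
    """CGR-style walk: each step moves halfway toward the residue's vertex on
    the unit circle; the residue index is used as the third coordinate."""
    x = y = 0.0
    points = []
    for i, aa in enumerate(seq.upper(), start=1):
        vx, vy = VERTEX[aa]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y, i))
    return points
```

The 64 codes of `gray_code(6)` form a cycle in which neighbours, including the last and the first, differ in a single bit; this one-bit-step property is what makes a Gray code ordering cyclic.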
Affiliation(s)
- Ping-an He
- College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, PR China.
12
Disease embryo development network reveals the relationship between disease genes and embryo development genes. J Theor Biol 2011; 287:100-8. [PMID: 21824480 PMCID: PMC7094120 DOI: 10.1016/j.jtbi.2011.07.018] [Received: 03/08/2011] [Revised: 06/15/2011] [Accepted: 07/22/2011]
Abstract
A basic problem in contemporary biology and medicine is exploring the correlation between human diseases and the underlying cellular mechanisms. Many efforts have been made to reveal the similarity between embryo development and disease processes, but few at the systems level. In this article, we used human protein-protein interactions (PPIs), disease genes with their classifications, and embryo development genes to reconstruct a human disease-embryo development network and investigate the relationship between disease genes and embryo development genes. We found that disease genes and embryo development genes are prone to connect with each other. Furthermore, diseases can be categorized into three groups according to their closeness to embryo development in terms of gene overlap, interaction patterns in the PPI network, and co-regulation by microRNAs or transcription factors. Embryo development high-related disease genes show closeness to embryo development in at least three biological levels, whereas embryo development medium-related and low-related disease genes do not. We also found that embryo development high-related disease genes are more central than other disease genes in the human PPI network. In addition, the results show that embryo development high-related disease genes tend to be essential genes compared with the genes of other diseases. This network-based approach could provide evidence for the intricate correlation between disease processes and embryo development and help uncover potential mechanisms of complex human diseases.
13
Zhou GP. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism. J Theor Biol 2011; 284:142-8. [PMID: 21718705 PMCID: PMC7094099 DOI: 10.1016/j.jtbi.2011.06.006] [Received: 02/10/2011] [Revised: 04/28/2011] [Accepted: 06/07/2011]
Abstract
The wenxiang diagram is a new two-dimensional representation that characterizes the disposition of hydrophobic and hydrophilic residues in α-helices. In this research, the hydrophobic and hydrophilic residues of two leucine zipper coiled-coil (LZCC) structural proteins, cGKIα(1-59) and MBS(CT35), are placed on wenxiang diagrams according to the heptad repeat pattern (abcdefg)n. Their wenxiang diagrams clearly show that residues with the same repeat letter lie on the same side of the spiral diagram, with most hydrophobic residues at positions a and d and most hydrophilic residues in the polar positions b, c, e, f and g. The wenxiang diagram of a dimeric LZCC can be represented by combining two monomeric wenxiang diagrams, and the wenxiang diagrams of the two LZCC (tetramer) complex structures can likewise be assembled from two pairs of wenxiang diagrams. Furthermore, comparison of the wenxiang diagrams of cGKIα(1-59) and MBS(CT35) suggests that the interaction between cGKIα(1-59) and MBS(CT35) is weaker. By analyzing the wenxiang diagram of the cGKIα(1-59)·MBS(CT42) complex structure, the residues of cGKIα(1-59) most affected by the interaction with MBS(CT42) are proposed to lie at positions d, a, e and g of the LZCC structure. These findings are consistent with our previous NMR results. Combined with NMR spectroscopy, wenxiang diagrams of LZCC structures may provide novel insights into the interaction mechanisms of dimeric, trimeric and tetrameric coiled-coil structures.
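The heptad bookkeeping behind a wenxiang diagram can be sketched as follows. The hydrophobic residue set, the starting register and the spiral spacing `dr` are illustrative assumptions; only the 100° step per residue (360° divided by 3.6 residues per α-helical turn) reflects standard helix geometry.

```python
from math import cos, sin, radians

HEPTAD = "abcdefg"
HYDROPHOBIC = set("AVLIMFWC")  # illustrative hydrophobic set (assumption)


def heptad_labels(seq, first="a"):
    """Assign heptad positions a-g cyclically, starting from register `first`."""
    offset = HEPTAD.index(first)
    return [HEPTAD[(offset + i) % 7] for i in range(len(seq))]


def hydrophobic_by_position(seq, first="a"):
    """Count hydrophobic residues at each heptad position."""
    counts = dict.fromkeys(HEPTAD, 0)
    for aa, pos in zip(seq.upper(), heptad_labels(seq, first)):
        if aa in HYDROPHOBIC:
            counts[pos] += 1
    return counts


def wenxiang_coords(n, step_deg=100.0, r0=1.0, dr=0.1):
    """Spiral layout: residue i sits i*100 degrees around the helix axis at
    radius r0 + i*dr, so residues seven apart land close to the same side."""
    return [((r0 + i * dr) * cos(radians(i * step_deg)),
             (r0 + i * dr) * sin(radians(i * step_deg))) for i in range(n)]
```

For an idealized leucine zipper such as three repeats of "LKELEKK", every hydrophobic residue falls at positions a and d, matching the pattern the abstract describes for LZCC proteins.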
Affiliation(s)
- Guo-Ping Zhou
- Gordon Life Science Institute, 13784 Torrey Del Mar Drive, San Diego, CA 92130, USA.
14
González-Díaz H, Muíño L, Anadón AM, Romaris F, Prado-Prado FJ, Munteanu CR, Dorado J, Sierra AP, Mezo M, González-Warleta M, Gárate T, Ubeira FM. MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens. Molecular BioSystems 2011; 7:1938-55. [PMID: 21468430 DOI: 10.1039/c1mb05069a]
Abstract
Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide, but chemotherapy has become expensive, toxic and/or less effective owing to drug resistance. On the other hand, many 3D structures in the Protein Data Bank (PDB) remain without functional annotation. Theoretical models are needed to quickly predict biologically relevant Parasite Self Proteins (PSPs): proteins that are expressed differentially in a given parasite, are dissimilar to proteins expressed in other parasites, and have a high probability of becoming new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma and Toxoplasma) with 90% accuracy for 15,341 training and validation cases. The model combines protein residue networks, Markov chain models (MCM) and artificial neural networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network, calculated with the MARCH-INSIDE software. We implemented this model in a new web server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . The server is easy for non-experts in bioinformatics to use: they can carry out automatic online upload and prediction with 3D structures deposited in the PDB (mode 1). Outcomes of peptide mass fingerprinting (PMF) and MS/MS can also be studied for query proteins with unknown 3D structures (mode 2). We illustrate the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present in 10 Anisakis simplex allergens (Ani s 1 to Ani s 10).
In doing so, we combined electrophoresis (1DE), MALDI-TOF mass spectrometry and MASCOT to seek sequences, molecular mechanics + molecular dynamics (MM/MD) to generate 3D structures, and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSPs in 16 additional species, including parasite hosts, fungal pathogens, disease transmission vectors and biotechnologically relevant organisms.
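MARCH-INSIDE's inputs, the spectral moments of a Markov transition matrix, can be sketched as the traces of successive powers of a row-normalized weight matrix. The weight matrix (here any non-negative square matrix standing in for the electrostatic interactions of the residue network) is an assumed input; the network construction and the ANN classifier are not reproduced, and the function names are hypothetical.

```python
def transition_matrix(weights):
    """Row-normalize a square non-negative weight matrix into a Markov matrix.
    Rows that sum to zero (isolated residues) are left as zeros."""
    out = []
    for row in weights:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def spectral_moments(weights, k_max=5):
    """pi_k = trace(P**k) for k = 1..k_max, where P is the transition matrix;
    each trace sums the probabilities of returning to a residue in k steps."""
    p = transition_matrix(weights)
    power, moments = p, []
    for _ in range(k_max):
        moments.append(sum(power[i][i] for i in range(len(power))))
        power = matmul(power, p)
    return moments
```

For a two-residue network with a single symmetric contact, the walk alternates between the residues, so odd moments are 0 and even moments equal 2.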
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain.
15
Ghosh A, Nandy A. Graphical representation and mathematical characterization of protein sequences and applications to viral proteins. Advances in Protein Chemistry and Structural Biology 2011; 83:1-42. [PMID: 21570664 PMCID: PMC7150266 DOI: 10.1016/b978-0-12-381262-9.00001-x]
Abstract
Graphical representation and numerical characterization (GRANCH) of nucleotide and protein sequences is a new field that is showing a lot of promise in analysis of such sequences. While formulation and applications of GRANCH techniques for DNA/RNA sequences started just over a decade ago, analyses of protein sequences by these techniques are of more recent origin. The emphasis is still on developing the underlying technique, but significant results have been achieved in using these methods for protein phylogeny, mass spectral data of proteins and protein serum profiles in parasites, toxicoproteomics, determination of different indices for use in QSAR studies, among others. We briefly mention these in this chapter, with some details on protein phylogeny and viral diseases. In particular, we cover a systematic method developed in GRANCH to determine conserved surface exposed peptide segments in selected viral proteins that can be used for drug and vaccine targeting. The new GRANCH techniques and applications for DNAs and proteins are covered briefly to provide an overview to this nascent field.
Affiliation(s)
- Ambarnil Ghosh
- Physics Department, Jadavpur University, Jadavpur, Kolkata, India
16
Analysis of protein pathway networks using hybrid properties. Molecules 2010; 15:8177-92. [PMID: 21076385 PMCID: PMC6259184 DOI: 10.3390/molecules15118177] [Received: 08/05/2010] [Revised: 11/11/2010] [Accepted: 11/12/2010]
Abstract
Given a protein-forming system, i.e., a system consisting of a certain number of different proteins, can it form a biologically meaningful pathway? This is a fundamental problem in systems biology and proteomics. During the past decade, a vast amount of information on different organisms, at both the genetic and metabolic levels, has been accumulated and systematically stored in specific databases such as KEGG, ENZYME, BRENDA, EcoCyc and MetaCyc. These data have made it feasible to address this essential problem. In this paper, we analyzed known regulatory pathways in humans by extracting different (biological and graph) features from each of 17,069 protein-forming systems, of which 169 were positive pathways, i.e., known regulatory pathways taken from KEGG, while 16,900 were negative, i.e., not forming a biologically meaningful pathway. Each protein-forming system was represented by 352 features, of which 88 are graph features and 264 are biological features. The "Minimum Redundancy Maximum Relevance" (mRMR) and "Incremental Feature Selection" (IFS) techniques were used to select a set of 22 optimal features for predicting whether a protein-forming system can form a biologically meaningful pathway. Cross-validation showed that the overall success rate in identifying the positive pathways was 79.88%. It is anticipated that this novel approach and these encouraging, albeit preliminary, results may stimulate extensive investigation of this important topic.
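The two feature-selection steps named above can be sketched for discrete features: mRMR ranks features greedily by relevance to the class minus mean redundancy with already-chosen features (here the difference form, one of two common variants), and IFS then scans the top-k prefixes of that ranking. Mutual information on discrete values, the function names and the `evaluate` callback are assumptions; continuous features would first need discretization, and the paper's own features and classifier are not reproduced.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, labels):
    """Greedy mRMR (difference form): at each step pick the feature with the
    highest relevance to the labels minus its mean redundancy with the
    features already picked. `features` maps name -> list of discrete values."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    ranked, remaining = [], list(features)

    def score(f):
        if not ranked:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s]) for s in ranked)
        return relevance[f] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """IFS: evaluate the top-1, top-2, ... subsets and keep the best-scoring one.
    `evaluate` is a user-supplied callable, e.g. a cross-validated accuracy."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        s = evaluate(ranked[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranked[:best_k], best_score
```

In the setting of the abstract, `evaluate` would train and cross-validate a classifier on the chosen features, and the prefix with the best success rate would correspond to the selected feature set.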
17
Xie G, Mo Z. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications. J Theor Biol 2010; 269:123-30. [PMID: 20969878 PMCID: PMC7126940 DOI: 10.1016/j.jtbi.2010.10.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/15/2010] [Revised: 09/15/2010] [Accepted: 10/09/2010] [Indexed: 11/01/2022]
Abstract
In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call the RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and no information is lost when a DNA sequence is transferred to its mathematical representation, and (ii) the coordinates of every node on these 3D curves have a clear biological meaning. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed from the geometrical centers of the 3D curves to compare similarity among DNA sequences from different species. As examples, we examine similarity among the coding sequences of the first exon of the beta-globin gene from eleven species and validate similarity among cDNA sequences of the beta-globin gene from eight species.
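Application (a) rests on the fact that cumulative coordinates built from base classifications are linear in the base counts, so they can be inverted. The sketch below assumes Z-curve-style coordinates, which may differ from the authors' exact RY/MK/SW curves: for a prefix of length n, x = (A+G) - (C+T) (purine minus pyrimidine), y = (A+C) - (G+T) (amino minus keto) and z = (A+T) - (G+C) (weak minus strong). Together with n = A+C+G+T these four equations give A = (n+x+y+z)/4, C = (n-x+y-z)/4, G = (n+x-y-z)/4 and T = (n-x-y+z)/4. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed Z-curve-style coordinates built from the three base classifications:
# purine/pyrimidine (R = A,G vs Y = C,T), amino/keto (M = A,C vs K = G,T),
# weak/strong hydrogen bonding (W = A,T vs S = C,G).
PURINE, AMINO, WEAK = set("AG"), set("AC"), set("AT")


def cumulative_coordinates(seq: str) -> List[Tuple[int, int, int]]:
    """Node i holds (R - Y, M - K, W - S) counted over the first i + 1 bases."""
    x = y = z = 0
    nodes = []
    for base in seq.upper():
        if base not in "ACGT":
            raise ValueError(f"unsupported base: {base!r}")
        x += 1 if base in PURINE else -1
        y += 1 if base in AMINO else -1
        z += 1 if base in WEAK else -1
        nodes.append((x, y, z))
    return nodes


def base_content(n: int, x: int, y: int, z: int) -> Dict[str, int]:
    """Recover the A, C, G, T counts of a length-n prefix from its node coordinates."""
    return {
        "A": (n + x + y + z) // 4,
        "C": (n - x + y - z) // 4,
        "G": (n + x - y - z) // 4,
        "T": (n - x - y + z) // 4,
    }
```

For example, the last node of "AACGT" is (1, 1, 1), and `base_content(5, 1, 1, 1)` returns two A and one each of C, G and T.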
Affiliation(s)
- Guosen Xie
- School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
18
Stan C, Cristescu CP, Scarlat EI. Similarity analysis for DNA sequences based on chaos game representation. Case study: the albumin. J Theor Biol 2010; 267:513-8. [PMID: 20869369 DOI: 10.1016/j.jtbi.2010.09.027] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 06/11/2010] [Revised: 09/16/2010] [Accepted: 09/17/2010] [Indexed: 11/16/2022]
Abstract
Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities and dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and similarities are obtained by comparing the matrices of the sequences of interest. Three methods of analyzing the resulting difference matrix are considered: a three-dimensional representation giving both local and global information, a numerical characterization based on an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammalian species, taking human albumin as the reference.
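The matrix-per-pattern idea can be sketched by binning CGR points into a 2^k x 2^k grid and subtracting two such grids. This is a minimal sketch under stated assumptions: the corner layout is the conventional A=(0,0), C=(0,1), G=(1,1), T=(1,0), k must be at least 1, and `dissimilarity` (sum of absolute cell differences) is a hypothetical stand-in, not the authors' n-letter word similarity measure or statistical evaluation.

```python
from typing import List

# Conventional CGR corner layout, assumed here: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

Matrix = List[List[float]]


def cgr_matrix(seq: str, k: int) -> Matrix:
    """Relative frequencies of CGR points on a 2^k x 2^k grid (k >= 1).

    Each CGR step moves halfway toward the corner of the current base, so at
    resolution 2^k a point's cell is fixed by the corners of the last k bases,
    the most recent base giving the most significant bit. Tracking the cell with
    integer shifts avoids floating-point drift on long sequences.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    col = row = total = 0
    for i, base in enumerate(seq.upper()):
        bx, by = CORNERS[base]  # raises KeyError for non-ACGT symbols
        col = (col >> 1) | (bx << (k - 1))
        row = (row >> 1) | (by << (k - 1))
        if i >= k - 1:  # skip points that do not yet encode a full k-letter word
            counts[row][col] += 1
            total += 1
    return [[c / total if total else 0.0 for c in r] for r in counts]


def difference_matrix(m1: Matrix, m2: Matrix) -> Matrix:
    """Cell-by-cell difference between two CGR frequency matrices."""
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def dissimilarity(m1: Matrix, m2: Matrix) -> float:
    """Hypothetical scalar summary: sum of absolute differences (0 for identical profiles)."""
    return sum(abs(v) for row in difference_matrix(m1, m2) for v in row)
```

Because each grid cell corresponds to one k-letter word, the matrix is a k-mer frequency profile laid out in CGR order, which is why comparing matrices amounts to comparing word usage between the sequences.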
Affiliation(s)
- Cristina Stan
- Department of Physics I, Faculty of Applied Sciences, Politehnica University of Bucharest, 313 Splaiul Independentei, RO-060042, Bucharest, Romania.
19
García-Remesal M, Cuevas A, López-Alonso V, López-Campos G, de la Calle G, de la Iglesia D, Pérez-Rey D, Crespo J, Martín-Sánchez F, Maojo V. A method for automatically extracting infectious disease-related primers and probes from the literature. BMC Bioinformatics 2010; 11:410. [PMID: 20682041 PMCID: PMC2923139 DOI: 10.1186/1471-2105-11-410] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/17/2010] [Accepted: 08/03/2010] [Indexed: 11/21/2022]
Abstract
Background: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis of infectious diseases and the prescription of treatments. The biological literature is the main information source for empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. The phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problematic sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information.
Results: We tested our approach on a test set of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The evaluation shows that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name.
Conclusions: We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose and treat infectious diseases. In addition, the method can be extended to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update existing primer/probe databases or to create new databases from scratch.
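Phase (2) of the method, detecting candidate sequences with finite state machine-based recognizers, can be illustrated with a single hand-written recognizer. The abstract does not describe the recognizers themselves, so everything here is an assumption: candidates are runs of upper-case A/C/G/T/U, a single space or hyphen may split a printed sequence, and the `min_len` threshold of 15 is a hypothetical value.

```python
from typing import List, Tuple

NUCLEOTIDES = set("ACGTU")  # assumption: upper-case bases only, no IUPAC ambiguity codes
SEPARATORS = set(" -")      # assumption: single spaces or hyphens may split a printed sequence


def find_candidate_sequences(text: str, min_len: int = 15) -> List[Tuple[int, str]]:
    """Return (offset, sequence) pairs for nucleotide runs of at least min_len bases.

    A three-state machine: OUT (outside a candidate), IN (reading bases) and
    SEP (one separator read inside a candidate). A second separator or any other
    character ends the candidate.
    """
    results: List[Tuple[int, str]] = []
    state, start, buf = "OUT", 0, []

    def flush() -> None:
        if len(buf) >= min_len:
            results.append((start, "".join(buf)))

    for i, ch in enumerate(text):
        if ch in NUCLEOTIDES:
            if state == "OUT":
                start, buf = i, []
            buf.append(ch)
            state = "IN"
        elif ch in SEPARATORS and state == "IN":
            state = "SEP"
        else:
            if state != "OUT":
                flush()
            state = "OUT"
    if state != "OUT":
        flush()
    return results
```

For the sentence "The forward primer 5'-ACG TTG CAA GGT ACC T-3' was used.", the recognizer returns the single candidate `ACGTTGCAAGGTACCT` at offset 22; short upper-case runs such as the "T" in "The" are discarded by the length threshold. Refinement and annotation (phases 3 and 4) would operate on these candidates.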
Affiliation(s)
- Miguel García-Remesal
- Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain.
20
Ghosh A, Nandy A, Nandy P. Computational analysis and determination of a highly conserved surface exposed segment in H5N1 avian flu and H1N1 swine flu neuraminidase. BMC Structural Biology 2010; 10:6. [PMID: 20170556 PMCID: PMC2836360 DOI: 10.1186/1472-6807-10-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 08/07/2009] [Accepted: 02/22/2010] [Indexed: 01/30/2023]
Abstract
Background: Catalytic activity of influenza neuraminidase (NA) facilitates elution of progeny virions from infected cells and prevents their self-aggregation, an activity mediated by the catalytic site located in the body region. Research on the active site of the molecule has led to the development of effective inhibitors such as oseltamivir and zanamivir, but the high rate of mutation and interspecies reassortment in viral sequences, together with recent reports of oseltamivir-resistant strains, underlines the importance of determining additional target sites for developing future antiviral compounds. In a recent computational study of 173 H5N1 NA gene sequences, we identified a highly conserved 50-base region at the 3'-terminal end of the NA gene.
Results: We extend the graphical and numerical analyses to a larger number of H5N1 NA sequences (514) and H1N1 swine flu sequences (425) accessed from GenBank. We use a 2D graphical representation model for the gene sequences and a Graphical Sliding Window Method (GSWM) for the protein sequences, which scans each sequence in blocks of 16 amino acids at a time. Using a protein sequence descriptor defined in our model, the sliding scan allowed us to compare the different strains for block-level variability, which showed a significant statistical correlation with the average solvent accessibility (ASA) of the residue blocks; variability at single amino acid positions showed no correlation, indicating the impact of variability over a stretch of residues on the chemical environment. Close to the C-terminal end, the GSWM showed lower descriptor variability together with increased ASA, which is also supported by the conserved predicted secondary structure of the 3'-terminal RNA and by visual evidence from the 3D crystallographic structure.
Conclusion: The identified terminal segment, strongly conserved in both RNA and protein sequences, is especially significant because it is surface exposed, and its structural chemistry reveals a probable role of this stretch in tetrameric stabilization. It could also participate in other biological processes associated with conserved surface residues. An RNA double-hairpin secondary structure found in this segment in a majority of the H5N1 strains also supports this observation. In this paper, we propose this conserved region as a probable site for designing inhibitors for broad-spectrum pandemic control of flu viruses with similar NA structure.
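The block-level comparison behind the GSWM can be sketched as a sliding window of 16 residues: compute one descriptor value per strain for each window, then measure how much that value varies across strains. The authors' protein sequence descriptor is not given in the abstract, so `toy_descriptor` (fraction of hydrophobic residues) is a hypothetical stand-in, the sequences are assumed to share one coordinate frame (for example, pre-aligned with no gaps), and population standard deviation is an assumed choice of variability measure.

```python
import statistics
from typing import Callable, List, Sequence

HYDROPHOBIC = set("AVILMFWC")  # hypothetical alphabet for the toy descriptor


def toy_descriptor(block: str) -> float:
    """Stand-in block descriptor: fraction of hydrophobic residues (NOT the authors' descriptor)."""
    return sum(r in HYDROPHOBIC for r in block) / len(block)


def sliding_window_variability(
    sequences: Sequence[str],
    window: int = 16,
    descriptor: Callable[[str], float] = toy_descriptor,
) -> List[float]:
    """For each window start, the population std. dev. of the descriptor across strains."""
    length = min(len(s) for s in sequences)
    out: List[float] = []
    for start in range(length - window + 1):
        values = [descriptor(s[start:start + window]) for s in sequences]
        out.append(statistics.pstdev(values))
    return out
```

Windows whose variability stays near zero across many strains would be candidates for conserved stretches; the resulting per-window series could then be correlated with per-block ASA values, as the abstract describes.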
Affiliation(s)
- Ambarnil Ghosh
- Physics Department, Jadavpur University, Kolkata, India.