1
Nandy A. Mapping Biomolecular Sequences: Graphical Representations - their Origins, Applications and Future Prospects. Comb Chem High Throughput Screen 2021; 25:354-364. [PMID: 33970841 DOI: 10.2174/1386207324666210510164743] [Received: 12/29/2020] [Revised: 01/25/2021] [Accepted: 02/11/2021]
Abstract
The exponential growth of biological sequence databases has created an urgent need to store, retrieve and analyse the data efficiently and effectively, a task for which the standard practice of using alignment procedures is inadequate because of its heavy demands on computing resources and time. Graphical representation of sequences has become one of the most popular alignment-free strategies for analysing biological sequences. Each basic unit of a sequence is plotted on a multi-dimensional grid: the bases adenine, cytosine, guanine and thymine (uracil in RNA) for DNA/RNA, or the 20 amino acids for proteins. The resulting curves in 2D and 3D space, and the implied graphs in higher dimensions, convey the underlying information of the sequences through visual inspection; numerical analyses of the plots, in geometrical or matrix terms, provide a measure for comparing sequences and thus enable the study of sequence hierarchies. The approach has also enabled comparisons of DNA sequences spanning many thousands of bases and has provided new insights into the structure of base composition in DNA sequences. In this article we briefly review the origins and applications of graphical representations and highlight future perspectives in this field.
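As a concrete illustration of plotting bases on a grid, the following Python sketch builds a simple 2D walk for a DNA sequence and summarizes it by its centre of mass. The axis assignment (A left, G right, C up, T down) and the choice of descriptor are assumptions made for illustration; they are not presented as the exact scheme surveyed in the review.

```python
# Minimal sketch of a 2D graphical representation ("walk") of a DNA sequence.
# ASSUMPTION: this axis assignment is illustrative only; other published
# schemes assign the four bases to different directions.
STEPS = {
    "A": (-1, 0),  # left
    "G": (1, 0),   # right
    "C": (0, 1),   # up
    "T": (0, -1),  # down
}


def dna_walk(seq):
    """Return the (x, y) points visited after each base, starting from the origin."""
    x, y = 0, 0
    points = []
    for base in seq.upper():
        if base not in STEPS:
            raise ValueError(f"unsupported base: {base!r}")
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def centre_of_mass(points):
    """Mean x and mean y of the walk: one simple numerical descriptor of the curve."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    pts = dna_walk("AGCT")
    print(pts)                  # [(-1, 0), (0, 0), (0, 1), (0, 0)]
    print(centre_of_mass(pts))  # (-0.25, 0.25)
```

Two sequences could then be compared by, for example, the Euclidean distance between their centre-of-mass points; this is only one simple choice among the geometrical descriptors the abstract alludes to.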
Affiliation(s)
- Ashesh Nandy
- Centre for Interdisciplinary Research and Education, Kolkata 700068, India
2
Balasubramanian K, Gupta SP. Quantum Molecular Dynamics, Topological, Group Theoretical and Graph Theoretical Studies of Protein-Protein Interactions. Curr Top Med Chem 2019; 19:426-443. [PMID: 30836919 DOI: 10.2174/1568026619666190304152704] [Received: 08/11/2018] [Revised: 11/08/2018] [Accepted: 11/28/2018]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are becoming increasingly important, as PPIs form the basis of multiple aggregation-related diseases such as cancer, Creutzfeldt-Jakob disease and Alzheimer's disease. This mini-review presents hybrid quantum molecular dynamics, quantum chemical, topological, group theoretical, graph theoretical and docking studies of PPIs. We also show how these theoretical studies facilitate the discovery of some PPI inhibitors of therapeutic importance. OBJECTIVE The objective of this review is to present hybrid quantum molecular dynamics, quantum chemical, topological, group theoretical, graph theoretical and docking studies of PPIs. We also show how these theoretical studies enable the discovery of some PPI inhibitors of therapeutic importance. METHODS This article presents a detailed survey of hybrid quantum dynamics that combines classical and quantum molecular dynamics (MD) for PPIs. The article also surveys various developments pertinent to topological, graph theoretical, group theoretical and docking studies of PPIs and highlights how the methods facilitate the discovery of some PPI inhibitors of therapeutic importance. RESULTS It is shown that higher-level quantum chemical computations are important for accurately computing the free energies and electrostatics of PPIs and of drugs with PPIs; techniques that combine classical MD tools with quantum MD are therefore the preferred choice. Topological, graph theoretical and group theoretical techniques are shown to be important for studying large networks of PPIs comprising over 100,000 proteins, where quantum chemical and other techniques are not feasible. Hence, multiple techniques are needed for PPIs. CONCLUSION Drug discovery and our understanding of complex PPIs require multifaceted techniques that involve several disciplines such as quantum chemistry, topology, graph theory, knot theory and group theory, thus demonstrating a compelling need for a multidisciplinary approach to the problem.
Affiliation(s)
- Krishnan Balasubramanian
- School of Molecular Sciences, Arizona State University, Tempe, Arizona, AZ 85287-1604, United States
- Satya P Gupta
- Department of Pharmaceutical Technology, Meerut Institute of Engineering Technology, Meerut-250002, India
3
Abstract
Global measurement of proteins and their many attributes in tissues and biofluids defines the field of proteomics. Toxicoproteomics, as part of the larger field of toxicogenomics, seeks to identify critical proteins and pathways in biological systems that are affected by and respond to adverse chemical and environmental exposures using global protein expression technologies. Toxicoproteomics integrates 3 disciplinary areas: traditional toxicology and pathology, differential protein and gene expression analysis, and systems biology. Key topics to be reviewed are the evolution of proteomics, proteomic technology platforms and their capabilities with exemplary studies from biology and medicine, a review of over 50 recent studies applying proteomic analysis to toxicological research, and the recent development of databases designed to integrate -Omics technologies with toxicology and pathology. Proteomics is examined for its potential in discovery of new biomarkers and toxicity signatures, in mapping serum, plasma, and other biofluid proteomes, and in parallel proteomic and transcriptomic studies. The new field of toxicoproteomics is uniquely positioned toward an expanded understanding of protein expression during toxicity and environmental disease for the advancement of public health.
Affiliation(s)
- Barbara A Wetmore
- National Center for Toxicogenomics, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA
4
Zhang YP, Sheng YJ, Zheng W, He PA, Ruan JS. Novel numerical characterization of protein sequences based on individual amino acid and its application. Biomed Res Int 2015; 2015:909567. [PMID: 25705698 DOI: 10.1155/2015/909567] [Received: 10/14/2014] [Revised: 12/18/2014] [Accepted: 01/12/2015]
Abstract
The hydrophobicity and hydrophilicity of amino acids play an important role in protein folding, in a protein's interactions with its environment and with other molecules, and in its catalytic mechanism. Based on these two physicochemical indices, a 2D graphical representation of protein sequences is introduced, and on the basis of this representation a new numerical characteristic is proposed to compute the distance between sequences for similarity/dissimilarity analysis. We further apply the new distance to analyze the similarities/dissimilarities of the ND5 proteins of nine species and to predict the four major classes on a dataset containing 639 domains. The results show that the method is simple and effective.
5
Abstract
A new graphical description of the primary structure of protein sequences is introduced. First, a discrete point set in three-dimensional space is created for a protein sequence based on three main physicochemical properties of the amino acids. Second, a continuous cubic B-spline curve interpolating the amino acid points is constructed to represent the shape of the protein sequence. Then the geometric properties (curvature and torsion) of the continuous curve are extracted to analyze the similarity between protein sequences. Finally, an improved Canberra distance is introduced for the similarity analysis of protein sequences of different lengths. Experimental results show that our method is effective for comparing the similarity of protein sequences.
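The curvature/torsion and Canberra-distance steps can be sketched in Python as follows. This is a simplified stand-in under stated assumptions, not the paper's method: finite differences on the raw points replace the cubic B-spline, the caller supplies the 3D coordinates (for example three property values per residue), and the plain Canberra distance on equal-length vectors replaces the paper's improved variant for sequences of different lengths.

```python
import math

# Minimal sketch, NOT the paper's method: finite differences on the raw 3D points
# replace the cubic B-spline, and the plain Canberra distance replaces the
# "improved" variant for sequences of different lengths.


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points):
    """Approximate curvature and torsion at points[1] .. points[n-3] (unit parameter step)."""
    kappas, taus = [], []
    for i in range(1, len(points) - 2):
        pm, p0, p1, p2 = points[i - 1], points[i], points[i + 1], points[i + 2]
        d1 = tuple(0.5 * c for c in _sub(p1, pm))                                # r'   (central difference)
        d2 = tuple(a - 2 * b + c for a, b, c in zip(p1, p0, pm))                 # r''  (second difference)
        d3 = tuple(a - 3 * b + 3 * c - d for a, b, c, d in zip(p2, p1, p0, pm))  # r''' (third difference)
        c12 = _cross(d1, d2)
        speed = math.sqrt(_dot(d1, d1))
        c_sq = _dot(c12, c12)
        # kappa = |r' x r''| / |r'|^3 ; tau = (r' x r'') . r''' / |r' x r''|^2
        kappas.append(math.sqrt(c_sq) / speed ** 3 if speed > 0 else 0.0)
        taus.append(_dot(c12, d3) / c_sq if c_sq > 0 else 0.0)
    return kappas, taus


def canberra(u, v):
    """Canberra distance between equal-length vectors; terms with |a| + |b| == 0 are skipped."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if abs(a) + abs(b) > 0)


if __name__ == "__main__":
    parabola = [(t, t * t, 0) for t in range(-2, 4)]
    kappas, taus = curvature_torsion(parabola)
    print(kappas[1], taus)  # curvature 2.0 at t = 0; torsion 0 for a planar curve
```

For the parabola (t, t², 0) the finite-difference curvature at t = 0 equals the exact value 2, and the torsion vanishes because the curve is planar. Curvature profiles of two sequences of equal length could then be compared with `canberra`.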
Affiliation(s)
- S C Xu
- College of Science, Zhejiang Sci-Tech University, Hangzhou, China
7
Affiliation(s)
- Milan Randić, Jure Zupan, Alexandru T. Balaban, Dražen Vikić-Topić, Dejan Plavšić
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
11
Cruz-Monteagudo M, Munteanu CR, Borges F, Cordeiro MND, Uriarte E, González-Díaz H. Quantitative Proteome–Property Relationships (QPPRs). Part 1: Finding biomarkers of organic drugs with mean Markov connectivity indices of spiral networks of blood mass spectra. Bioorg Med Chem 2008; 16:9684-93. [DOI: 10.1016/j.bmc.2008.10.004] [Received: 04/15/2008] [Revised: 09/29/2008] [Accepted: 10/02/2008]
12
Affiliation(s)
- Ramón García-Domenech
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, 46100 Burjassot, València, Spain
14
Fernández M, Fernández L, Abreu JI, Garriga M. Classification of voltage-gated K(+) ion channels from 3D pseudo-folding graph representation of protein sequences using genetic algorithm-optimized support vector machines. J Mol Graph Model 2008; 26:1306-14. [PMID: 18289899 DOI: 10.1016/j.jmgm.2008.01.001] [Received: 08/31/2007] [Revised: 01/03/2008] [Accepted: 01/03/2008]
Abstract
Voltage-gated K(+) ion channels (VKCs) are membrane proteins that regulate the passage of potassium ions through membranes. This work reports a classification scheme for VKCs according to the signs of three electrophysiological variables: activation threshold voltage (V(t)), half-activation voltage (V(a50)) and half-inactivation voltage (V(h50)). The VKC sequences were encoded with a novel 3D pseudo-folding graph representation of protein sequences. Amino acid pseudo-folding 3D distances count (AAp3DC) descriptors, calculated from the Euclidean distance matrices (EDMs), were tested for building the classifiers. Genetic algorithm (GA)-optimized support vector machines (SVMs) with a radial basis function (RBF) kernel discriminated well between VKCs with negative and with positive/zero V(t), V(a50) and V(h50) values, with overall cross-validation accuracies of about 80%, 90% and 86%, respectively. We found that both the "pseudo-core" and the "pseudo-surface" of the 3D pseudo-folded proteins contribute to discriminating between VKCs according to the three electrophysiological variables.
Affiliation(s)
- Michael Fernández
- Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
15
Mauri A, Ballabio D. Chapter 5 Similarity/Diversity Measure for Sequential Data Based on Hasse Matrices. Data Handling in Science and Technology 2008. [DOI: 10.1016/s0922-3487(08)10005-3]
16
Abstract
We consider how sensitive numerical characterizations of a proteome are to the number of proteins considered in the analysis. We examined proteomics maps of liver cells from mice subjected to four proliferators, varying the number of proteins used for quantitative analysis from 25 up to 1000. For each case, we compared the similarity/dissimilarity results obtained with different numbers of proteins. We found that protein maps based on about the 300 most abundant protein spots suffice for a satisfactory numerical characterization of the corresponding proteome.
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, Ljubljana, Slovenia.
17
Abstract
DNA sequencing has produced an abundance of data on the DNA sequences of various species. Hence, the characterization and comparison of sequences have become more important, but they remain difficult tasks. In this paper, we first give a 2-D ladder-like graphical representation for the characteristic sequences of a DNA sequence, and then, to characterize the DNA sequence, construct a 3-component vector whose components are the normalized ALE-indices extracted from the three 2-D graphs via D/D matrices. An examination of the similarities/dissimilarities among the beta-globin gene sequences of different species illustrates the utility of the approach.
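A rough Python sketch of a ladder-like curve and its D/D matrix is given below. Every specific choice here is hypothetical: the purine/pyrimidine split as the characteristic sequence, the ladder coordinates, the use of the number of edges |i - j| as the graph distance, and the use of lambda_1 / n as a normalized invariant, which may not match the paper's ALE-index.

```python
import math

# Sketch of a D/D matrix for a 2-D "ladder-like" curve of one DNA characteristic sequence.
# HYPOTHETICAL choices, for illustration only:
#   * characteristic sequence = purine (A, G) vs pyrimidine (C, T);
#   * base i is drawn at x = i, y = 1 for purines and y = 0 otherwise;
#   * graph distance between vertices i and j = number of edges |i - j| along the curve;
#   * lambda_1 / n is used as a normalized invariant (may differ from the paper's ALE-index).
PURINES = frozenset("AG")


def ladder_points(seq, classes=PURINES):
    return [(float(i), 1.0 if base in classes else 0.0) for i, base in enumerate(seq.upper())]


def dd_matrix(points):
    """D/D matrix: Euclidean distance divided by graph distance, zero diagonal."""
    n = len(points)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = math.dist(points[i], points[j]) / (j - i)
    return m


def leading_eigenvalue(m, iters=500, shift=1.0):
    """Shifted power iteration for a nonnegative symmetric matrix; returns the Rayleigh quotient."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))


def normalized_invariant(seq):
    m = dd_matrix(ladder_points(seq))
    return leading_eigenvalue(m) / len(m)


if __name__ == "__main__":
    print(normalized_invariant("ATGGTGCACCTG"))
```

Three such values, one per characteristic sequence, would give a 3-component vector of the kind the abstract describes; swapping `classes` for another two-way split of the bases produces the other two curves.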
Affiliation(s)
- Chun Li
- Department of Mathematics, Bohai University, Jinzhou, PR China.
18
Balasubramanian K, Khokhani K, Basak SC. Complex Graph Matrix Representations and Characterizations of Proteomic Maps and Chemically Induced Changes to Proteomes. J Proteome Res 2006; 5:1133-42. [PMID: 16674102 DOI: 10.1021/pr050445s]
Abstract
We present a complex graph matrix representation to characterize proteomics maps obtained from 2D-gel electrophoresis. In this method, each bubble in a 2D-gel proteomics map is represented by a complex number whose components are charge and mass. A graph with complex weights is then constructed by connecting the vertices in the relative order of abundance, which yields adjacency matrices and distance matrices of the proteomics graph with complex weights. We have computed the spectra, eigenvectors and other properties of the complex graphs, as well as the Euclidean/graph distances obtained from them. The leading and the smallest eigenvalues and eigenvectors, together with the entire spectral patterns of the complex matrices, yield novel weighted biodescriptors that characterize proteomics maps with information on the charges and masses of proteins. We have also applied these eigenvector and eigenvalue maps to contrast normal cells with cells exposed to four peroxisome proliferators, namely clofibrate, diethylhexyl phthalate (DEHP), perfluorodecanoic acid (PFDA) and perfluorooctanoic acid (PFOA). Our complex eigenspectra show that the proteomic response induced by DEHP differs from the responses to the other three chemicals, consistent with their chemical structures and properties.
Affiliation(s)
- Krishnan Balasubramanian
- Chemistry and Material Science Directorate, Lawrence Livermore National Laboratory, University of California, Livermore, California 94550, USA.
19
Randić M, Witzmann FA, Kodali V, Basak SC. On the Dependence of a Characterization of Proteomics Maps on the Number of Protein Spots Considered. J Chem Inf Model 2005; 46:116-22. [PMID: 16426047 DOI: 10.1021/ci050132h]
Abstract
We have reexamined the numerical characterization of proteomics maps based on novel distance matrices associated with the nearest neighbor graph of the protein spots. In particular, we consider how the characterization of a proteomics map depends on the number of proteins considered in the analysis. We examined a collection of proteomics maps in which we approximately doubled the number of spots used for quantitative analysis, considering maps with 30, 50, 100, 250, 500 and 1054 protein spots. For each case we compared the similarity/dissimilarity results for five proteomics maps of rat liver cells, associated with the control group and with four proliferators administered by intraperitoneal injection. We found that protein maps based on about the 250 most abundant protein spots suffice for a satisfactory numerical characterization of such maps.
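The nearest neighbor construction can be sketched in Python, with a `top_k` parameter that restricts the analysis to the most abundant spots so that the dependence on the number of spots can be explored. The assumptions are ours: spots are (x, y) gel coordinates already sorted by decreasing abundance, each spot is joined to its single nearest other spot (ties go to the lower index), and pairs in different connected components get an infinite distance. The paper's distance matrices may be defined differently.

```python
import math
from collections import deque

# Sketch: nearest-neighbour graph on protein spots and its graph-distance matrix.
# ASSUMPTIONS: spots are (x, y) coordinates sorted by decreasing abundance; each spot
# is linked to its single nearest other spot; unreachable pairs get math.inf.


def nearest_neighbour_edges(spots):
    """Undirected edges (i, j), i < j, linking every spot to its nearest other spot."""
    edges = set()
    for i, p in enumerate(spots):
        j = min((k for k in range(len(spots)) if k != i), key=lambda k: math.dist(p, spots[k]))
        edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def graph_distance_matrix(n, edges):
    """All-pairs shortest path lengths (number of edges) via breadth-first search."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s][w] == math.inf:
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist


def characterize(spots, top_k):
    """Keep the top_k most abundant spots (top_k >= 2), then build graph and distance matrix."""
    subset = spots[:top_k]
    return graph_distance_matrix(len(subset), nearest_neighbour_edges(subset))
```

Calling `characterize` with increasing `top_k` (for example 30, 50, 100, ...) and comparing the resulting matrices mirrors the kind of experiment the abstract describes.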
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, Ljubljana, Slovenia.
20
Vracko M, Basak SC, Geiss K, Witzmann F. Proteomic Maps–Toxicity Relationship of Halocarbons Studied with Similarity Index and Genetic Algorithm. J Chem Inf Model 2005; 46:130-6. [PMID: 16426049 DOI: 10.1021/ci0502597]
Abstract
In this work we analyzed proteomic maps obtained from hepatocytes treated with 14 halocarbons. A similarity index was introduced as a robust measure of similarity between two maps, or between two selections of spots within the maps. A search algorithm was used to identify the spots that may play an important role in the toxicity mechanism. The highest correlation coefficients obtained between the similarity index and a biological parameter exceeded 0.9.
Affiliation(s)
- Marjan Vracko
- National Institute of Chemistry, Ljubljana, Slovenia.
21
Abstract
We propose a canonical labeling of proteome maps, which enables one to sort and catalog the maps in a simple way. The canonical label of a proteome map is based on the canonical labeling of the vertices of the Hasse diagram embedded in the map, which yields an adjacency matrix whose rows, read as binary numbers, are the smallest possible such numbers. The use of the approach in documentation is illustrated with the proteome maps of liver cells of healthy male Fischer F344 rats and of rats treated with different peroxisome proliferators.
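For small diagrams, the canonical labeling criterion can be illustrated by exhaustive search in Python. We read "rows smallest as binary numbers" as minimizing the tuple of row values lexicographically over all vertex orderings; this interpretation is an assumption, and the O(n!) search is only practical for a handful of vertices.

```python
from itertools import permutations

# Sketch of canonical labeling by exhaustive search over vertex orderings.
# ASSUMPTION: the canonical form minimizes the tuple of row values (each row of the
# relabelled 0/1 adjacency matrix read as a binary number) in lexicographic order.


def row_codes(adj, order):
    """Rows of the relabelled adjacency matrix, each read as a binary number."""
    n = len(order)
    return tuple(
        int("".join(str(adj[order[i]][order[j]]) for j in range(n)), 2)
        for i in range(n)
    )


def canonical_label(adj):
    """Return (code, order), where order[k] is the original vertex given new label k."""
    best_code, best_order = None, None
    for order in permutations(range(len(adj))):
        code = row_codes(adj, order)
        if best_code is None or code < best_code:
            best_code, best_order = code, order
    return best_code, best_order


if __name__ == "__main__":
    path_mid = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path with vertex 1 in the middle
    path_end = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, vertex 0 in the middle
    print(canonical_label(path_mid)[0], canonical_label(path_end)[0])
```

Because isomorphic diagrams receive the same code, the code can serve as a catalog key; the same search works unchanged for a directed (non-symmetric) Hasse adjacency matrix.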
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia, The Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia.
22
Abstract
We consider a characterization of proteomics maps based on an alternative kind of neighborhood graph for the protein spots on a 2-D gel. The novel approach considers, for every protein spot, only the nearest neighborhood consisting of protein spots of higher abundance. The approach retains the simplicity and advantages of the recently introduced characterization of proteome maps based on the nearest neighborhoods of protein spots, but it also has important additional desirable computational features. The characterization of the nearest neighborhood graphs of 2-D gel proteomics maps is sensitive to the number of spots considered and may change the degree of similarity of different maps when the number of points is changed, thus imposing restrictions on the protocol used for comparing maps. The novel approach presented in this work is less sensitive to the number of points used in the analysis because the graphs are constructed in a stepwise process that diminishes the role of more distant neighbors: each new spot is linked to the nearest spot that is already part of the neighborhood graph. In this way a graph on N + 1 spots is obtained from the graph on N spots by adding a single new link, whereas for nearest neighborhood graphs adding a new spot introduces new neighborhoods and generally results in a graph that may differ significantly from the neighborhood graph on N points.
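The stepwise construction can be sketched in Python as follows. Spots are assumed to be (x, y, abundance) tuples; ties in abundance keep their input order and ties in distance go to the earliest-placed spot. These tie-breaking rules are our choices rather than details from the abstract.

```python
import math

# Sketch of the stepwise neighbourhood graph: spots are added in order of decreasing
# abundance, and each new spot is linked to the nearest spot already in the graph.
# ASSUMPTIONS: spots are (x, y, abundance); sort ties keep input order (stable sort);
# distance ties go to the earliest-placed spot (min returns the first minimum).


def stepwise_graph(spots):
    """Return edges (u, v): spot v is linked to the nearest previously placed spot u."""
    order = sorted(range(len(spots)), key=lambda k: -spots[k][2])
    placed, edges = [], []
    for k in order:
        if placed:
            u = min(placed, key=lambda p: math.dist(spots[p][:2], spots[k][:2]))
            edges.append((u, k))
        placed.append(k)
    return edges


if __name__ == "__main__":
    spots = [(0, 0, 10), (5, 0, 8), (1, 1, 6), (6, 1, 4)]
    print(stepwise_graph(spots))  # [(0, 1), (0, 2), (1, 3)]
```

Because each spot only looks back at spots already placed, the edge list for the N most abundant spots is a prefix of the edge list for N + 1 spots, which is the single-new-link property described in the abstract.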
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, Ljubljana, Hajdrihova 19, Slovenia
24
Abstract
For a DNA sequence with $n$ bases, one can always associate an $n \times n$ nonnegative real symmetric matrix whose diagonal entries are zero. Once the matrix is given, its leading eigenvalue is usually calculated and used as an invariant to characterize the DNA sequence. Let $M$ be such a matrix and $\lambda_1$ its leading eigenvalue. Then $\frac{1}{n}\|M\|_{m_1}$ and $\sqrt{(n-1)/n}\,\|M\|_F$ are lower and upper bounds of $\lambda_1$, respectively. Since their arithmetic average approximates $\lambda_1$ and is simpler to calculate, it can be used as an alternative invariant to characterize the DNA sequence. The utility of the new parameter is illustrated on the DNA sequences of five species: human, chimpanzee, mouse, rat and Gallus.
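The bounds can be checked numerically with a short Python sketch. It takes the m1 norm as the sum of the absolute entries and the F norm as the Frobenius norm, forms the lower bound, the upper bound and their average, and compares them with the leading eigenvalue from power iteration. The power-iteration helper is only a checking tool and is not part of the proposed invariant.

```python
import math

# Sketch: bounds on the leading eigenvalue lambda_1 of a nonnegative symmetric
# n x n matrix M with zero diagonal, and their arithmetic average.
#   lower = (1/n) * ||M||_m1           (||M||_m1 = sum of absolute entries)
#   upper = sqrt((n - 1)/n) * ||M||_F  (||M||_F  = Frobenius norm)


def bounds_and_estimate(m):
    n = len(m)
    m1 = sum(abs(x) for row in m for x in row)
    frob = math.sqrt(sum(x * x for row in m for x in row))
    lower = m1 / n
    upper = math.sqrt((n - 1) / n) * frob
    return lower, upper, (lower + upper) / 2


def leading_eigenvalue(m, iters=500, shift=1.0):
    """Check value only: shifted power iteration, returning the Rayleigh quotient.

    Adding shift * I (shift > 0) moves -lambda_1 to -lambda_1 + shift, whose magnitude
    is below lambda_1 + shift, so the iteration does not oscillate between the two.
    """
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))


if __name__ == "__main__":
    M = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
    print(bounds_and_estimate(M), leading_eigenvalue(M))
```

For this 3 x 3 example the exact leading eigenvalue is 1 + √3 ≈ 2.732, the bounds are 8/3 ≈ 2.667 and √8 ≈ 2.828, and their average ≈ 2.748 lies close to the exact value.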
Affiliation(s)
- Chun Li
- Department of Mathematics, Bohai University, Jinzhou 121000, PR China.
25
Abstract
Most 2D graphical representations of primary DNA sequences, while offering visual geometrical patterns for depicting sequences, do require considerable space if enough details of such representations are to be visible. In this contribution, we consider a highly compact graphical representation of DNA, which allows visual inspection and numerical characterization of DNA sequences having a large number of nucleic acid bases. The approach is illustrated on the DNA sequences of the first exon of human beta-globin. The same graphical approach not only allows one to depict differences in composition within a single DNA, but makes possible graphical representation of protein sequences, which have hitherto evaded similar 2D visual representations.
Affiliation(s)
- M Randić
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia.
26
Abstract
We consider the construction of invariants for characterizing 2-D maps, such as 2-D proteome maps and 2-D NMR spectral maps, that, in addition to facilitating the cataloguing of such maps, can be used to compare maps and to evaluate their degree of similarity numerically. We put forward a novel approach based on the concept that nearest neighborhoods of points (spots) on a map are sufficiently flexible to allow one to vary not only the number of points used to characterize the map but also the density of information on their relative positions. The method is illustrated with the Coomassie brilliant blue stained 2-D gel electrophoresis patterns of the proteomes of liver cells from healthy male Fischer F344 rats and from rats treated with four peroxisome proliferators.
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, 1001 Ljubljana, Slovenia.
28
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, Ljubljana, Slovenia
29