1
|
Liu Y, Wu X, Wang Y. An integrated approach for copy number variation discovery in parent-offspring trios. Brief Bioinform 2021; 22:6306464. [PMID: 34151932 DOI: 10.1093/bib/bbab230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Revised: 04/27/2021] [Accepted: 05/25/2021] [Indexed: 11/14/2022] Open
Abstract
Whole-genome sequencing (WGS) of parent-offspring trios has become widely used to identify causal copy number variations (CNVs) in rare and complex diseases. Existing CNV detection approaches usually do not make effective use of Mendelian inheritance in parent-offspring trios and yield low accuracy. In this study, we propose a novel integrated approach, TrioCNV2, for jointly detecting CNVs from WGS data of the parent-offspring trio. TrioCNV2 first makes use of the read depth and discordant read pairs to infer approximate locations of CNVs and then employs the split read and local de novo assembly approaches to refine the breakpoints. We use the real WGS data of two parent-offspring trios to demonstrate TrioCNV2's performance and compare it with other CNV detection approaches. The software TrioCNV2 is implemented using a combination of Java and R and is freely available from the website at https://github.com/yongzhuang/TrioCNV2.
Affiliation(s)
- Yongzhuang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Xiaoliang Wu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
2
|
Huang X, Pearce R, Zhang Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 2020; 36:3758-3765. [PMID: 32259206 DOI: 10.1093/bioinformatics/btaa234] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 01/11/2020] [Revised: 03/30/2020] [Accepted: 04/01/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. RESULTS We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR correctly predicted 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, which compares favorably with SCWRL4, CISRR, RASP and SCATD, which correctly predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed, packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of dead-end elimination and tree decomposition with a well-optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. AVAILABILITY AND IMPLEMENTATION The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
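The accuracy figures in this abstract are fractions of side-chain dihedral angles predicted within 20° of the native value. The following is a minimal sketch of how such a fraction could be computed, not the FASPR evaluation code. The function names are hypothetical, and special cases such as symmetric side chains are deliberately ignored.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    # Angles are periodic, so 175 and -175 are only 10 degrees apart.
    return min(diff, 360.0 - diff)


def fraction_correct(predicted, native, tolerance_deg=20.0):
    """Fraction of dihedral angles predicted within the tolerance of the native value."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for p, n in zip(predicted, native)
        if angular_difference(p, n) <= tolerance_deg
    )
    return hits / len(predicted)


# Illustrative angles only (degrees); these are not data from the paper.
predicted_chi = [60.0, 175.0, -60.0, 100.0]
native_chi = [65.0, -175.0, 180.0, 100.0]
print(fraction_correct(predicted_chi, native_chi))  # 3 of 4 within 20 degrees -> 0.75
```

The wraparound handling matters because a naive subtraction would count 175° versus -175° as a 350° error rather than 10°.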
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics
- Yang Zhang
- Department of Computational Medicine and Bioinformatics; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
|
3
|
Fine J, Lackner R, Samudrala R, Chopra G. Computational chemoproteomics to understand the role of selected psychoactives in treating mental health indications. Sci Rep 2019; 9:13155. [PMID: 31511563 PMCID: PMC6739337 DOI: 10.1038/s41598-019-49515-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/18/2018] [Accepted: 07/31/2019] [Indexed: 12/17/2022] Open
Abstract
We have developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform to infer homology of drug behaviour at a proteomic level by constructing and analysing structural compound-proteome interaction signatures of 3,733 compounds with 48,278 proteins in a shotgun manner. We applied the CANDO platform to predict putative therapeutic properties of 428 psychoactive compounds that belong to the phenylethylamine, tryptamine, and cannabinoid chemical classes for treating mental health indications. Our findings indicate that these 428 psychoactives are among the top-ranked predictions for a significant fraction of mental health indications, demonstrating a significant preference for treating such indications over non-mental health indications, relative to randomized controls. Also, we analysed the use of specific tryptamines for the treatment of sleeping disorders, bupropion for substance abuse disorders, and cannabinoids for epilepsy. Our innovative use of the CANDO platform may guide the identification and development of novel therapies for mental health indications and provide an understanding of their causal basis on a detailed mechanistic level. These predictions can be used to provide new leads for preclinical drug development for mental health and other neurological disorders.
Affiliation(s)
- Jonathan Fine
- Department of Chemistry, Purdue University, West Lafayette, IN, USA
- Rachel Lackner
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA, USA
- Ram Samudrala
- Department of Biomedical Informatics, SUNY, Buffalo, NY, USA
- Gaurav Chopra
- Department of Chemistry, Purdue University, West Lafayette, IN, USA
- Purdue Institute for Drug Discovery, Purdue Institute for Integrative Neuroscience, Purdue Institute for Immunology, Inflammation and Infectious Disease, Integrative Data Science Initiative, Purdue Center for Cancer Research, West Lafayette, IN, USA
|
4
|
Chopra G, Samudrala R. Exploring Polypharmacology in Drug Discovery and Repurposing Using the CANDO Platform. Curr Pharm Des 2017; 22:3109-23. [PMID: 27013226 DOI: 10.2174/1381612822666160325121943] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 01/06/2016] [Accepted: 03/01/2015] [Indexed: 01/05/2023]
Abstract
BACKGROUND Traditional drug discovery approaches focus on a limited set of target molecules for treatment against specific indications/diseases. However, drug absorption, distribution, metabolism, and excretion (ADME) involve interactions with multiple protein systems. Drugs approved for particular indication(s) may be repurposed as novel therapeutics for others. The severely declining rate of discovery and increasing costs of new drugs illustrate the limitations of the traditional reductionist paradigm in drug discovery. METHODS We developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform based on a hypothesis that drugs function by interacting with multiple protein targets to create a molecular interaction signature that can be exploited for therapeutic repurposing and discovery. We compiled a library of compounds that are human ingestible with minimal side effects, followed by an 'all-compounds' vs 'all-proteins' fragment-based multitarget docking with dynamics screen to construct compound-proteome interaction matrices that were then analyzed to determine similarity of drug behavior. The proteomic signature similarity of drugs is then ranked to make putative drug predictions for all indications in a shotgun manner. RESULTS We have previously applied this platform with success in both retrospective benchmarking and prospective validation, and to understand the effect of druggable protein classes on repurposing accuracy. Here we use the CANDO platform to analyze and determine the contribution of multitargeting (polypharmacology) to drug repurposing benchmarking accuracy. Taken together with the previous work, our results indicate that a large number of protein structures with diverse fold space and a specific polypharmacological interactome are necessary for accurate drug predictions using our proteomic and evolutionary drug discovery and repurposing platform. CONCLUSION These results have implications for future drug development and repurposing in the context of polypharmacology.
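The signature-ranking step described in this abstract can be illustrated with a small sketch. It is not the CANDO implementation. The compound names and interaction scores are invented, and cosine similarity is an assumption chosen for brevity, since the abstract does not name the similarity measure.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length interaction signatures."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_signature(query, signatures):
    """Rank all other compounds by similarity of their signature to the query's."""
    query_signature = signatures[query]
    scored = [
        (name, cosine_similarity(query_signature, signature))
        for name, signature in signatures.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical compound-proteome matrix: one row per compound,
# one predicted interaction score per protein (four proteins here).
signatures = {
    "compound_A": [0.9, 0.1, 0.0, 0.7],
    "compound_B": [0.8, 0.2, 0.1, 0.6],
    "compound_C": [0.0, 0.9, 0.8, 0.1],
}

ranking = rank_by_signature("compound_A", signatures)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a repurposing setting, the top-ranked neighbours of a compound with a known indication would be the candidates put forward for that indication.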
Affiliation(s)
- Gaurav Chopra
- Department of Chemistry, Purdue University, West Lafayette, IN, USA
- Ram Samudrala
- Department of Biomedical Informatics, SUNY, Buffalo, NY, USA
|
5
|
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
6
|
Gaumont N, Magnien C, Latapy M. Finding remarkably dense sequences of contacts in link streams. Social Network Analysis and Mining 2016. [DOI: 10.1007/s13278-016-0396-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/20/2022]
|
7
|
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 1820] [Impact Index Per Article: 227.5] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
8
|
Coarse-grained modeling of RNA 3D structure. Methods 2016; 103:138-56. [PMID: 27125734 DOI: 10.1016/j.ymeth.2016.04.026] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 12/13/2015] [Revised: 04/21/2016] [Accepted: 04/22/2016] [Indexed: 12/21/2022] Open
Abstract
Functional RNA molecules depend on three-dimensional (3D) structures to carry out their tasks within the cell. Understanding how these molecules interact to carry out their biological roles requires a detailed knowledge of RNA 3D structure and dynamics as well as thermodynamics, which strongly governs the folding of RNA and RNA-RNA interactions as well as a host of other interactions within the cellular environment. Experimental determination of these properties is difficult, and various computational methods have been developed to model the folding of RNA 3D structures and their interactions with other molecules. However, computational methods also have their limitations, especially when the biological effects demand computation of the dynamics beyond a few hundred nanoseconds. For the researcher confronted with such challenges, a more amenable approach is to resort to coarse-grained modeling to reduce the number of data points and computational demand to a more tractable size, while sacrificing as little critical information as possible. This review presents an introduction to the topic of coarse-grained modeling of RNA 3D structures and dynamics, covering both high- and low-resolution strategies. We discuss how physics-based approaches compare with knowledge based methods that rely on databases of information. In the course of this review, we discuss important aspects in the reasoning process behind building different models and the goals and pitfalls that can result.
|
9
|
Zhang J, Barz B, Zhang J, Xu D, Kosztin I. Selective refinement and selection of near-native models in protein structure prediction. Proteins 2015; 83:1823-35. [PMID: 26214389 PMCID: PMC4700123 DOI: 10.1002/prot.24866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/08/2015] [Revised: 06/22/2015] [Accepted: 07/21/2015] [Indexed: 11/07/2022]
Abstract
In recent years, in silico protein structure prediction has reached a level where fully automated servers can generate large pools of near-native structures. However, the identification and further refinement of the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target-specific selective refinement (SR) protocol; and (ii) a molecular dynamics (MD) simulation-based ranking (SMDR) method. In SR, the all-atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best-refined models are selected with SMDR by testing their relative stability against gradual heating through all-atom MD simulations. Through extensive testing, we have found that Mufold-MD, our fully automated protein structure prediction server updated with the SR and SMDR modules, consistently outperformed its previous versions.
Affiliation(s)
- Jiong Zhang
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
- Bagdan Barz
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
- Jingfen Zhang
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211
- Dong Xu
- Department of Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211
- Ioan Kosztin
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri 65211
|
10
|
Sim J, Sim J, Park E, Lee J. Method for identification of rigid domains and hinge residues in proteins based on exhaustive enumeration. Proteins 2015; 83:1054-67. [DOI: 10.1002/prot.24799] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/03/2014] [Revised: 02/28/2015] [Accepted: 03/10/2015] [Indexed: 11/07/2022]
Affiliation(s)
- Jaehyun Sim
- Department of Oral Microbiology and Immunology; School of Dentistry, Seoul National University; Seoul 110-749 Korea
- Jun Sim
- Department of Bioinformatics and Life Science; Soongsil University; Seoul 156-743 Korea
- Eunsung Park
- Administrative Service Division, Apsun Dental Hospital; Seoul 135-590 Korea
- Julian Lee
- Department of Oral Microbiology and Immunology; School of Dentistry, Seoul National University; Seoul 110-749 Korea
|
11
|
Abstract
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking for a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity becomes one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine whether a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.
|
12
|
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
|
14
|
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
|
15
|
Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. ACTA ACUST UNITED AC 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
|
17
|
Vishveshwara S, Brinda KV, Kannan N. Protein Structure: Insights from Graph Theory. Journal of Theoretical & Computational Chemistry 2012. [DOI: 10.1142/s0219633602000117] [Citation(s) in RCA: 130] [Impact Index Per Article: 10.8] [Indexed: 11/18/2022]
Abstract
The sequences and structures of a large body of proteins are becoming increasingly available. It is desirable to explore mathematical tools for efficient extraction of information from such sources. The principles of graph theory, which were earlier applied in fields such as electrical engineering and computer networks, are now being adopted to investigate protein structure, folding, stability, function and dynamics. This review gives a brief account of relevant graphs and graph theoretic concepts. The concepts of protein graph construction are discussed. The manner in which graphs are analyzed and parameters relevant to protein structure are extracted is explained. The structural and biological information derived from protein structures using these methods is presented.
Affiliation(s)
- K. V. Brinda
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
- N. Kannan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
18
|
Koehl P, Orland H, Delarue M. Adapting Poisson-Boltzmann to the self-consistent mean field theory: application to protein side-chain modeling. J Chem Phys 2011; 135:055104. [PMID: 21823735 DOI: 10.1063/1.3621831] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
We present an extension of the self-consistent mean field theory for protein side-chain modeling in which solvation effects are included based on the Poisson-Boltzmann (PB) theory. In this approach, the protein is represented with multiple copies of its side chains. Each copy is assigned a weight that is refined iteratively based on the mean field energy generated by the rest of the protein, until self-consistency is reached. At each cycle, the variational free energy of the multi-copy system is computed; this free energy includes the internal energy of the protein that accounts for vdW and electrostatic interactions and a solvation free energy term that is computed using the PB equation. The method converges in only a few cycles and takes only minutes of central processing unit time on a commodity personal computer. The predicted conformation of each residue is then set to be its copy with the highest weight after convergence. We have tested this method on a database of one hundred highly refined NMR structures to circumvent the problems of crystal packing inherent to x-ray structures. The use of the PB-derived solvation free energy significantly improves prediction accuracy for surface side chains. For example, the prediction accuracies for χ(1) for surface cysteine, serine, and threonine residues improve from 68%, 35%, and 43% to 80%, 53%, and 57%, respectively. A comparison with other side-chain prediction algorithms demonstrates that our approach is consistently better in predicting the conformations of exposed side chains.
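The iterative weight refinement described in this abstract can be sketched as a generic self-consistent mean field loop. This is a toy illustration, not the authors' method. It omits the Poisson-Boltzmann solvation term and the variational free energy entirely, and the function name `scmf`, the damping scheme, the energy tables and the value of `kT` (arbitrary units) are all assumptions made for the example.

```python
import math


def scmf(self_energy, pair_energy, kT=1.0, damping=0.5, max_cycles=200, tol=1e-6):
    """Toy self-consistent mean field refinement of side-chain copy weights.

    self_energy[i][a]: energy of copy a of residue i in the fixed environment.
    pair_energy[(i, j)][a][b]: interaction energy between copy a of residue i
    and copy b of residue j, stored once with i < j (missing pairs count as 0).
    """
    n = len(self_energy)
    # Start from uniform weights over the copies of each residue.
    weights = [[1.0 / len(e)] * len(e) for e in self_energy]

    def pair(i, a, j, b):
        if i < j:
            table = pair_energy.get((i, j))
            return table[a][b] if table else 0.0
        table = pair_energy.get((j, i))
        return table[b][a] if table else 0.0

    for _ in range(max_cycles):
        max_change = 0.0
        new_weights = []
        for i in range(n):
            # Mean field energy of each copy: its own energy plus the
            # weight-averaged interaction with the copies of all other residues.
            mean_field = []
            for a in range(len(self_energy[i])):
                energy = self_energy[i][a]
                for j in range(n):
                    if j != i:
                        energy += sum(
                            weights[j][b] * pair(i, a, j, b)
                            for b in range(len(weights[j]))
                        )
                mean_field.append(energy)
            lowest = min(mean_field)
            boltzmann = [math.exp(-(e - lowest) / kT) for e in mean_field]
            total = sum(boltzmann)
            # Damped update: mix the Boltzmann weights with the previous weights.
            updated = [
                damping * (b / total) + (1.0 - damping) * w
                for b, w in zip(boltzmann, weights[i])
            ]
            max_change = max(
                max_change, max(abs(u - w) for u, w in zip(updated, weights[i]))
            )
            new_weights.append(updated)
        weights = new_weights
        if max_change < tol:
            break

    # Predicted conformation of each residue: its copy with the highest weight.
    prediction = [max(range(len(w)), key=w.__getitem__) for w in weights]
    return weights, prediction


# Two residues with two copies each; copy 0 of residue 0 clashes with copy 0 of residue 1.
self_energy = [[0.0, 1.0], [0.0, 0.3]]
pair_energy = {(0, 1): [[5.0, 0.0], [0.0, 0.0]]}
weights, prediction = scmf(self_energy, pair_energy, kT=0.5)
print(prediction)
```

In this toy case the clash pushes residue 1 toward its second copy, after which residue 0 settles on its first copy, which is also the lowest-energy combination.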
Affiliation(s)
- Patrice Koehl
- Department of Biological Sciences, National University of Singapore, Singapore
|
19
|
Dukka Bahadur KC, Tomita E, Suzuki J, Akutsu T. Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach. J Bioinform Comput Biol 2011; 3:103-26. [PMID: 15751115 DOI: 10.1142/s0219720005000904] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 03/20/2004] [Revised: 06/25/2004] [Accepted: 07/10/2004] [Indexed: 11/18/2022]
Abstract
"Protein Side-chain Packing" has an ever-increasing application in the field of bio-informatics, dating from the early methods of homology modeling to protein design and to protein docking. However, this problem is known to be NP-hard. In this regard, we have developed a novel approach to solve this problem using the notion of a maximum edge-weight clique. Our approach is based on an efficient reduction of the protein side-chain packing problem to a graph and then solving the reduced graph to find the maximum clique by applying an efficient clique finding algorithm developed by our co-authors. Since our approach is based on deterministic algorithms, in contrast to the various existing algorithms based on heuristic approaches, our algorithm is guaranteed to find an optimal solution. We have tested this approach to predict the side-chain conformations of a set of proteins and have compared the results with other existing methods. We have found that our results are comparable to or better than the results produced by the existing methods. As our test set contains a protein of 494 residues, we have obtained considerable improvement in terms of size of the proteins and in terms of the efficiency and the accuracy of prediction.
Affiliation(s)
- K C Dukka Bahadur
- Graduate School of Informatics & Bioinformatics Center Kyoto University, Kyoto, 611-0001, Japan
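The reduction of side-chain packing to a maximum edge-weight clique problem, as summarised in the abstract of this entry, can be illustrated on a toy instance. The encoding below is one possible construction and is an assumption, not necessarily the one used in the paper. Vertices are (residue, rotamer) pairs, rotamers of the same residue are never joined, and each edge weight is a constant offset minus the pair energy plus an equal share of the two self energies. The search is a plain exhaustive enumeration rather than the efficient clique algorithm the authors refer to, and all names and energies are hypothetical.

```python
from itertools import combinations


def build_graph(self_energy, pair_energy):
    """Encode a packing instance (at least two residues) as an edge-weighted graph."""
    n = len(self_energy)
    vertices = [(i, r) for i in range(n) for r in range(len(self_energy[i]))]
    raw = {}
    for u, v in combinations(vertices, 2):
        (i, r), (j, s) = u, v
        if i == j:
            continue  # two rotamers of one residue can never be chosen together
        # Each self energy is split evenly over the n - 1 edges touching its residue.
        raw[(u, v)] = pair_energy(i, r, j, s) + (
            self_energy[i][r] + self_energy[j][s]
        ) / (n - 1)
    offset = max(raw.values()) + 1.0  # keeps every edge weight positive
    weights = {frozenset(edge): offset - energy for edge, energy in raw.items()}
    return vertices, weights, offset


def max_edge_weight_clique(vertices, weights):
    """Exhaustively enumerate cliques and return the one with the largest edge-weight sum."""
    best = {"weight": float("-inf"), "clique": []}

    def extend(clique, candidates, weight):
        if weight > best["weight"]:
            best["weight"], best["clique"] = weight, list(clique)
        for index, v in enumerate(candidates):
            gain = sum(weights[frozenset((u, v))] for u in clique)
            # Keep only later candidates that are adjacent to v as well.
            rest = [w for w in candidates[index + 1:] if frozenset((v, w)) in weights]
            extend(clique + [v], rest, weight + gain)

    extend([], vertices, 0.0)
    return best["clique"], best["weight"]


# Toy instance: three residues with two rotamers each and two clashing rotamer pairs.
self_energy = [[0.0, 1.0], [0.0, 0.3], [0.2, 0.0]]
clashes = {
    frozenset({(0, 0), (1, 0)}): 5.0,
    frozenset({(1, 1), (2, 1)}): 4.0,
}


def pair_energy(i, r, j, s):
    return clashes.get(frozenset({(i, r), (j, s)}), 0.0)


vertices, weights, offset = build_graph(self_energy, pair_energy)
clique, weight = max_edge_weight_clique(vertices, weights)
n = len(self_energy)
energy = offset * n * (n - 1) / 2 - weight  # undo the offset on each of the clique's edges
print(clique, round(energy, 6))
```

Because every edge weight is positive and rotamers of different residues are always joined, the heaviest clique picks exactly one rotamer per residue, and its weight maps back to the minimum total energy.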
|
20
|
Abstract
There is a growing interest in the identification of proteins on a proteome-wide scale. Among the different kinds of protein structure identification methods, graph-theoretic methods are particularly effective. Due to their lower costs, higher effectiveness and many other advantages, they have drawn increasing attention from researchers. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods for solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures based on the graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given.
Affiliation(s)
- Yan Yan
- Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, P.R. China
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Shenggui Zhang
- Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, P.R. China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
|
21
|
Abstract
MOTIVATION Modeling of side chain conformations constitutes an indispensable effort in protein structure modeling, protein-protein docking and protein design. Thanks to intensive attention to this field, many of the existing programs can achieve reasonably good and comparable prediction accuracy. Moreover, in our previous work on CIS-RR, we argued that predictions with few atomic clashes can complement the existing methods for subsequent analysis and refinement of protein structures. However, these recent efforts to enhance the quality of predicted side chains have been accompanied by a significant increase in computational cost. RESULTS In this study, by mainly focusing on improving the speed of side chain conformation prediction, we present a RApid Side-chain Predictor, called RASP. To achieve a much faster speed with an accuracy comparable to the best existing methods, we not only employ the clash elimination strategy of CIS-RR, but also carefully optimize energy terms and integrate different search algorithms. In comprehensive benchmark tests, RASP is over one order of magnitude faster (~40 times faster than CIS-RR) than the recently developed methods, while achieving comparable or even better accuracy.
Affiliation(s)
- Zhichao Miao
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
|
22
|
Eppstein D, Strash D. Listing All Maximal Cliques in Large Sparse Real-World Graphs. Experimental Algorithms 2011. [DOI: 10.1007/978-3-642-20662-7_31] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 12/03/2022]
|
23
|
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the fraction of experimentally known 3D models is currently less than 1% due to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and that of 3D models. Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from an evolutionary viewpoint. Strategies for template-based structure modeling are discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Affiliation(s)
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
|
24
|
Eppstein D, Löffler M, Strash D. Listing All Maximal Cliques in Sparse Graphs in Near-Optimal Time. Algorithms and Computation 2010. [DOI: 10.1007/978-3-642-17517-6_36] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.2] [Indexed: 12/12/2022]
|
25
|
Rahman SA, Bashton M, Holliday GL, Schrader R, Thornton JM. Small Molecule Subgraph Detector (SMSD) toolkit. J Cheminform 2009; 1:12. [PMID: 20298518 PMCID: PMC2820491 DOI: 10.1186/1758-2946-1-12] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Received: 05/11/2009] [Accepted: 08/10/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Finding one small molecule (query) in a large target library is a challenging task in computational chemistry. Although several heuristic approaches are available using fragment-based chemical similarity searches, they fail to identify exact atom-bond equivalence between the query and target molecules and thus cannot be applied to complex chemical similarity searches, such as searching a complete or partial metabolic pathway. In this paper we present a new Maximum Common Subgraph (MCS) tool: SMSD (Small Molecule Subgraph Detector) to overcome the issues with current heuristic approaches to small molecule similarity searches. The MCS search implemented in SMSD incorporates chemical knowledge (atom type match with bond sensitive and insensitive information) while searching molecular similarity. We also propose a novel method by which solutions obtained by each MCS run can be ranked using chemical filters such as stereochemistry, bond energy, etc. RESULTS In order to benchmark and test the tool, we performed 50,000 pairwise comparisons between KEGG ligands and PDB HET Group atoms. In both cases the SMSD was shown to be more efficient than the widely used MCS module implemented in the Chemistry Development Kit (CDK) in generating MCS solutions from our test cases. CONCLUSION Presently this tool can be applied to various areas of bioinformatics and chemo-informatics for finding exhaustive MCS matches. For example, it can be used to analyse metabolic networks by mapping the atoms between reactants and products involved in reactions. It can also be used to detect the MCS/substructure searches in small molecules reported by metabolome experiments, as well as in the screening of drug-like compounds with similar substructures. Thus, we present a robust tool that can be used for multiple applications, including the discovery of new drug molecules. This tool is freely available on http://www.ebi.ac.uk/thornton-srv/software/SMSD/
Affiliation(s)
- Syed Asad Rahman
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
26
|
Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development. J Comput Aided Mol Des 2009; 23:773-84. [DOI: 10.1007/s10822-009-9273-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2008] [Accepted: 04/15/2009] [Indexed: 12/12/2022]
|
27
|
Kittichotirat W, Guerquin M, Bumgarner RE, Samudrala R. Protinfo PPC: a web server for atomic level prediction of protein complexes. Nucleic Acids Res 2009; 37:W519-25. [PMID: 19420059 PMCID: PMC2703994 DOI: 10.1093/nar/gkp306] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
‘Protinfo PPC’ (Prediction of Protein Complex) is a web server that predicts atomic level structures of interacting proteins from their amino-acid sequences. It uses the interolog method to search for experimental protein complex structures that are homologous to the input sequences submitted by a user. These structures are then used as starting templates to generate protein complex models, which are returned to the user in Protein Data Bank format via email. The server supports modeling of both homo and hetero multimers and generally produces full atomic level models (including insertion/deletion regions) of protein complexes as long as at least one putative homologous template for the query sequences is found. The modeling pipeline behind Protinfo PPC has been rigorously benchmarked and proven to produce highly accurate protein complex models. The fully automated all atom comparative modeling service for protein complexes provided by Protinfo PPC server offers wide capabilities ranging from prediction of protein complex interactions to identification of possible interaction sites, which will be useful for researchers studying these topics. The Protinfo PPC web server is available at http://protinfo.compbio.washington.edu/ppc/
|
28
|
Vadivel K, Namasivayam G. An estimate of the numbers and density of low-energy structures (or decoys) in the conformational landscape of proteins. PLoS One 2009; 4:e5148. [PMID: 19357778 PMCID: PMC2663821 DOI: 10.1371/journal.pone.0005148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2008] [Accepted: 03/02/2009] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The conformational energy landscape of a protein, as calculated by known potential energy functions, has several minima, and one of these corresponds to its native structure. It is however difficult to comprehensively estimate the actual numbers of low energy structures (or decoys), the relationships between them, and how the numbers scale with the size of the protein. METHODOLOGY We have developed an algorithm to rapidly and efficiently identify the low energy conformers of oligo peptides by using mutually orthogonal Latin squares to sample the potential energy hyper surface. Using this algorithm, and the ECEPP/3 potential function, we have made an exhaustive enumeration of the low-energy structures of peptides of different lengths, and have extrapolated these results to larger polypeptides. CONCLUSIONS AND SIGNIFICANCE We show that the number of native-like structures for a polypeptide is, in general, an exponential function of its sequence length. The density of these structures in conformational space remains more or less constant and all the increase appears to come from an expansion in the volume of the space. These results are consistent with earlier reports that were based on other models and techniques.
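The sampling idea named in this abstract can be illustrated with a minimal sketch. The abstract does not spell out how the squares are built or used, so the construction below (the standard `(i + k*j) mod n` squares for a prime order `n`) and the function names `latin_square`, `are_orthogonal` and `sample_points` are assumptions for illustration only. Each of the `n*n` cells yields one sample point, so `m` variables with `n` settings each are probed with `n*n` points rather than `n**m`.

```python
from itertools import product


def is_prime(n):
    """Trial-division primality check, adequate for small orders."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def latin_square(n, k):
    """n x n square whose entry at row i, column j is (i + k*j) mod n."""
    return [[(i + k * j) % n for j in range(n)] for i in range(n)]


def are_orthogonal(a, b):
    """Two squares are orthogonal if every ordered pair of symbols occurs once."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


def sample_points(n, m):
    """Return n*n points in m dimensions, one per cell of m mutually orthogonal squares.

    Assumes n is prime and m <= n - 1, so that k = 1..m gives m MOLS.
    """
    if not is_prime(n) or m > n - 1:
        raise ValueError("this sketch needs a prime n and m <= n - 1")
    squares = [latin_square(n, k) for k in range(1, m + 1)]
    return [tuple(sq[i][j] for sq in squares)
            for i in range(n) for j in range(n)]


if __name__ == "__main__":
    points = sample_points(5, 4)  # 4 variables, 5 settings each
    print(len(points), "points instead of", 5 ** 4)
```

Because the squares are mutually orthogonal, every combination of settings for any two variables appears exactly once among the sampled points, which is what makes the reduced sample informative.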
Affiliation(s)
- Kanagasabai Vadivel
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Tamilnadu, India
| | - Gautham Namasivayam
- Centre of Advanced Study in Crystallography & Biophysics, University of Madras, Tamilnadu, India
| |
|
29
|
Abstract
The Bioverse is a framework for creating, warehousing and presenting biological information based on hierarchical levels of organisation. The framework is guided by a deeper philosophy of desiring to represent all relationships between all components of biological systems towards the goal of a wholistic picture of organismal biology. Data from various sources are combined into a single repository and a uniform interface is exposed to access it. The power of the approach of the Bioverse is that, due to its inclusive nature, patterns emerge from the acquired data and new predictions are made. The implementation of this repository (beginning with acquisition of source data, processing in a pipeline, and concluding with storage in a relational database) and interfaces to the data contained in it, from a programmatic application interface to a user friendly web application, are discussed.
|
30
|
Vicinity analysis: a methodology for the identification of similar protein active sites. J Mol Model 2008; 15:489-98. [DOI: 10.1007/s00894-008-0424-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2008] [Accepted: 11/17/2008] [Indexed: 10/21/2022]
|
31
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 750] [Impact Index Per Article: 46.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
32
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit 5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1758] [Impact Index Per Article: 109.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
| | - Ben Webb
- University of California at San Francisco, San Francisco, California
| | | | - M S Madhusudhan
- University of California at San Francisco, San Francisco, California
| | - David Eramian
- University of California at San Francisco, San Francisco, California
| | - Min-Yi Shen
- University of California at San Francisco, San Francisco, California
| | - Ursula Pieper
- University of California at San Francisco, San Francisco, California
| | - Andrej Sali
- University of California at San Francisco, San Francisco, California
| |
|
33
|
Liu T, Guerquin M, Samudrala R. Improving the accuracy of template-based predictions by mixing and matching between initial models. BMC STRUCTURAL BIOLOGY 2008; 8:24. [PMID: 18457597 PMCID: PMC2424052 DOI: 10.1186/1472-6807-8-24] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2007] [Accepted: 05/05/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Comparative modeling is a technique to predict the three dimensional structure of a given protein sequence based primarily on its alignment to one or more proteins with experimentally determined structures. A major bottleneck of current comparative modeling methods is the lack of methods to accurately refine a starting initial model so that it approaches the resolution of the corresponding experimental structure. We investigate the effectiveness of a graph-theoretic clique finding approach to solve this problem. RESULTS Our method takes into account the information presented in multiple templates/alignments at the three-dimensional level by mixing and matching regions between different initial comparative models. This method enables us to obtain an optimized conformation ensemble representing the best combination of secondary structures, resulting in the refined models of higher quality. In addition, the process of mixing and matching accumulates near-native conformations, resulting in discriminating the native-like conformation in a more effective manner. In the seventh Critical Assessment of Structure Prediction (CASP7) experiment, the refined models produced are more accurate than the starting initial models. CONCLUSION This novel approach can be applied without any manual intervention to improve the quality of comparative predictions where multiple template/alignment combinations are available for modeling, producing conformational models of higher quality than the starting initial predictions.
Affiliation(s)
- Tianyun Liu
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
| | - Michal Guerquin
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
| | - Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
| |
|
34
|
Bockhorst J, Lu F, Janes JH, Keebler J, Gamain B, Awadalla P, Su XZ, Samudrala R, Jojic N, Smith JD. Structural polymorphism and diversifying selection on the pregnancy malaria vaccine candidate VAR2CSA. Mol Biochem Parasitol 2007; 155:103-12. [PMID: 17669514 DOI: 10.1016/j.molbiopara.2007.06.007] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2007] [Revised: 06/11/2007] [Accepted: 06/12/2007] [Indexed: 01/08/2023]
Abstract
VAR2CSA is the main candidate for a pregnancy malaria vaccine, but vaccine development may be complicated by sequence polymorphism. Here, we obtained partial or full-length var2CSA sequences from 106 parasites and applied novel computational methods and three-dimensional modeling to investigate VAR2CSA geographic variation and selection pressure. Our analysis reveals structural patterns of VAR2CSA sequence variation in which polymorphic sites group into segments of limited diversity. Within these segments, two or three basic types characterize a substantial majority of the parasite samples. Comparison to the primate malaria Plasmodium reichenowi shows that these basic types have ancient origins. Globally, var2CSA genes are comprised of a mosaic of these ancestral polymorphic segments that have recombined extensively between var2CSA alleles. Three-dimensional modeling reveals that polymorphic segments concentrate in flexible loops at characteristic locations in the six VAR2CSA Duffy binding-like (DBL) adhesion domains. Individual DBL domain surfaces have distinct patterns of diversifying selection, suggesting that limited and differing portions of each DBL domain are targeted by host antibody. Since standard phylogenetic tree analysis is inadequate for highly recombining genes like var2CSA, we developed a novel phylogenetic approach that incorporates recombination and tracks new mutations in segment types. In the resulting tree, P. reichenowi is confirmed as an outlier and African and Asian P. falciparum isolates have slightly diverged. These findings validate a new approach to modeling protein evolution in the presence of frequent recombination and provide a clearer understanding of how var gene products function as immunoevasive binding ligands.
MESH Headings
- Animals
- Antigens, Protozoan/chemistry
- Antigens, Protozoan/genetics
- Antigens, Protozoan/immunology
- Computational Biology/methods
- DNA, Protozoan/chemistry
- DNA, Protozoan/genetics
- Female
- Geography
- Humans
- Malaria/immunology
- Malaria/parasitology
- Malaria Vaccines/immunology
- Models, Molecular
- Molecular Sequence Data
- Phylogeny
- Plasmodium falciparum/genetics
- Plasmodium falciparum/isolation & purification
- Polymorphism, Genetic
- Pregnancy
- Pregnancy Complications, Parasitic/immunology
- Pregnancy Complications, Parasitic/prevention & control
- Protein Structure, Tertiary
- Selection, Genetic
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
|
35
|
Berube PM, Samudrala R, Stahl DA. Transcription of all amoC copies is associated with recovery of Nitrosomonas europaea from ammonia starvation. J Bacteriol 2007; 189:3935-44. [PMID: 17384196 PMCID: PMC1913382 DOI: 10.1128/jb.01861-06] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2006] [Accepted: 03/14/2007] [Indexed: 11/20/2022] Open
Abstract
The chemolithotrophic ammonia-oxidizing bacterium Nitrosomonas europaea is known to be highly resistant to starvation conditions. The transcriptional response of N. europaea to ammonia addition following short- and long-term starvation was examined by primer extension and S1 nuclease protection analyses of genes encoding enzymes for ammonia oxidation (amoCAB operons) and CO2 fixation (cbbLS), a third, lone copy of amoC (amoC3), and two representative housekeeping genes (glyA and rpsJ). Primer extension analysis of RNA isolated from growing, starved, and recovering cells revealed two differentially regulated promoters upstream of the two amoCAB operons. The distal σ70-type amoCAB promoter was constitutively active in the presence of ammonia, but the proximal promoter was only active when cells were recovering from ammonia starvation. The lone, divergent copy of amoC (amoC3) was expressed only during recovery. Both the proximal amoC1,2 promoter and the amoC3 promoter are similar to gram-negative σE promoters, thus implicating σE in the regulation of the recovery response. Although modeling of subunit interactions suggested that a nonconservative proline substitution in AmoC3 may modify the activity of the holoenzyme, characterization of a ΔamoC3 strain showed no significant difference in starvation recovery under conditions evaluated. In contrast to the amo transcripts, a delayed appearance of transcripts for a gene required for CO2 fixation (cbbL) suggested that its transcription is retarded until sufficient energy is available. Overall, these data revealed a programmed exit from starvation likely involving regulation by σE and the coordinated regulation of catabolic and anabolic genes.
Affiliation(s)
- Paul M Berube
- Department of Microbiology, University of Washington, Seattle, WA 98195-2700, USA
|
36
|
Heath AP, Kavraki LE, Clementi C. From coarse-grain to all-atom: Toward multiscale analysis of protein landscapes. Proteins 2007; 68:646-61. [PMID: 17523187 DOI: 10.1002/prot.21371] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Multiscale methods are becoming increasingly promising as a way to characterize the dynamics of large protein systems on biologically relevant time-scales. The underlying assumption in multiscale simulations is that it is possible to move reliably between different resolutions. We present a method that efficiently generates realistic all-atom protein structures starting from the Cα atom positions, as obtained for instance from extensive coarse-grain simulations. The method, a reconstruction algorithm for coarse-grain structures (RACOGS), is validated by reconstructing ensembles of coarse-grain structures obtained during folding simulations of the proteins src-SH3 and S6. The results show that RACOGS consistently produces low energy, all-atom structures. A comparison of the free energy landscapes calculated using the coarse-grain structures versus the all-atom structures shows good correspondence and little distortion in the protein folding landscape.
Affiliation(s)
- Allison P Heath
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
|
37
|
Böde C, Kovács IA, Szalay MS, Palotai R, Korcsmáros T, Csermely P. Network analysis of protein dynamics. FEBS Lett 2007; 581:2776-82. [PMID: 17531981 DOI: 10.1016/j.febslet.2007.05.021] [Citation(s) in RCA: 172] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2007] [Revised: 04/30/2007] [Accepted: 05/08/2007] [Indexed: 11/20/2022]
Abstract
The network paradigm is increasingly used to describe the topology and dynamics of complex systems. Here, we review the results of the topological analysis of protein structures as molecular networks describing their small-world character, and the role of hubs and central network elements in governing enzyme activity, allosteric regulation, protein motor function, signal transduction and protein stability. We summarize available data on how central network elements are enriched in active centers and ligand binding sites directing the dynamics of the entire protein. We assess the feasibility of conformational and energy networks to simplify the vast complexity of rugged energy landscapes and to predict protein folding and dynamics. Finally, we suggest that modular analysis, novel centrality measures, hierarchical representation of networks and the analysis of network dynamics will soon lead to an expansion of this field.
Affiliation(s)
- Csaba Böde
- Department of Biophysics and Radiation Biology, Semmelweis University, Puskin Street 9, H-1088 Budapest, Hungary.
|
38
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
39
|
Deng H, Chen G, Yang W, Yang JJ. Predicting calcium-binding sites in proteins - a graph theory and geometry approach. Proteins 2006; 64:34-42. [PMID: 16617426 DOI: 10.1002/prot.20973] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Identifying calcium-binding sites in proteins is one of the first steps towards predicting and understanding the role of calcium in biological systems for protein structure and function studies. Due to the complexity and irregularity of calcium-binding sites, a fast and accurate method for predicting and identifying calcium-binding protein is needed. Here we report our development of a new fast algorithm (GG) to detect calcium-binding sites. The GG algorithm uses a graph theory algorithm to find oxygen clusters of the protein and a geometric algorithm to identify the center of these clusters. A cluster of four or more oxygen atoms has a high potential for calcium binding. High performance with about 90% site sensitivity and 80% site selectivity has been obtained for three datasets containing a total of 123 proteins. The results suggest that a sphere of a certain size with four or more oxygen atoms on the surface and without other atoms inside is necessary and sufficient for quickly identifying the majority of the calcium-binding sites with high accuracy. Our finding opens a new avenue to visualize and analyze calcium-binding sites in proteins facilitating the prediction of functions from structural genomic information.
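The two-stage idea described in this abstract (a graph step that finds oxygen clusters, then a geometric step that locates their center) might be sketched roughly as follows. This is not the published GG implementation: the 6.0 Å pairwise cutoff, the use of a plain centroid as the cluster center, the "no other atom closer than the nearest oxygen" emptiness test, and the function names are all assumptions made for illustration.

```python
import math
from itertools import combinations


def oxygen_clusters(oxygens, cutoff=6.0):
    """Graph step: all groups of four oxygens that are pairwise within `cutoff`.

    Equivalent to listing 4-cliques in a graph whose edges join close oxygens.
    The cutoff value is an assumed placeholder, not taken from the paper.
    """
    return [quad for quad in combinations(oxygens, 4)
            if all(math.dist(a, b) <= cutoff
                   for a, b in combinations(quad, 2))]


def centroid(points):
    """Geometric step (simplified): arithmetic mean of the coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def candidate_sites(oxygens, other_atoms, cutoff=6.0):
    """Return cluster centers that have no non-oxygen atom inside the cluster."""
    sites = []
    for quad in oxygen_clusters(oxygens, cutoff):
        center = centroid(quad)
        radius = min(math.dist(center, o) for o in quad)
        # Keep the site only if every other atom lies outside the empty sphere.
        if all(math.dist(center, a) >= radius for a in other_atoms):
            sites.append(center)
    return sites


if __name__ == "__main__":
    tetrahedron = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    print(candidate_sites(tetrahedron, other_atoms=[]))
```

Brute-force enumeration of four-atom groups is only practical for small inputs; a real tool would restrict the search to spatial neighbours first.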
Affiliation(s)
- Hai Deng
- Department of Computer Science, Georgia State University, Atlanta, Georgia 30302, USA
|
40
|
Liu T, Jenwitheesuk E, Teller DC, Samudrala R. Structural insights into the cellular retinaldehyde-binding protein (CRALBP). Proteins 2006; 61:412-22. [PMID: 16121400 DOI: 10.1002/prot.20621] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Cellular retinaldehyde-binding protein (CRALBP) is an essential protein in the human visual cycle without a known three-dimensional structure. Previous studies associate retinal pathologies to specific mutations in the CRALBP protein. Here we use homology modeling and molecular dynamics methods to investigate the structural mechanisms by which CRALBP functions in the visual cycle. We have constructed two conformations of CRALBP representing two states in the process of ligand association and dissociation. Notably, our homology models map the pathology-associated mutations either directly in or adjacent to the putative ligand-binding cavity. Furthermore, six novel residues have been identified to be crucial for the hinge movement of the lipid-exchange loop in CRALBP. We conclude that the binding and release of retinoid involve large conformational changes in the lipid-exchange loop at the entrance of the ligand-binding cavity.
Affiliation(s)
- Tianyun Liu
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
41
|
Haynes T, Knisley D, Seier E, Zou Y. A quantitative analysis of secondary RNA structure using domination based parameters on trees. BMC Bioinformatics 2006; 7:108. [PMID: 16515683 PMCID: PMC1420337 DOI: 10.1186/1471-2105-7-108] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2005] [Accepted: 03/03/2006] [Indexed: 11/30/2022] Open
Abstract
Background It has become increasingly apparent that a comprehensive database of RNA motifs is essential in order to achieve new goals in genomic and proteomic research. Secondary RNA structures have frequently been represented by various modeling methods as graph-theoretic trees. Using graph theory as a modeling tool allows the vast resources of graphical invariants to be utilized to numerically identify secondary RNA motifs. The domination number of a graph is a graphical invariant that is sensitive to even a slight change in the structure of a tree. The invariants selected in this study are variations of the domination number of a graph. These graphical invariants are partitioned into two classes, and we define two parameters based on each of these classes. These parameters are calculated for all small order trees and a statistical analysis of the resulting data is conducted to determine if the values of these parameters can be utilized to identify which trees of orders seven and eight are RNA-like in structure. Results The statistical analysis shows that the domination based parameters correctly distinguish between the trees that represent native structures and those that are not likely candidates to represent RNA. Some of the trees previously identified as candidate structures are found to be "very" RNA like, while others are not, thereby refining the space of structures likely to be found as representing secondary RNA structure. Conclusion Search algorithms are available that mine nucleotide sequence databases. However, the number of motifs identified can be quite large, making a further search for similar motif computationally difficult. Much of the work in the bioinformatics arena is toward the development of better algorithms to address the computational problem. This work, on the other hand, uses mathematical descriptors to more clearly characterize the RNA motifs and thereby reduce the corresponding search space. 
These preliminary findings demonstrate that graph-theoretic quantifiers utilized in fields such as computer network design hold significant promise as an added tool for genomics and proteomics.
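For readers unfamiliar with the invariant named in this abstract, the domination number of a graph is the size of the smallest vertex set such that every vertex is either in the set or adjacent to a member of it. The sketch below computes it by brute force, which is feasible for the small trees (orders seven and eight) the abstract mentions. The function name and the edge-list input format are assumptions, and the paper's own parameters are variations of this invariant that are not reproduced here.

```python
from itertools import combinations


def domination_number(n, edges):
    """Smallest number of vertices whose closed neighbourhoods cover 0..n-1."""
    # Closed neighbourhood: each vertex dominates itself and its neighbours.
    closed = {v: {v} for v in range(n)}
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    # Try subsets in order of increasing size; the first hit is the minimum.
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            covered = set().union(*(closed[v] for v in subset))
            if len(covered) == n:
                return size
    return 0  # empty graph


if __name__ == "__main__":
    path7 = [(i, i + 1) for i in range(6)]   # path on seven vertices
    star7 = [(0, i) for i in range(1, 7)]    # star on seven vertices
    print(domination_number(7, path7), domination_number(7, star7))
```

The two order-seven trees in the usage lines have the same number of vertices and edges but different domination numbers, which is the kind of structural sensitivity the abstract relies on.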
Affiliation(s)
- Teresa Haynes
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
| | - Debra Knisley
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
| | - Edith Seier
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
| | - Yue Zou
- Department of Biochemistry and Molecular Biology, Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
| |
|
42
|
Brinda K, Surolia A, Vishveshwara S. Insights into the quaternary association of proteins through structure graphs: a case study of lectins. Biochem J 2006; 391:1-15. [PMID: 16173917 PMCID: PMC1237133 DOI: 10.1042/bj20050434] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The unique three-dimensional structure of both monomeric and oligomeric proteins is encoded in their sequence. The biological functions of proteins are dependent on their tertiary and quaternary structures, and hence it is important to understand the determinants of quaternary association in proteins. Although a large number of investigations have been carried out in this direction, the underlying principles of protein oligomerization are yet to be completely understood. Recently, new insights into this problem have been gained from the analysis of structure graphs of proteins belonging to the legume lectin family. The legume lectins are an interesting family of proteins with very similar tertiary structures but varied quaternary structures. Hence they have become a very good model with which to analyse the role of primary structures in determining the modes of quaternary association. The present review summarizes the results of a legume lectin study as well as those obtained from a similar analysis carried out here on the animal lectins, namely galectins, pentraxins, calnexin, calreticulin and rhesus rotavirus Vp4 sialic-acid-binding domain. The lectin structure graphs have been used to obtain clusters of non-covalently interacting amino acid residues at the intersubunit interfaces. The present study, performed along with traditional sequence alignment methods, has provided the signature sequence motifs for different kinds of quaternary association seen in lectins. Furthermore, the network representation of the lectin oligomers has enabled us to detect the residues which make extensive interactions ('hubs') across the oligomeric interfaces that can be targetted for interface-destabilizing mutations. The present review also provides an overview of the methodology involved in representing oligomeric protein structures as connected networks of amino acid residues. 
Further, it illustrates the potential of such a representation in elucidating the structural determinants of protein-protein association in general and will be of significance to protein chemists and structural biologists.
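The network representation mentioned in this abstract can be illustrated with a small sketch that counts contacts across chains and reports residues with many of them as interface hubs. The abstract does not state the interaction criterion used in the review, so the single representative point per residue, the 6.5 Å distance cutoff, the `min_contacts` threshold, and the function name are hypothetical choices.

```python
import math
from collections import Counter


def interface_hubs(residues, cutoff=6.5, min_contacts=3):
    """Return residues with at least `min_contacts` contacts to other chains.

    `residues` is a list of (chain_id, residue_id, (x, y, z)) tuples, with one
    representative coordinate per residue (an assumed simplification).
    """
    contacts = Counter()
    for i, (chain_a, id_a, xyz_a) in enumerate(residues):
        for chain_b, id_b, xyz_b in residues[i + 1:]:
            # Only edges that cross the subunit interface are counted.
            if chain_a != chain_b and math.dist(xyz_a, xyz_b) <= cutoff:
                contacts[(chain_a, id_a)] += 1
                contacts[(chain_b, id_b)] += 1
    return sorted(r for r, count in contacts.items() if count >= min_contacts)


if __name__ == "__main__":
    toy = [
        ("A", 1, (0.0, 0.0, 0.0)),
        ("A", 2, (100.0, 0.0, 0.0)),
        ("B", 1, (1.0, 0.0, 0.0)),
        ("B", 2, (0.0, 1.0, 0.0)),
        ("B", 3, (0.0, 0.0, 1.0)),
    ]
    print(interface_hubs(toy))
```

In the toy input, residue 1 of chain A touches three residues of chain B and is the only one reported, while the distant residue 2 of chain A has no cross-chain contacts.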
Affiliation(s)
- K. V. Brinda
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
| | - Avadhesha Surolia
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
- Correspondence can be addressed to either of these authors (email or )
| | - Saraswathi Vishveshwara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
- Correspondence can be addressed to either of these authors (email or )
| |
|
43
|
Summa CM, Levitt M, Degrado WF. An atomic environment potential for use in protein structure prediction. J Mol Biol 2005; 352:986-1001. [PMID: 16126228 DOI: 10.1016/j.jmb.2005.07.054] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2004] [Revised: 06/20/2005] [Accepted: 07/20/2005] [Indexed: 11/25/2022]
Abstract
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.
Affiliation(s)
- Christopher M Summa
- Department of Biochemistry and Biophysics, The University of Pennsylvania Medical School, Philadelphia, PA 19104-6059, USA
|
44
|
Hung LH, Ngan SC, Liu T, Samudrala R. PROTINFO: new algorithms for enhanced protein structure predictions. Nucleic Acids Res 2005; 33:W77-80. [PMID: 15980581 PMCID: PMC1160164 DOI: 10.1093/nar/gki403] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022] Open
Abstract
We describe new algorithms and modules for protein structure prediction available as part of the PROTINFO web server. The modules, comparative and de novo modelling, have significantly improved back-end algorithms that were rigorously evaluated at the sixth meeting on the Critical Assessment of Protein Structure Prediction methods. We were one of four server groups invited to make an oral presentation (only the best performing groups are asked to do so). These two modules allow a user to submit a protein sequence and return atomic coordinates representing the tertiary structure of that protein. The PROTINFO server is available at .
Affiliation(s)
- Ram Samudrala
- To whom correspondence should be addressed. Tel: +1 206 732 6122; Fax: +1 206 732 6055;
|
45
|
Fogolari F, Tosatto SCE. Application of MM/PBSA colony free energy to loop decoy discrimination: toward correlation between energy and root mean square deviation. Protein Sci 2005; 14:889-901. [PMID: 15772305 PMCID: PMC2253447 DOI: 10.1110/ps.041004105] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/25/2022]
Abstract
Accurate free energy estimation is needed in many predictive tasks. The molecular mechanics/Poisson-Boltzmann solvent accessible surface area (MM/PBSA) approach has proven to be accurate. However, the correlation between the estimated free energy and the distance (e.g., root mean square deviation [RMSD]) from the most stable conformation is hindered by the strong free energy dependence on minor conformational variations. In this paper, a protocol for MM/PBSA free energy estimation is designed and tested on several loop decoy sets. We show that further integration of the MM/PBSA free energy estimator with the colony energy approach makes the correlation between the free energy and RMSD from the native structure apparent, for the test sets on which it could be applied. Our results suggest that (1) the MM/PBSA free energy estimator is able to detect native-like structures for most decoy sets, and (2) application of the colony energy approach greatly reduces the strong dependence of the MM/PBSA energy on minor conformational changes.
Affiliation(s)
- Federico Fogolari
- Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, Piazzale Kolbe 4, 33100 Udine, Italy.
|
46
|
Lancaster JL, Laird AR, Fox PM, Glahn DE, Fox PT. Automated analysis of meta-analysis networks. Hum Brain Mapp 2005; 25:174-84. [PMID: 15846809 PMCID: PMC6871705 DOI: 10.1002/hbm.20135] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
The high information content in large data sets from voxel-based meta-analyses is complex, making it hard to readily resolve details. Using the meta-analysis network as a standardized data structure, network analysis algorithms can examine complex interrelationships and resolve hidden details. Two new network analysis algorithms have been adapted for use with meta-analysis networks. The first, called replicator dynamics network analysis (RDNA), analyzes co-occurrence of activations, whereas the second, called fractional similarity network analysis (FSNA), uses binary pattern matching to form similarity subnets. These two network analysis methods were evaluated using data from activation likelihood estimation (ALE)-based meta-analysis of the Stroop paradigm. Two versions of these data were evaluated, one using a stricter ALE threshold (P < 0.01) with a 13-node meta-analysis network, and the other a more lax threshold (P < 0.05) with a 22-node network. Java-based applications were developed for both RDNA and FSNA. The RDNA algorithm was modified to provide multiple subnets or maximal cliques for meta-analysis networks. Three different similarity measures were evaluated with FSNA to form subsets of nodes and experiments. RDNA provides a means to gauge the importance of meta-analysis subnets and complements FSNA, which provides a more comprehensive assessment of node similarity subsets, experiment similarity subsets, and overall node-to-factors similarity. The need to use both presence and absence of activations was an important finding in similarity analyses. FSNA revealed details from the pooled Stroop meta-analysis that would otherwise require separate highly filtered meta-analyses. These new analysis tools demonstrate how network analysis strategies can greatly simplify and enhance voxel-based meta-analyses.
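As a rough illustration of the replicator dynamics idea behind RDNA, the sketch below iterates the generic discrete-time replicator equation on a small symmetric co-occurrence matrix. It is not the authors' Java implementation or their modified multi-subnet algorithm; the function name, the matrix values, and the "above uniform weight" selection rule are assumptions made here for illustration only.

```python
def replicator_dynamics(w, iterations=200):
    """Iterate x_i <- x_i * (W x)_i / (x^T W x), starting from uniform weights."""
    n = len(w)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Fitness of each node: weighted co-occurrence with the current population.
        wx = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean_fitness = sum(x[i] * wx[i] for i in range(n))
        if mean_fitness == 0:
            break  # no co-occurrences at all; nothing to amplify
        x = [x[i] * wx[i] / mean_fitness for i in range(n)]
    return x


# Hypothetical co-occurrence matrix: nodes 0, 1, 2 are mutually co-active,
# node 3 co-occurs only with node 2.
W = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

weights = replicator_dynamics(W)
# Nodes whose weight ends above the uniform starting value form the dominant subnet.
dominant = [i for i, v in enumerate(weights) if v > 1.0 / len(W)]
print(dominant)
```

In this toy case the weight of the weakly connected node decays toward zero while the three mutually co-active nodes share the remaining weight equally.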
Affiliation(s)
- Jack L Lancaster
- Research Imaging Center, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3900, USA.
|
47
|
Artymiuk PJ, Spriggs RV, Willett P. Graph theoretic methods for the analysis of structural relationships in biological macromolecules. J Am Soc Inf Sci Technol 2005. [DOI: 10.1002/asi.20140] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
|
48
|
Reichmann D, Rahat O, Albeck S, Meged R, Dym O, Schreiber G. The modular architecture of protein-protein binding interfaces. Proc Natl Acad Sci U S A 2004; 102:57-62. [PMID: 15618400 PMCID: PMC544062 DOI: 10.1073/pnas.0407280102] [Citation(s) in RCA: 183] [Impact Index Per Article: 9.2] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions are essential for life. Yet, our understanding of the general principles governing binding is not complete. In the present study, we show that the interface between proteins is built in a modular fashion; each module comprises a number of closely interacting residues, with few interactions between the modules. The boundaries between modules are defined by clustering the contact map of the interface. We show that mutations in one module do not affect residues located in a neighboring module. As a result, the structural and energetic consequences of the deletion of entire modules are surprisingly small. In contrast, within their module, mutations cause complex energetic and structural consequences. Experimentally, this phenomenon is shown on the interaction between TEM1-beta-lactamase and beta-lactamase inhibitor protein (BLIP) by using multiple-mutant analysis and x-ray crystallography. Replacing an entire module of five interface residues with Ala created a large cavity in the interface, with no effect on the detailed structure of the remaining interface. The modular architecture of binding sites, which resembles human engineering design, greatly simplifies the design of new protein interactions and provides a feasible view of how these interactions evolved.
Affiliation(s)
- D Reichmann
- Departments of Biological Chemistry and Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
49
|
Kingsford CL, Chazelle B, Singh M. Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics 2004; 21:1028-36. [PMID: 15546935 DOI: 10.1093/bioinformatics/bti144] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Side-chain positioning is a central component of homology modeling and protein design. In a common formulation of the problem, the backbone is fixed, side-chain conformations come from a rotamer library, and a pairwise energy function is optimized. It is NP-complete to find even a reasonable approximate solution to this problem. We seek to put this hardness result into practical context. RESULTS: We present an integer linear programming (ILP) formulation of side-chain positioning that allows us to tackle large problem sizes. We relax the integrality constraint to give a polynomial-time linear programming (LP) heuristic. We apply LP to position side chains on native and homologous backbones and to choose side chains for protein design. Surprisingly, when positioning side chains on native and homologous backbones, optimal solutions using a simple, biologically relevant energy function can usually be found using LP. On the other hand, the design problem often cannot be solved using LP directly; however, optimal solutions for large instances can still be found using the computationally more expensive ILP procedure. While different energy functions also affect the difficulty of the problem, the LP/ILP approach is able to find optimal solutions. Our analysis is the first large-scale demonstration that LP-based approaches are highly effective in finding optimal (and successive near-optimal) solutions for the side-chain positioning problem.
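One common way to write the pairwise-energy side-chain positioning problem as an integer program is sketched below. The notation is an assumption introduced here and is not taken from the abstract: $R_i$ is the set of candidate rotamers at position $i$, $E_{iu}$ is the self energy of rotamer $u$ at position $i$, $E_{iu,jv}$ is the pairwise energy between rotamer $u$ at $i$ and rotamer $v$ at $j$, and the binary variables $x_{iu}$ and $x_{iu,jv}$ indicate which rotamers and rotamer pairs are selected, with $x_{iu,jv}$ and $x_{jv,iu}$ treated as the same variable.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i}\sum_{u \in R_i} E_{iu}\,x_{iu}
  \;+\; \sum_{i<j}\sum_{u \in R_i}\sum_{v \in R_j} E_{iu,jv}\,x_{iu,jv} \\
\text{s.t.}\quad & \sum_{u \in R_i} x_{iu} = 1 && \forall i, \\
& \sum_{v \in R_j} x_{iu,jv} = x_{iu} && \forall i,\; \forall j \neq i,\; \forall u \in R_i, \\
& x_{iu},\; x_{iu,jv} \in \{0,1\}.
\end{aligned}
```

Under these assumptions, the LP heuristic mentioned in the abstract corresponds to replacing the last line with $0 \le x_{iu},\, x_{iu,jv} \le 1$; whenever the LP optimum happens to be integral, it is also optimal for the integer program.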
Affiliation(s)
- Carleton L Kingsford
- Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics, Princeton University Princeton, NJ 08544, USA
|
50
|
Comparative Protein Structure Modeling and its Applications to Drug Discovery. ANNUAL REPORTS IN MEDICINAL CHEMISTRY 2004. [DOI: 10.1016/s0065-7743(04)39020-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
|