1
Hsieh YC, Delarue M, Orland H, Koehl P. Analyzing the Geometry and Dynamics of Viral Structures: A Review of Computational Approaches Based on Alpha Shape Theory, Normal Mode Analysis, and Poisson-Boltzmann Theories. Viruses 2023; 15:1366. [PMID: 37376665 DOI: 10.3390/v15061366] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/03/2023] [Revised: 06/05/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] Open
Abstract
The current SARS-CoV-2 pandemic highlights our fragility when we are exposed to emergent viruses either directly or through zoonotic diseases. Fortunately, our knowledge of the biology of those viruses is improving. In particular, we have more and more structural information on virions, i.e., the infective form of a virus that includes its genomic material and surrounding protective capsid, and on their gene products. It is important to have methods that enable the analyses of structural information on such large macromolecular systems. We review some of those methods in this paper. We focus on understanding the geometry of virions and viral structural proteins, their dynamics, and their energetics, with the ambition that this understanding can help design antiviral agents. We discuss those methods in light of the specificities of those structures, mainly that they are huge. We focus on three of our own methods based on the alpha shape theory for computing geometry, normal mode analyses to study dynamics, and modified Poisson-Boltzmann theories to study the organization of ions and co-solvent and solvent molecules around biomacromolecules. The corresponding software has computing times that are compatible with the use of regular desktop computers. We show examples of their applications on some outer shells and structural proteins of the West Nile Virus.
Affiliation(s)
- Yin-Chen Hsieh
  - Institute for Arctic and Marine Biology, Department of Biosciences, Fisheries, and Economics, UiT The Arctic University of Norway, 9037 Tromso, Norway
- Marc Delarue
  - Institut Pasteur, Université Paris-Cité and CNRS, UMR 3528, Unité Architecture et Dynamique des Macromolécules Biologiques, 75015 Paris, France
- Henri Orland
  - Institut de Physique Théorique, CEA, CNRS, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Patrice Koehl
  - Department of Computer Science, University of California, Davis, CA 95616, USA

2
Koehl P, Akopyan A, Edelsbrunner H. Computing the Volume, Surface Area, Mean, and Gaussian Curvatures of Molecules and Their Derivatives. J Chem Inf Model 2023; 63:973-985. [PMID: 36638318 PMCID: PMC9930125 DOI: 10.1021/acs.jcim.2c01346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Geometry is crucial in our efforts to comprehend the structures and dynamics of biomolecules. For example, volume, surface area, and integrated mean and Gaussian curvature of the union of balls representing a molecule are used to quantify its interactions with the water surrounding it in the morphometric implicit solvent models. The Alpha Shape theory provides an accurate and reliable method for computing these geometric measures. In this paper, we derive homogeneous formulas for the expressions of these measures and their derivatives with respect to the atomic coordinates, and we provide algorithms that implement them into a new software package, AlphaMol. The only variables in these formulas are the interatomic distances, making them insensitive to translations and rotations. AlphaMol includes a sequential algorithm and a parallel algorithm. In the parallel version, we partition the atoms of the molecule of interest into 3D rectangular blocks, using a kd-tree algorithm. We then apply the sequential algorithm of AlphaMol to each block, augmented by a buffer zone to account for atoms whose ball representations may partially cover the block. The current parallel version of AlphaMol leads to a 20-fold speed-up compared to an independent serial implementation when using 32 processors. For instance, it takes 31 s to compute the geometric measures and derivatives of each atom in a viral capsid with more than 26 million atoms on 32 Intel processors running at 2.7 GHz. The presence of the buffer zones, however, leads to redundant computations, which ultimately limit the impact of using multiple processors. AlphaMol is available as an OpenSource software.
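To illustrate the remark that such measures depend only on interatomic distances, here is a minimal sketch for the simplest possible union of balls: two atoms. It uses the classical closed-form volume of the lens shared by two intersecting spheres together with inclusion-exclusion. It is not the AlphaMol algorithm, which handles arbitrary numbers of overlapping balls through the alpha shape; the function names are hypothetical.

```python
import math


def ball_volume(r: float) -> float:
    """Volume of a single ball of radius r."""
    return 4.0 / 3.0 * math.pi * r ** 3


def intersection_volume(r1: float, r2: float, d: float) -> float:
    """Volume shared by two balls whose centers are a distance d apart."""
    if d >= r1 + r2:
        # The balls are disjoint (or just touching).
        return 0.0
    if d <= abs(r1 - r2):
        # The smaller ball lies entirely inside the larger one.
        return ball_volume(min(r1, r2))
    # Classical sphere-sphere lens formula; it depends only on r1, r2 and d.
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def union_volume(r1: float, r2: float, d: float) -> float:
    """Volume of the union of two balls, by inclusion-exclusion."""
    return ball_volume(r1) + ball_volume(r2) - intersection_volume(r1, r2, d)
```

Because the only geometric input is the distance d, translating or rotating the pair of atoms leaves the result unchanged, which is the invariance property mentioned in the abstract.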
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science, University of California, Davis, California 95616, United States

3
Schuh A, Koehl P, Sesselmann S, Goyal T, Benditz A. INCIDENTAL INTRAOSSEOUS CALCANEAL LIPOMA IN A PATIENT SUFFERING FROM PLANTARFASZIITIS. Georgian Med News 2022:37-39. [PMID: 36427838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Intraosseous calcaneal lipoma is a rare benign bone tumor. Lipomas involving the calcaneus have been reported to account for fewer than 8-15% of all intraosseous lipomas. The etiology of the lesion is unknown. A post-traumatic secondary bone reaction, healing bone infarct, and benign neoplasm have been discussed. The symptoms can be nonspecific, varying from dull, intermittent pain to activity-related plantar pain. This pain can predictably be misdiagnosed as plantar fasciitis. We present the case of a 49-year-old male patient suffering from plantar fasciitis for three months and an incidental asymptomatic intraosseous calcaneal lipoma, which was diagnosed by x-ray and CT scan. As the patient was free of complaints and the CT findings were typical, we saw no indication for biopsy but recommended regular CT and MRI follow-up.
Affiliation(s)
- A Schuh
  - Hospital of trauma surgery, Department of musculoskeletal research, Marktredwitz Hospital, Germany
- P Koehl
  - Hospital of trauma surgery, Marktredwitz Hospital, Germany
- S Sesselmann
  - Institute for Medical Engineering, OTH Technical University of Applied Sciences Amberg-Weiden, Germany
- T Goyal
  - Department of Orthopaedics, All India Institute of Medical Sciences, Bathinda, Punjab, India
- A Benditz
  - Hospital of trauma surgery, Department of orthopedics, Marktredwitz Hospital, Germany

4
Schuh A, Koehl P, Sesselmann S, Goyal T, Benditz A. INTRAMUSCULAR MYXOMA OF THE BUTTOCK- A CASE REPORT. Georgian Med News 2022:40-42. [PMID: 36427839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Intramuscular myxoma (IM) is a benign soft tissue neoplasm of mesenchymal origin. IM is rare, with an incidence of between 0.1 and 0.13 in every 100,000 individuals. Onset is usually between the fourth and seventh decades of life, predominantly in women (70%). The thigh is the most common site of involvement, seen in 51% of patients, followed by the upper arm (9%) and the calf (7%); the buttocks are rarely involved. We present the case of a 63-year-old female patient with a 6-month history of a growing IM of the right buttock. Because of the rapid tumor growth, resection was indicated to obtain a histopathological examination and to rule out malignancy. Marginal surgical removal was performed. Histopathological examination yielded the diagnosis of a large intramuscular myxoma. There was no recurrence at the latest follow-up.
Affiliation(s)
- A Schuh
  - Hospital of trauma surgery, Department of musculoskeletal research, Marktredwitz Hospital, Germany
- P Koehl
  - Hospital of trauma surgery, Marktredwitz Hospital, Germany
- S Sesselmann
  - Institute for Medical Engineering, OTH Technical University of Applied Sciences Amberg-Weiden, Germany
- T Goyal
  - Department of Orthopaedics, All India Institute of Medical Sciences, Bathinda, Punjab, India
- A Benditz
  - Hospital of trauma surgery, Department of orthopedics, Marktredwitz Hospital, Germany

5
Koehl P, Orland H. Sampling constrained stochastic trajectories using Brownian bridges. J Chem Phys 2022; 157:054105. [DOI: 10.1063/5.0102295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
We present a new method to sample conditioned trajectories of a system evolving under Langevin dynamics, based on Brownian bridges. The trajectories are conditioned to end at a certain point (or in a certain region) in space. The bridge equations can be recast exactly in the form of a nonlinear stochastic integro-differential equation. This equation can be very well approximated when the trajectories are closely bundled together in space, i.e., at low temperature, or for transition paths. The approximate equation can be solved iteratively, using a fixed-point method. We discuss how to choose the initial trajectories and show some examples of the performance of this method on some simple problems. The method allows conditioned trajectories to be generated with high accuracy.
Affiliation(s)
- Patrice Koehl
  - Computer Science and Genome Center, University of California Davis, United States of America
- Henri Orland
  - Institut de Physique Theorique, CEA, Saclay, France

6
Koehl P, Orland H, Delarue M. Parameterizing elastic network models to capture the dynamics of proteins. J Comput Chem 2021; 42:1643-1661. [PMID: 34117647 DOI: 10.1002/jcc.26701] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/15/2020] [Revised: 12/14/2020] [Accepted: 05/23/2021] [Indexed: 11/09/2022]
Abstract
Coarse-grained normal mode analyses of protein dynamics rely on the idea that the geometry of a protein structure contains enough information for computing its fluctuations around its equilibrium conformation. This geometry is captured in the form of an elastic network (EN), namely a network of edges between its residues. The normal modes of a protein are then identified with the normal modes of its EN. Different approaches have been proposed to construct ENs, focusing on the choice of the edges that they are comprised of, and on their parameterizations by the force constants associated with those edges. Here we propose new tools to guide choices on these two facets of EN. We study first different geometric models for ENs. We compare cutoff-based ENs, whose edges have lengths that are smaller than a cutoff distance, with Delaunay-based ENs and find that the latter provide better representations of the geometry of protein structures. We then derive an analytical method for the parameterization of the EN such that its dynamics leads to atomic fluctuations that agree with experimental B-factors. To limit overfitting, we attach a parameter referred to as flexibility constant to each atom instead of to each edge in the EN. The parameterization is expressed as a non-linear optimization problem whose parameters describe both rigid-body and internal motions. We show that this parameterization leads to improved ENs, whose dynamics mimic MD simulations better than ENs with uniform force constants, and reduces the number of normal modes needed to reproduce functional conformational changes.
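As a rough illustration of the cutoff-based construction mentioned above, the following sketch builds an elastic network by connecting every pair of residues closer than a cutoff distance, then assembles the corresponding connectivity (Kirchhoff) matrix with uniform unit force constants. The function names, the toy coordinates, and the choice of uniform constants are assumptions made for illustration; the Delaunay-based networks and the per-atom flexibility constants of the paper are not implemented here.

```python
import math


def cutoff_network(coords, cutoff):
    """Return the edges (i, j), i < j, between points closer than cutoff."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return edges


def kirchhoff_matrix(n, edges):
    """Connectivity matrix: -1 for each edge, node degree on the diagonal."""
    gamma = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        gamma[i][j] -= 1.0
        gamma[j][i] -= 1.0
        gamma[i][i] += 1.0
        gamma[j][j] += 1.0
    return gamma


# Toy example: three "residues" on a line, with a 5-unit cutoff.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edges = cutoff_network(coords, cutoff=5.0)
gamma = kirchhoff_matrix(len(coords), edges)
```

In this toy case only the first two points are linked, so the third is disconnected from the network; a cutoff that is too small leaves parts of a structure unconstrained, which is one reason the choice of edges matters.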
Affiliation(s)
- Patrice Koehl
  - Department of Computer Sciences and Genome Center, University of California, Davis, California, USA
- Henri Orland
  - Institut de Physique Théorique, Université Paris-Saclay, Gif sur Yvette, France
- Marc Delarue
  - Unité de Dynamique Structurale des Macromolécules, Institut Pasteur, UMR 3528 du CNRS, Paris, France

7
Koehl P, Delarue M, Orland H. Simultaneous Identification of Multiple Binding Sites in Proteins: A Statistical Mechanics Approach. J Phys Chem B 2021; 125:5052-5067. [PMID: 33973782 DOI: 10.1021/acs.jpcb.1c02658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
We present an extension of the Poisson-Boltzmann model in which the solute of interest is immersed in an assembly of self-orienting Langevin water dipoles, anions, cations, and hydrophobic molecules, all of variable densities. Interactions between charges are controlled by electrostatics, while hydrophobic interactions are modeled with a Yukawa potential. We impose steric constraints by assuming that the system is represented on a cubic lattice. We also assume incompressibility; i.e., all sites of the lattice are occupied. This model, which we refer to as the Hydrophobic Dipolar Poisson-Boltzmann Langevin (HDPBL) model, leads to a system of two equations whose solutions give the water dipole, salt, and hydrophobic molecule densities, all of them in the presence of the others in a self-consistent way. We use those to study the organization of the ions, cosolvent, and solvent molecules around proteins. In particular, peaks of densities are expected to reveal, simultaneously, the presence of compatible binding sites of different kinds on a protein. We have tested and validated the ability of HDPBL to detect pockets in proteins that bind to hydrophobic ligands, polar ligands, and charged small probes as well as to characterize the binding sites of lipids for membrane proteins.
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, California 95616, United States
- Marc Delarue
  - Architecture et Dynamique des Macromolécules Biologiques, Département de Biologie Structurale et Chimie, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
- Henri Orland
  - Institut de Physique Théorique, Université Paris-Saclay, CEA, 91191 Gif/Yvette Cedex, France

8
Koehl P, Orland H. Fast computation of exact solutions of generic and degenerate assignment problems. Phys Rev E 2021; 103:042101. [PMID: 34005932 DOI: 10.1103/physreve.103.042101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2020] [Accepted: 03/01/2021] [Indexed: 11/07/2022]
Abstract
The linear assignment problem is a fundamental problem in combinatorial optimization with a wide range of applications, from operational research to data science. It consists of assigning "agents" to "tasks" on a one-to-one basis, while minimizing the total cost associated with the assignment. While many exact algorithms have been developed to identify such an optimal assignment, most of these methods are computationally prohibitive for large size problems. In this paper, we propose an alternative approach to solving the assignment problem using techniques adapted from statistical physics. Our first contribution is to fully describe this formalism, including all the proofs of its main claims. In particular we derive a strongly concave effective free-energy function that captures the constraints of the assignment problem at a finite temperature. We prove that this free energy decreases monotonically as a function of β, the inverse of temperature, to the optimal assignment cost, providing a robust framework for temperature annealing. We prove also that for large enough β values the exact solution to the generic assignment problem can be derived using simple roundoff to the nearest integer of the elements of the computed assignment matrix. Our second contribution is to derive a provably convergent method to handle degenerate assignment problems, with a characterization of those problems. We describe computer implementations of our framework that are optimized for parallel architectures, one based on CPU, the other based on GPU. We show that the latter enables solving large assignment problems (of the orders of a few 10 000s) in computing clock times of the orders of minutes.
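To make the problem statement concrete, here is a minimal sketch that solves a tiny assignment problem by brute force, enumerating every one-to-one mapping of agents to tasks. This is deliberately the naive exhaustive approach, not the statistical-physics method of the paper: its cost grows factorially with the problem size, which is exactly why faster methods are needed. The function name and the cost matrix are hypothetical.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Return (assignment, total_cost) minimizing sum of cost[i][assignment[i]].

    cost[i][j] is the cost of giving task j to agent i; the matrix is square.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


# Toy 3 x 3 example: agent 0 gets task 1, agent 1 gets task 0, agent 2 gets task 2.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = brute_force_assignment(cost)
```

With three agents there are only six candidate assignments to check; with a few tens of thousands of agents, as in the problems cited in the abstract, enumeration is out of the question.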
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
- Henri Orland
  - Institut de Physique Théorique, Université Paris-Saclay, CNRS, CEA, 91191 Gif/Yvette Cedex, France

9
Abstract
Optimal transport (OT) has become a discipline by itself that offers solutions to a wide range of theoretical problems in probability and mathematics with applications in several applied fields such as imaging sciences, machine learning, and in data sciences in general. The traditional OT problem suffers from a severe limitation: its balance condition imposes that the two distributions to be compared be normalized and have the same total mass. However, it is important for many applications to be able to relax this constraint and allow for mass creation and/or destruction. This is true, for example, in all problems requiring partial matching. In this paper, we propose an approach to solving a generalized version of the OT problem, which we refer to as the discrete variable-mass optimal-transport (VMOT) problem, using techniques adapted from statistical physics. Our first contribution is to fully describe this formalism, including all the proofs of its main claims. In particular, we derive a strongly concave effective free-energy function that captures the constraints of the VMOT problem at a finite temperature. From its maximum we derive a weak distance (i.e., a divergence) between possibly unbalanced distribution functions. The temperature-dependent OT distance decreases monotonically to the standard variable-mass OT distance, providing a robust framework for temperature annealing. Our second contribution is to show that the implementation of this formalism has the same properties as the regularized OT algorithms in time complexity, making it a competitive approach to solving the VMOT problem. We illustrate applications of the framework to the problem of partial two- and three-dimensional shape-matching problems.
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
- Marc Delarue
  - Unité de Dynamique Structurale des Macromolécules, Department of Structural Biology and Chemistry, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
- Henri Orland
  - Institut de Physique Théorique, Université Paris-Saclay, CEA, 91191 Gif/Yvette Cedex, France

10
Koehl P, Delarue M, Orland H. Statistical Physics Approach to the Optimal Transport Problem. Phys Rev Lett 2019; 123:040603. [PMID: 31491256 DOI: 10.1103/physrevlett.123.040603] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/29/2019] [Revised: 06/03/2019] [Indexed: 06/10/2023]
Abstract
Originally defined for the optimal allocation of resources, optimal transport (OT) has found many theoretical and practical applications in multiple domains of science and physics. In this Letter we develop a new method for solving the discrete version of this problem using techniques derived from statistical physics. We derive a strongly concave free energy function that captures the constraints of the OT problem at a finite temperature. Its maximum defines an optimal transport plan, or registration between the two discrete probability measures that are compared, as well as a pseudodistance between those measures that satisfies the triangular inequalities. The computation of this pseudodistance is fast and numerically stable. The temperature dependent OT pseudodistance is shown to decrease monotonically with respect to the inverse of the temperature and to converge to the standard OT distance at zero temperature, providing a robust framework for temperature annealing. We illustrate applications of this framework to the problem of image comparison.
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
- Marc Delarue
  - Unité de Dynamique Structurale des Macromolécules, Department of Structural Biology and Chemistry, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
- Henri Orland
  - Institut de Physique Théorique, CEA-Saclay, 91191 Gif/Yvette Cedex, France

11
Abstract
Optimal transport (OT) has become a discipline by itself that offers solutions to a wide range of theoretical problems in probability and mathematics. Despite its appealing theoretical properties, solving the OT problem involves the resolution of a linear program whose computational cost can quickly become prohibitive whenever the size of the problem exceeds a few hundred points. The recent introduction of entropy regularization, however, has led to the development of fast algorithms for solving an approximate OT problem. The successes of those algorithms have resulted in a popularization of the applications of OT in several applied fields such as imaging sciences and machine learning, and in data sciences in general. Problems remain, however, as to the numerical convergence of those regularized approximations towards the actual OT solution. In addition, the physical meaning of this regularization is unclear. In this paper, we propose an approach to solving the discrete OT problem using techniques adapted from statistical physics. Our first contribution is to fully describe this formalism, including all the proofs of its main claims. In particular we derive a strongly concave effective free energy function that captures the constraints of the optimal transport problem at a finite temperature. Its maximum defines a pseudo distance between the two set of weighted points that are compared, which satisfies the triangular inequalities. The temperature dependent OT pseudo distance decreases monotonically to the standard OT distance, providing a robust framework for temperature annealing. Our second contribution is to show that the implementation of this formalism has the same properties as the regularized OT algorithms in time complexity, making it a competitive approach to solving the OT problem. We illustrate applications of the framework to the problem of protein fold recognition based on sequence information only.
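For readers unfamiliar with the entropy-regularized algorithms that this abstract compares against, here is a minimal sketch of the standard Sinkhorn iteration for discrete OT. It is the well-known regularized baseline, not the free-energy method proposed by the authors; the function name, the value of the regularization strength `eps`, and the toy inputs are assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iter=500):
    """Approximate transport plan between discrete masses a and b.

    a, b  : lists of non-negative masses with equal totals.
    cost  : cost[i][j] of moving mass from source i to target j.
    eps   : entropic regularization strength (smaller = closer to exact OT).
    """
    n, m = len(a), len(b)
    # Gibbs kernel associated with the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: two sources, two targets, unit cost for crossing over.
a = [0.3, 0.7]
b = [0.6, 0.4]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
```

The returned plan has row sums equal to `a` and column sums equal to `b` once the iteration has converged. As `eps` is lowered the plan approaches the unregularized solution, but the kernel entries shrink toward zero and convergence slows, which is the numerical-convergence concern raised in the abstract.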
Affiliation(s)
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
- Marc Delarue
  - Unité de Dynamique Structurale des Macromolécules, Department of Structural Biology and Chemistry, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
- Henri Orland
  - Institut de Physique Théorique, CEA-Saclay, 91191 Gif/Yvette Cedex, France

12
Abstract
Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we propose an algorithm, DCG++ that generates such a similarity measure that is data-driven and ultrametric. DCG++ uses Markov Chain Random Walks to capture the intrinsic geometry of data, scans possible scales, and combines all this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure within the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, as well as on an image segmentation problem. The experimental results show a significant improvement on performance with the DCG-based ultrametric compared to using an empirical distance measure.
Affiliation(s)
- Jiahui Guan
  - Department of Statistics, University of California Davis, Davis, CA, United States of America
- Fushing Hsieh
  - Department of Statistics, University of California Davis, Davis, CA, United States of America
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California Davis, Davis, CA, United States of America

13
Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018; 7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 07/19/2018] [Indexed: 11/20/2022] Open
Abstract
Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research to fill in this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas informed by an ever-increasing assembly of scientific domains, from biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and more recently data sciences. We review the recent progress arising from this integration of knowledge, from the development of specific computer architecture to allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on detection of coevolution using very large data sets of aligned protein sequences.
Affiliation(s)
- Marc Delarue
  - Unité Dynamique Structurale des Macromolécules, Institut Pasteur, and UMR 3528 du CNRS, Paris, France
- Patrice Koehl
  - Department of Computer Science, Genome Center, University of California, Davis, Davis, California, USA

14
Affiliation(s)
- Patrice Koehl
  - Department of Computer Sciences and Genome Center, University of California, Davis, California 95616, United States

15
Abstract
Quantitative reasoning and techniques are increasingly ubiquitous across the life sciences. However, new graduate researchers with a biology background are often not equipped with the skills that are required to utilize such techniques correctly and efficiently. In parallel, there are increasing numbers of engineers, mathematicians, and physical scientists interested in studying problems in biology with only basic knowledge of this field. Students from such varied backgrounds can struggle to engage proactively together to tackle problems in biology. There is therefore a need to establish bridges between those disciplines. It is our proposal that the beginning of graduate school is the appropriate time to initiate those bridges through an interdisciplinary short course. We have instigated an intensive 10-day course that brought together new graduate students in the life sciences from across departments within the National University of Singapore. The course aimed at introducing biological problems as well as some of the quantitative approaches commonly used when tackling those problems. We have run the course for three years with over 100 students attending. Building on this experience, we share 11 quick tips on how to run such an effective, interdisciplinary short course for new graduate students in the biosciences.
Affiliation(s)
- Timothy E. Saunders
  - Department of Biological Sciences, National University of Singapore, Singapore
  - Mechanobiology Institute, National University of Singapore, Singapore
- Cynthia Y. He
  - Department of Biological Sciences, National University of Singapore, Singapore
- Patrice Koehl
  - Department of Computer Science and Genome Center, University of California Davis, Davis, California, United States of America
- L. L. Sharon Ong
  - Singapore MIT Alliance for Research and Technology Centre, Singapore
- Peter T. C. So
  - Singapore MIT Alliance for Research and Technology Centre, Singapore
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, United States of America

16
Abstract
Background: Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work, however, needs to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity. We have recently proposed an alignment-free method based on the concept of string kernels, SeqKernel (Nojoomi and Koehl, BMC Bioinformatics, 2017, 18:137). In this previous study, we have shown that while SeqKernel performs better than standard alignment-based methods, its applications are potentially limited because of biases due mostly to sequence length effects.
Methods: In this study, we propose improvements to SeqKernel that follow two directions. First, we developed a weighted version of the kernel, WSeqKernel. Second, we expand the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences.
Results: Using a dataset that only contains remote homologs, we have shown that WSeqKernel performs remarkably well in fold recognition experiments. We have shown that with the appropriate weighting scheme, we can remove the length effects on the kernel values. WSeqKernel, just like any alignment-based sequence comparison method, depends on a substitution matrix. We have shown that this matrix can be optimized so that sequence similarity scores correlate well with structure similarity scores. Starting from no information on amino acid similarity, we have shown that we can derive a scoring matrix that echoes the physico-chemical properties of amino acids.
Conclusion: We have made progress in characterizing and parametrizing string kernels as alignment-based methods for comparing protein sequences, and we have shown that they provide a framework for extracting sequence information from structure.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1795-5) contains supplementary material, which is available to authorized users.

17
Abstract
We propose a novel stochastic method to generate Brownian paths conditioned to start at an initial point and end at a given final point during a fixed time tf under a given potential U(x). These paths are sampled with a probability given by the overdamped Langevin dynamics. We show that these paths can be exactly generated by a local stochastic partial differential equation. This equation cannot be solved in general but we present several approximations that are valid either in the low temperature regime or in the presence of barrier crossing. We show that this method warrants the generation of statistically independent transition paths. It is computationally very efficient. We illustrate the method first on two simple potentials, the two-dimensional Mueller potential and the Mexican hat potential, and then on the multi-dimensional problem of conformational transitions in proteins using the "Mixed Elastic Network Model" as a benchmark.
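As a point of reference for the conditioned paths described above, here is a minimal sketch of the simplest special case: a one-dimensional Brownian bridge with no potential (U(x) = 0). It uses the classical construction in which a free random walk is corrected by a linear term so that it lands exactly on the target. It is not the authors' method, which handles a general potential U(x); the function name, the parameter names, and the diffusion constant are assumptions for illustration.

```python
import random


def free_brownian_bridge(x0, xf, tf, n_steps, diffusion=1.0, rng=None):
    """Sample a 1D path from x0 at time 0 to xf at time tf, with no potential."""
    rng = rng or random.Random()
    dt = tf / n_steps
    sigma = (2.0 * diffusion * dt) ** 0.5  # standard deviation of one free step
    # Unconstrained random walk starting at 0.
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, sigma))
    # Subtract a linear ramp so the walk ends exactly at xf - x0, then shift by x0.
    return [x0 + w[k] - (k / n_steps) * (w[-1] - (xf - x0))
            for k in range(n_steps + 1)]


# Example: a path from 0 to 2 over unit time, discretized in 100 steps.
path = free_brownian_bridge(0.0, 2.0, 1.0, 100, rng=random.Random(0))
```

Every sampled path starts at `x0` and ends at `xf` by construction, and different seeds give statistically independent paths. With a non-zero potential this simple correction is no longer exact, which is the situation the approximations in the abstract are designed for.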
Affiliation(s)
- Marc Delarue
- Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
| | - Henri Orland
- Institut de Physique Théorique, CEA, URA 2306 du CNRS, F-91191 Gif-sur-Yvette, France and Beijing Computational Science Research Center, Building 9, East Zone, ZPark II, No.10 East Xibeiwang Road, Haidian District, Beijing 100193, China
| |
|
18
|
|
19
|
Koehl P, Ling C, Lefèvre JF. Improving the performance of linear prediction on magnetic resonance signals by oversampling. ACTA ACUST UNITED AC 2017. [DOI: 10.1051/jcp/1994910595] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
|
20
|
Abstract
In this paper, we propose a new method for computing a distance between two shapes embedded in three-dimensional space. Instead of comparing directly the geometric properties of the two shapes, we measure the cost of deforming one of the two shapes into the other. The deformation is computed as the geodesic between the two shapes in the space of shapes. The geodesic is found as a minimizer of the Onsager-Machlup action, based on an elastic energy for shapes that we define. Its length is set to be the integral of the action along that path; it defines an intrinsic quasi-metric on the space of shapes. We illustrate applications of our method to geometric morphometrics using three datasets representing bones and teeth of primates. Experiments on these datasets show that the variational quasi-metric we have introduced performs remarkably well both in shape recognition and in identifying evolutionary patterns, with success rates similar to, and in some cases better than, those obtained by expert observers.
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616, USA
| |
|
21
|
Abstract
BACKGROUND The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins. Traditional approaches for comparing two protein sequences begin with strings of letters (amino acids) that represent the sequences, before generating textual alignments between these strings and providing scores for each alignment. When the similarity between the two protein sequences to be compared is low, however, the quality of the corresponding sequence alignment is usually poor, leading to poor performance for the recognition of similarity. RESULTS In this study, we develop an alignment-free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al, Found. Comput. Math., 2013, 14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters, a coefficient that tunes the substitution matrix and the maximum length of k-mers that it includes. We provide an exhaustive analysis of the impacts of these two parameters on the performance of SeqKernel for fold recognition. We show that with the right choice of parameters, use of the SeqKernel similarity measure improves fold recognition compared to the use of traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments. CONCLUSION We have presented and characterized a new alignment-free method based on a mathematical kernel for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.
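To make the two parameters concrete, here is a hedged sketch of a generic k-mer string kernel: it sums, over all k-mer lengths up to `kmax` and over all pairs of k-mers taken from the two sequences, the product of per-residue similarities raised to a power `beta`. The residue similarity `k1` is a toy stand-in (1 for identical letters, 0.2 otherwise), not a real substitution matrix, and all names here are hypothetical; the actual SeqKernel definition may differ in its details.

```python
import math


def k1(a, b, beta=1.0):
    """Toy residue similarity (assumption): 1 if identical, 0.2 otherwise,
    raised to the power beta. A real use would start from a substitution matrix."""
    base = 1.0 if a == b else 0.2
    return base ** beta


def string_kernel(f, g, kmax=3, beta=1.0):
    """Sum over k = 1..kmax and over all pairs of k-mers (one from f, one from g)
    of the product of residue similarities along the two k-mers."""
    n, m = len(f), len(g)
    # prev[i][j]: similarity of the (k-1)-mers ending at f[i-1] and g[j-1]
    prev = [[1.0] * (m + 1) for _ in range(n + 1)]
    total = 0.0
    for k in range(1, kmax + 1):
        cur = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(k, n + 1):
            for j in range(k, m + 1):
                cur[i][j] = prev[i - 1][j - 1] * k1(f[i - 1], g[j - 1], beta)
                total += cur[i][j]
        prev = cur
    return total


def normalized_kernel(f, g, kmax=3, beta=1.0):
    """Kernel value scaled so that a sequence compared with itself scores 1."""
    return string_kernel(f, g, kmax, beta) / math.sqrt(
        string_kernel(f, f, kmax, beta) * string_kernel(g, g, kmax, beta)
    )
```

No alignment is computed at any point: every k-mer of one sequence is compared with every k-mer of the other, and the dynamic-programming table only reuses the (k-1)-mer products.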
Affiliation(s)
- Saghi Nojoomi
- Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA
| | - Patrice Koehl
- Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA
| |
|
22
|
Abstract
Dynamics is essential to the biological functions of many bio-molecules, yet our knowledge of dynamics remains fragmented. Experimental techniques for studying bio-molecules either provide high resolution information on static conformations of the molecule or provide low-resolution, ensemble information that does not shed light on single molecule dynamics. In parallel, bio-molecular dynamics occur at time scales that are not yet attainable through detailed simulation methods. These limitations are especially noticeable when studying transition paths. To address this issue, we report in this paper two methods that derive meaningful trajectories for proteins between two of their conformations. The first method, MinActionPath, uses approximations of the potential energy surface for the molecule to derive an analytical solution of the equations of motion related to the concept of minimum action path. The second method, RelaxPath, follows the same principle of minimum action path but implements a more sophisticated potential, including a mixed elastic potential and a collision term to alleviate steric clashes. Using this new potential, the equations of motion cannot be solved analytically. We have introduced a relaxation method for solving those equations. We describe both the theories behind the two methods and their implementations, focusing on the specific techniques we have used that make those implementations amenable to the study of large molecular systems. We have illustrated the performance of RelaxPath on simple 2D systems. We have also compared MinActionPath and RelaxPath to other methods for generating transition paths on a well-suited test set of large proteins, for which the end points of the trajectories as well as an intermediate conformation between those end points are known. We have shown that RelaxPath outperforms those other methods, including MinActionPath, in its ability to generate trajectories that get close to the known intermediates. We have also shown that the structures along the RelaxPath trajectories remain protein-like. Open source versions of the two programs MinActionPath and RelaxPath are available by request.
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, California 95616, USA
| |
|
23
|
Koehl P, Poitevin F, Navaza R, Delarue M. The Renormalization Group and Its Applications to Generating Coarse-Grained Models of Large Biological Molecular Systems. J Chem Theory Comput 2017; 13:1424-1438. [PMID: 28170254 DOI: 10.1021/acs.jctc.6b01136] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Understanding the dynamics of biomolecules is the key to understanding their biological activities. Computational methods ranging from all-atom molecular dynamics simulations to coarse-grained normal-mode analyses based on simplified elastic networks provide a general framework for studying these dynamics. Despite recent successes in studying very large systems with up to 100,000,000 atoms, those methods are currently limited to studying small- to medium-sized molecular systems due to computational limitations. One solution to circumvent these limitations is to reduce the size of the system under study. In this paper, we argue that coarse-graining, the standard approach to such size reduction, must define a hierarchy of models of decreasing sizes that are consistent with each other, i.e., that each model contains the information of the dynamics of its predecessor. We propose a new method, Decimate, for generating such a hierarchy within the context of elastic networks for normal-mode analysis. This method is based on the concept of the renormalization group developed in statistical physics. We highlight the details of its implementation, with a special focus on its scalability to large systems of up to millions of atoms. We illustrate its application on two large systems, the capsid of a virus and the ribosome translation complex. We show that highly decimated representations of those systems, containing down to 1% of their original number of atoms, still capture qualitatively and quantitatively their dynamics. Decimate is available as an OpenSource resource.
Affiliation(s)
- Patrice Koehl
- Department of Computer Sciences and Genome Center, University of California, Davis, Davis, California 95616, United States
| | - Frédéric Poitevin
- Department of Structural Biology, Stanford University, Stanford, California 94305, United States; Stanford PULSE Institute, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California 94025, United States
| | - Rafael Navaza
- Platform of Crystallogenesis and Crystallography, CiTech, Institut Pasteur, 75015 Paris, France
| | - Marc Delarue
- Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
| |
|
24
|
Fourati Z, Ruza RR, Laverty D, Drege E, Delarue-Cochin S, Joseph D, Koehl P, Smart T, Delarue M. Barbiturates Bind in the GLIC Ion Channel Pore and Cause Inhibition by Stabilizing a Shut State. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.2984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
25
|
Abstract
Computational protein sequence design is the rational design based on computer simulation of new protein molecules to fold to target three-dimensional structures, with the ultimate goal of designing novel functions. It requires a good understanding of the thermodynamic equilibrium properties of the protein of interest. Here, we consider the contribution of the solvent to the stability of the protein. We describe implicit solvent models, focusing on approximations of their nonpolar components using geometric potentials. We consider the surface area (SA) model in which the nonpolar solvation free energy is expressed as a sum of the contributions of all atoms, assumed to be proportional to their accessible surface areas (ASAs). We briefly review existing numerical and analytical approaches that compute the ASA. We describe in more detail the alpha shape theory as it provides a unifying mathematical framework that enables the analytical calculations of the surface area of a macromolecule represented as a union of balls.
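The surface area model can be sketched numerically as follows. This example uses a point-sampling estimate of the ASA in the spirit of the numerical approaches the abstract mentions (commonly attributed to Shrake and Rupley), not the analytical alpha shape calculation; the function names, the probe radius of 1.4 Å, and the per-atom coefficients `gammas` are illustrative assumptions.

```python
import math


def sphere_points(n):
    """Quasi-uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def accessible_areas(centers, radii, probe=1.4, n_points=500):
    """Estimate the accessible surface area of each ball by counting the
    sample points on its probe-expanded sphere that lie outside all others."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = any(
                j != i and math.dist(p, cj) < rj + probe
                for j, (cj, rj) in enumerate(zip(centers, radii))
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


def nonpolar_energy(areas, gammas):
    """SA model: sum over atoms of gamma_i times the accessible area of atom i."""
    return sum(g * a for g, a in zip(gammas, areas))
```

The estimate converges only as the number of sample points grows and is not differentiable with respect to atomic positions, which is one motivation for analytical treatments of the union of balls.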
Affiliation(s)
- Jie Li
- Computational and Systems Biology Group, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA, 95616, USA.
| |
|
26
|
Hsieh YC, Poitevin F, Delarue M, Koehl P. Comparative Normal Mode Analysis of the Dynamics of DENV and ZIKV Capsids. Front Mol Biosci 2016; 3:85. [PMID: 28083537 PMCID: PMC5187361 DOI: 10.3389/fmolb.2016.00085] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/25/2016] [Accepted: 12/12/2016] [Indexed: 11/13/2022] Open
Abstract
Key steps in the life cycle of a virus, such as the fusion event as the virus infects a host cell and its maturation process, relate to an intricate interplay between the structure and the dynamics of its constituent proteins, especially those that define its capsid, much akin to an envelope that protects its genomic material. We present a comprehensive, comparative analysis of such interplay for the capsids of two viruses from the Flaviviridae family, Dengue (DENV) and Zika (ZIKV). We use for that purpose our own software suite, DD-NMA, which is based on normal mode analysis. We describe the elements of DD-NMA that are relevant to the analysis of large systems, such as virus capsids. In particular, we introduce our implementation of simplified elastic networks and justify their parametrization. Using DD-NMA, we illustrate the importance of packing interactions within the virus capsids on the dynamics of the E proteins of DENV and ZIKV. We identify differences between the computed atomic fluctuations of the E proteins in DENV and ZIKV and relate those differences to changes observed in their high resolution structures. We conclude with a discussion on additional analyses that are needed to fully characterize the dynamics of the two viruses.
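For readers unfamiliar with elastic networks, the sketch below builds the Hessian of a generic cutoff-based network in which every pair of atoms closer than `cutoff` is joined by a harmonic spring of constant `k`. This is the common textbook form and is only an assumption here; the parametrization used in DD-NMA may differ, and the function name and default values are hypothetical.

```python
def enm_hessian(coords, cutoff=8.0, k=1.0):
    """Return the 3N x 3N Hessian (list of lists) of a cutoff-based elastic
    network evaluated at the reference structure `coords`.

    Assumes all atoms have distinct positions.
    """
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            d2 = sum(c * c for c in d)
            if d2 > cutoff * cutoff:
                continue  # no spring between distant atoms
            for a in range(3):
                for b in range(3):
                    h = k * d[a] * d[b] / d2
                    # off-diagonal blocks couple atoms i and j
                    hess[3 * i + a][3 * j + b] -= h
                    hess[3 * j + a][3 * i + b] -= h
                    # diagonal blocks accumulate the opposite sign
                    hess[3 * i + a][3 * i + b] += h
                    hess[3 * j + a][3 * j + b] += h
    return hess
```

The normal modes are the eigenvectors of this matrix, obtained with any symmetric eigensolver; rigid-body translations and rotations cost no energy and therefore appear as zero-eigenvalue modes.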
Affiliation(s)
- Yin-Chen Hsieh
- Department of Computer Science and Genome Center, University of California, Davis, Davis, CA, USA
| | - Frédéric Poitevin
- Department of Structural Biology, Stanford University, Stanford, CA, USA; SLAC National Accelerator Laboratory, Stanford PULSE Institute, Menlo Park, CA, USA
| | - Marc Delarue
- Unit of Structural Dynamics of Macromolecules, UMR 3528 du Centre National de la Recherche Scientifique, Institut Pasteur, Paris, France
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, Davis, CA, USA
| |
|
27
|
Fourati Z, Ruza RR, Laverty D, Drège E, Delarue-Cochin S, Joseph D, Koehl P, Smart T, Delarue M. Barbiturates Bind in the GLIC Ion Channel Pore and Cause Inhibition by Stabilizing a Closed State. J Biol Chem 2016; 292:1550-1558. [PMID: 27986812 DOI: 10.1074/jbc.m116.766964] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/08/2016] [Revised: 12/06/2016] [Indexed: 12/12/2022] Open
Abstract
Barbiturates induce anesthesia by modulating the activity of anionic and cationic pentameric ligand-gated ion channels (pLGICs). Despite more than a century of use in clinical practice, the prototypic binding site for this class of drugs within pLGICs is yet to be described. In this study, we present the first X-ray structures of barbiturates bound to GLIC, a cationic prokaryotic pLGIC with excellent structural homology to other relevant channels sensitive to general anesthetics and, as shown here, to barbiturates, at clinically relevant concentrations. Several derivatives of barbiturates containing anomalous scatterers were synthesized, and these derivatives helped us unambiguously identify a unique barbiturate binding site within the central ion channel pore in a closed conformation. In addition, docking calculations around the observed binding site for all three states of the receptor, including a model of the desensitized state, showed that barbiturates preferentially stabilize the closed state. The identification of this pore binding site sheds light on the mechanism of barbiturate inhibition of cationic pLGICs and allows the rationalization of several structural and functional features previously observed for barbiturates.
Affiliation(s)
- Zaineb Fourati
- From the Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
| | - Reinis Reinholds Ruza
- From the Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
| | - Duncan Laverty
- the Department of Neuroscience, Physiology and Pharmacology, University College London, London WC1E 6BT, United Kingdom
| | - Emmanuelle Drège
- the UMR 8076 du CNRS, BioCIS, Faculté de Pharmacie, Université Paris Sud, 92296 Chatenay-Malabry, France
| | - Sandrine Delarue-Cochin
- the UMR 8076 du CNRS, BioCIS, Faculté de Pharmacie, Université Paris Sud, 92296 Chatenay-Malabry, France
| | - Delphine Joseph
- the UMR 8076 du CNRS, BioCIS, Faculté de Pharmacie, Université Paris Sud, 92296 Chatenay-Malabry, France
| | - Patrice Koehl
- the Department of Computer Science, University of California, Davis, California 95616
| | - Trevor Smart
- the Department of Neuroscience, Physiology and Pharmacology, University College London, London WC1E 6BT, United Kingdom.
| | - Marc Delarue
- From the Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France.
| |
|
28
|
Fushing H, Hsueh CH, Heitkamp C, Matthews MA, Koehl P. Unravelling the geometry of data matrices: effects of water stress regimes on winemaking. J R Soc Interface 2016; 12:20150753. [PMID: 26468072 DOI: 10.1098/rsif.2015.0753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
A new method is proposed for unravelling the patterns between a set of experiments and the features that characterize those experiments. The aims are to extract these patterns in the form of a coupling between the rows and columns of the corresponding data matrix and to use this geometry as a support for model testing. These aims are reached through two key steps, namely application of an iterative geometric approach to couple the metric spaces associated with the rows and columns, and use of statistical physics to generate matrices that mimic the original data while maintaining their inherent structure, thereby providing the basis for hypothesis testing and statistical inference. The power of this new method is illustrated on the study of the impact of water stress conditions on the attributes of 'Cabernet Sauvignon' Grapes, Juice, Wine and Bottled Wine from two vintages. The first step, named data mechanics, de-convolutes the intrinsic effects of grape berries and wine attributes due to the experimental irrigation conditions from the extrinsic effects of the environment. The second step provides an analysis of the associations of some attributes of the bottled wine with characteristics of either the matured grape berries or the resulting juice, thereby identifying statistically significant associations between the juice pH, yeast assimilable nitrogen, and sugar content and the bottled wine alcohol level.
Affiliation(s)
- Hsieh Fushing
- Department of Statistics, University of California, Davis, CA 95616, USA
| | - Chih-Hsin Hsueh
- Department of Statistics, University of California, Davis, CA 95616, USA
| | - Constantin Heitkamp
- Department of Viticulture and Enology, University of California, Davis, CA 95616, USA
| | - Mark A Matthews
- Department of Viticulture and Enology, University of California, Davis, CA 95616, USA
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616, USA
| |
|
29
|
Abstract
Assembling fragments from known protein structures is a widely used approach to construct structural models for new proteins. We describe an application of this idea to an important inverse kinematics problem in structural biology: the loop closure problem. We have developed an algorithm for generating the conformations of candidate loops that fit in a gap of given length in a protein structure framework. Our method proceeds by concatenating small fragments of protein chosen from small libraries of representative fragments. Our approach has the advantages of ab initio methods, since we are able to enumerate all candidate loops in the discrete approximation of the conformational space accessible to the loop, as well as the advantages of database search approaches, since the use of fragments of known protein structures guarantees that the backbone conformations are physically reasonable. We test our approach on a set of 427 loops, varying in length from four residues to 14 residues. The quality of the candidate loops is evaluated in terms of global coordinate root mean square (cRMS). The top predictions vary between 0.3 and 4.2 Å for four-residue loops and between 1.5 and 3.1 Å for 14-residue loops, respectively.
Affiliation(s)
- Rachel Kolodny
- Department of Structural Biology and Computer Science Department, Stanford University, Stanford, CA 94305, USA
| | - Leonidas Guibas
- Computer Science Department, Stanford University, Stanford, CA 94305, USA
| | - Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
| | - Patrice Koehl
- Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
| |
|
30
|
Abstract
In this paper, we propose a new approach for computing a distance between two shapes embedded in three-dimensional space. We take as input a pair of triangulated genus zero surfaces that are topologically equivalent to spheres with no holes or handles, and construct a discrete conformal map f between the surfaces. The conformal map is chosen to minimize a symmetric deformation energy Esd(f) which we introduce. This measures the distance of f from an isometry, i.e. a non-distorting correspondence. We show that the energy of the minimizing map gives a well-behaved metric on the space of genus zero surfaces. In contrast to most methods in this field, our approach does not rely on any assignment of landmarks on the two surfaces. We illustrate applications of our approach to geometric morphometrics using three datasets representing the bones and teeth of primates. Experiments on these datasets show that our approach performs remarkably well both in shape recognition and in identifying evolutionary patterns, with success rates similar to, and in some cases better than, those obtained by expert observers.
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, CA 95616, USA
| | - Joel Hass
- Department of Mathematics, University of California Davis, Davis, CA 95616, USA
| |
|
31
|
Carlsen M, Koehl P, Røgen P. On the importance of the distance measures used to train and test knowledge-based potentials for proteins. PLoS One 2014; 9:e109335. [PMID: 25411785 PMCID: PMC4239004 DOI: 10.1371/journal.pone.0109335] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/18/2014] [Accepted: 08/31/2014] [Indexed: 12/15/2022] Open
Abstract
Knowledge-based potentials are energy functions derived from the analysis of databases of protein structures and sequences. They can be divided into two classes. Potentials from the first class are based on a direct conversion of the distributions of some geometric properties observed in native protein structures into energy values, while potentials from the second class are trained to mimic quantitatively the geometric differences between incorrectly folded models and native structures. In this paper, we focus on the relationship between energy and geometry when training the second class of knowledge-based potentials. We assume that the difference in energy between a decoy structure and the corresponding native structure is linearly related to the distance between the two structures. We trained two distance-based knowledge-based potentials accordingly, one based on all inter-residue distances (PPD), while the other had the set of all distances filtered to reflect consistency in an ensemble of decoys (PPE). We tested four types of metric to characterize the distance between the decoy and the native structure, two based on extrinsic geometry (RMSD and GTD-TS*), and two based on intrinsic geometry (Q* and MT). The corresponding eight potentials were tested on a large collection of decoy sets. We found that it is usually better to train a potential using an intrinsic distance measure. We also found that PPE outperforms PPD, emphasizing the benefits of capturing consistent information in an ensemble. The relevance of these results for the design of knowledge-based potentials is discussed.
Affiliation(s)
- Martin Carlsen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, CA, United States of America
| | - Peter Røgen
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
| |
|
32
|
Abstract
The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in such a way that the representation of a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino acid substitution matrices provides an adequate representation that can be used for direct comparison of protein sequences based on geometry. We measure the performance of such a representation in the context of the protein structural fold prediction problem. We compare the results of classifying different sets of proteins belonging to distinct structural folds against classifications of the same proteins obtained from sequence alone or directly from structural information. We find that sequence alone performs poorly as a structure classifier. We show in contrast that the use of the three-dimensional representation of the sequences significantly improves the classification accuracy. We conclude with a discussion of the current limitations of such a representation and with a description of potential improvements.
Affiliation(s)
- Jie Li
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, United States
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, One Shields Ave, Davis, CA 95616, United States
| |
|
33
|
Abstract
We propose a new method inspired by statistical mechanics for extracting geometric information from undirected binary networks and generating random networks that conform to this geometry. In this method an undirected binary network is perceived as a thermodynamic system with a collection of permuted adjacency matrices as its states. The task of extracting information from the network is then reformulated as a discrete combinatorial optimization problem of searching for its ground state. To solve this problem, we apply multiple ensembles of temperature-regulated Markov chains to establish an ultrametric geometry on the network. This geometry is equipped with a tree hierarchy that captures the multiscale community structure of the network. We translate this geometry into a Parisi adjacency matrix, which has a relatively low energy level and is in the vicinity of the ground state. The Parisi adjacency matrix is then further optimized by making block permutations subject to the ultrametric geometry. The optimal matrix corresponds to the macrostate of the original network. An ensemble of random networks is then generated such that each of these networks conforms to this macrostate; the corresponding algorithm also provides an estimate of the size of this ensemble. By repeating this procedure at different scales of the ultrametric geometry of the network, it is possible to compute its evolution entropy, i.e. to estimate the evolution of its complexity as we move from a coarse to a fine description of its geometric structure. We demonstrate the performance of this method on simulated as well as real data networks.
Affiliation(s)
- Hsieh Fushing
- Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
| | - Chen Chen
- Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
| | - Shan-Yu Liu
- Department of Statistics, University of California, Davis, 1 Shields Ave, Davis, CA 95616
| | - Patrice Koehl
- Department of Computer Science, University of California, Davis, 1 Shields Ave, Davis, CA 95616
| |
|
34
|
Chen CP, Fushing H, Atwill R, Koehl P. biDCG: a new method for discovering global features of DNA microarray data via an iterative re-clustering procedure. PLoS One 2014; 9:e102445. [PMID: 25047553 PMCID: PMC4105625 DOI: 10.1371/journal.pone.0102445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/05/2013] [Accepted: 06/19/2014] [Indexed: 02/02/2023] Open
Abstract
Biclustering techniques have become very popular in cancer genetics studies, as they are tools that are expected to connect phenotypes to genotypes, i.e. to identify subgroups of cancer patients based on the fact that they share similar gene expression patterns as well as to identify subgroups of genes that are specific to these subtypes of cancer and therefore could serve as biomarkers. In this paper we propose a new approach for identifying such relationships or biclusters between patients and gene expression profiles. This method, named biDCG, rests on two key concepts. First, it uses a new clustering technique, DCG-tree [Fushing et al, PLoS One, 8, e56259 (2013)], that generates ultrametric topological spaces that capture the geometries of both the patient data set and the gene data set. Second, it optimizes the definitions of bicluster membership through an iterative two-way reclustering procedure in which patients and genes are reclustered in turn, based respectively on subsets of genes and patients defined in the previous round. We have validated biDCG on simulated and real data. Based on the simulated data we have shown that biDCG compares favorably to other biclustering techniques applied to cancer genomics data. The results on the real data sets have shown that biDCG is able to retrieve relevant biological information.
Affiliation(s)
- Chia-Pei Chen
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | - Hsieh Fushing
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | - Rob Atwill
- Department of Population, Health and Reproduction/Vet Med Extension, University of California Davis, Davis, California, United States of America
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, California, United States of America
| |
|
35
|
Abstract
Methods for computing electrostatic interactions often account implicitly for the solvent, due to the much smaller number of degrees of freedom involved. In the Poisson–Boltzmann (PB) approach the electrostatic potential is obtained by solving the Poisson–Boltzmann equation (PBE), where the solvent region is modeled as a homogeneous medium with a high dielectric constant. PB, however, is not exempt from problems. It does not take into account, for example, the sizes of the ions in the atmosphere surrounding the solute, nor does it take into account the inhomogeneous dielectric response of water due to the presence of a highly charged surface. In this paper we review two major modifications of PB that circumvent these problems, namely the size-modified PB (SMPB) equation and the Dipolar Poisson–Boltzmann Langevin (DPBL) model. In SMPB, steric effects between ions are accounted for with a lattice gas model. In DPBL, the solvent region is no longer modeled as a homogeneous dielectric medium but rather as an assembly of self-orienting interacting dipoles of variable density. This model results in a dielectric profile that transits smoothly from the solute to the solvent region as well as in a variable solvent density that depends on the charges of the solute. We show successful applications of the DPBL formalism to computing the solvation free energies of isolated ions in water. Further developments of more accurately modified PB models are discussed.
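The effect of the lattice gas correction on ion densities can be seen in a few lines. The sketch below compares the plain Boltzmann concentration with the standard lattice-gas expression for a symmetric 1:1 electrolyte, in which ions of size `a` cannot pack more densely than one per lattice cell of volume a^3. This symmetric-salt form is an assumption chosen for illustration; the SMPB equation reviewed in the paper may be more general. Function names are hypothetical, concentrations are in ions per cubic ångström, and `u` is the reduced energy q·φ/(kB·T) of the ion in the local potential φ.

```python
import math


def ion_concentration_pb(u, c_bulk):
    """Plain Boltzmann concentration for an ion of reduced energy u."""
    return c_bulk * math.exp(-u)


def ion_concentration_smpb(u, c_bulk, a):
    """Lattice-gas concentration for a symmetric 1:1 salt with ion size a.

    phi0 = 2 * c_bulk * a**3 is the fraction of lattice cells occupied by
    ions in the bulk; the denominator caps the local density at 1 / a**3.
    """
    phi0 = 2.0 * c_bulk * a ** 3
    return c_bulk * math.exp(-u) / (1.0 - phi0 + phi0 * math.cosh(u))
```

Far from the solute (u = 0) both expressions return the bulk concentration; near a highly charged surface the Boltzmann value grows without bound while the lattice-gas value saturates at close packing.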
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616, USA
| | - Frederic Poitevin
- Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
| | - Henri Orland
- Service de Physique Théorique, CEA-Saclay, 91191 Gif/Yvette Cedex, France
| | - Marc Delarue
- Unité de Dynamique Structurale des Macromolécules, UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France
|
36
|
Francis-Lyon P, Koehl P. Protein side-chain modeling with a protein-dependent optimized rotamer library. Proteins 2014; 82:2000-17. [PMID: 24623614 DOI: 10.1002/prot.24555] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/29/2013] [Revised: 02/28/2014] [Accepted: 03/07/2014] [Indexed: 12/16/2022]
Abstract
Despite years of effort, the problem of predicting the conformations of protein side chains remains a subject of inquiry. This problem has three major issues, namely defining the conformations that a side chain may adopt within a protein, developing a sampling procedure for generating possible side-chain packings, and defining a scoring function that can rank these possible packings. To solve the first of these issues, most procedures rely on a rotamer library derived from databases of known protein structures. We introduce an alternative method that is free of statistics. We begin with a rotamer library that is based only on stereochemical considerations; this rotamer library is then optimized independently for each protein under study. We show that this optimization step restores the diversity of conformations observed in native proteins. We combine this protein-dependent rotamer library (PDRL) method with the self-consistent mean field (SCMF) sampling approach and a physics-based scoring function into a new side-chain prediction method, SCMF-PDRL. Using two large test sets of 831 and 378 proteins, respectively, we show that this new method compares favorably with competing methods such as SCAP, OPUS-Rota, and SCWRL4 for energy-minimized structures.
Affiliation(s)
- Patricia Francis-Lyon
- Department of Computer Science, University of San Francisco, San Francisco, California, 94117
|
37
|
Abstract
A new algorithm is presented that provides a constructive way to conformally warp a triangular mesh of genus zero to a destination surface with minimal metric deformation, as well as a means to compute automatically a measure of the geometric difference between two surfaces of genus zero. The algorithm takes as input a pair of surfaces that are topological 2-spheres, each surface given by a distinct triangulation. The algorithm then constructs a map $(f)$ between the two surfaces. First, each of the two triangular meshes is mapped to the unit sphere using a discrete conformal mapping algorithm. The two mappings are then composed with a Möbius transformation to generate the function $(f)$. The Möbius transformation is chosen by minimizing an energy that measures the distance of $(f)$ from an isometry. We illustrate our approach using several "real life" data sets. We show first that the algorithm allows for accurate, automatic, and landmark-free nonrigid registration of brain surfaces. We then validate our approach by comparing shapes of proteins. We provide numerical experiments to demonstrate that the distances computed with our algorithm between low-resolution, surface-based representations of proteins are highly correlated with the corresponding distances computed between high-resolution, atomistic models for the same proteins.
Affiliation(s)
| | - Joel Hass
- University of California, Davis, Davis
|
38
|
Weinreb V, Li L, Chandrasekaran SN, Koehl P, Delarue M, Carter CW. Enhanced amino acid selection in fully evolved tryptophanyl-tRNA synthetase, relative to its urzyme, requires domain motion sensed by the D1 switch, a remote dynamic packing motif. J Biol Chem 2014; 289:4367-76. [PMID: 24394410 DOI: 10.1074/jbc.m113.538660] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/13/2023] Open
Abstract
We previously showed (Li, L., and Carter, C. W., Jr. (2013) J. Biol. Chem. 288, 34736-34745) that increased specificity for tryptophan versus tyrosine by contemporary Bacillus stearothermophilus tryptophanyl-tRNA synthetase (TrpRS) over that of TrpRS Urzyme results entirely from coupling between the anticodon-binding domain and an insertion into the Rossmann-fold known as Connecting Peptide 1. We show that this effect is closely related to a long range catalytic effect, in which side chain repacking in a region called the D1 Switch, accounts fully for the entire catalytic contribution of the catalytic Mg(2+) ion. We report intrinsic and higher order interaction effects on the specificity ratio, (kcat/Km)Trp/(kcat/Km)Tyr, of 15 combinatorial mutants from a previous study (Weinreb, V., Li, L., and Carter, C. W., Jr. (2012) Structure 20, 128-138) of the catalytic role of the D1 Switch. Unexpectedly, the same four-way interaction both activates catalytic assist by Mg(2+) ion and contributes -4.4 kcal/mol to the free energy of the specificity ratio. A minimum action path computed for the induced-fit and catalytic conformation changes shows that repacking of the four residues precedes a decrease in the volume of the tryptophan-binding pocket. We suggest that previous efforts to alter amino acid specificities of TrpRS and glutaminyl-tRNA synthetase (GlnRS) by mutagenesis without extensive, modular substitution failed because mutations were incompatible with interdomain motions required for catalysis.
Affiliation(s)
- Violetta Weinreb
- From the Department of Biochemistry and Biophysics, CB 7260, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260
|
39
|
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California at Davis, Davis, CA, USA
|
40
|
Hass J, Koehl P. How round is a protein? Exploring protein structures for globularity using conformal mapping. Front Mol Biosci 2014; 1:26. [PMID: 25988167 PMCID: PMC4428355 DOI: 10.3389/fmolb.2014.00026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/03/2014] [Accepted: 11/21/2014] [Indexed: 11/20/2022] Open
Abstract
We present a new algorithm that automatically computes a measure of the geometric difference between the surface of a protein and a round sphere. The algorithm takes as input two triangulated genus zero surfaces representing the protein and the round sphere, respectively, and constructs a discrete conformal map f between these surfaces. The conformal map is chosen to minimize a symmetric elastic energy ES(f) that measures the distance of f from an isometry. We illustrate our approach on a set of basic sample problems and then on a dataset of diverse protein structures. We show first that ES(f) is able to quantify the roundness of the Platonic solids and that for these surfaces it replicates well traditional measures of roundness such as the sphericity. We then demonstrate that the symmetric elastic energy ES(f) captures both global and local differences between two surfaces, showing that our method identifies the presence of protruding regions in protein structures and quantifies how these regions make the shape of a protein deviate from globularity. Based on these results, we show that ES(f) serves as a probe of the limits of the application of conformal mapping to parametrize protein shapes. We identify limitations of the method and discuss its extension to achieving automatic registration of protein structures based on their surface geometry.
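As a point of reference for the traditional roundness measure mentioned in the abstract, the sketch below computes the sphericity of the five Platonic solids. It assumes the usual definition, sphericity = pi^(1/3) (6V)^(2/3) / A, which equals 1 for a sphere, and uses the closed-form volumes and surface areas for unit edge length. It does not implement the symmetric elastic energy ES(f) of the paper; the names are illustrative.

```python
import math


def sphericity(volume, area):
    """Ratio of the area of a sphere of equal volume to the actual area."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area


# (volume, surface area) of each Platonic solid with edge length 1
PLATONIC_SOLIDS = {
    "tetrahedron": (math.sqrt(2) / 12, math.sqrt(3)),
    "cube": (1.0, 6.0),
    "octahedron": (math.sqrt(2) / 3, 2 * math.sqrt(3)),
    "dodecahedron": ((15 + 7 * math.sqrt(5)) / 4,
                     3 * math.sqrt(25 + 10 * math.sqrt(5))),
    "icosahedron": (5 * (3 + math.sqrt(5)) / 12, 5 * math.sqrt(3)),
}

if __name__ == "__main__":
    for name, (volume, area) in PLATONIC_SOLIDS.items():
        print(f"{name:13s} sphericity = {sphericity(volume, area):.3f}")
```

The values increase from the tetrahedron to the icosahedron, which is the ordering a roundness measure would be expected to reproduce.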
Affiliation(s)
- Joel Hass
- Department of Mathematics, University of California, Davis, Davis, CA, USA
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, Davis, CA, USA
|
41
|
Abstract
Morphing was initially developed as a cinematic effect, where one image is seamlessly transformed into another image. The technique was widely adopted by biologists to visualize the transition between protein conformational states, generating an interpolated pathway from an initial to a final protein structure. Geometric morphing seeks to create visually suggestive movies that illustrate structural changes between conformations but do not necessarily represent a biologically relevant pathway, while minimum energy path (MEP) interpolations aim at describing the true transition state between the crystal structure minima in the energy landscape.
Affiliation(s)
- Dahlia R Weiss
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
|
42
|
Li J, Mach P, Koehl P. Measuring the shapes of macromolecules - and why it matters. Comput Struct Biotechnol J 2013; 8:e201309001. [PMID: 24688748 PMCID: PMC3962087 DOI: 10.5936/csbj.201309001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/26/2013] [Revised: 11/22/2013] [Accepted: 11/22/2013] [Indexed: 11/22/2022] Open
Abstract
The molecular basis of life rests on the activity of biological macromolecules, mostly nucleic acids and proteins. A perhaps surprising finding that crystallized over the last handful of decades is that geometric reasoning plays a major role in our attempt to understand these activities. In this paper, we address this connection between geometry and biology, focusing on methods for measuring and characterizing the shapes of macromolecules. We briefly review existing numerical and analytical approaches that solve these problems. We cover in more detail our own work in this field, focusing on the alpha shape theory as it provides a unifying mathematical framework that enables the analytical calculation of the surface area and volume of a macromolecule represented as a union of balls, the detection of pockets and cavities in the molecule, and the quantification of contacts between the atomic balls. We have shown that each of these quantities can be related to physical properties of the molecule under study and ultimately provides insight into its activity. We conclude with a brief description of new challenges for the alpha shape theory in modern structural biology.
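The simplest instance of an analytical volume for a union of balls is the two-ball case, sketched below with inclusion-exclusion: the union volume is the sum of the two ball volumes minus the volume of their lens-shaped intersection. This is only an illustration under that restriction; the alpha shape machinery described in the abstract is what selects the non-redundant overlap terms for many balls, and it is not implemented here. Function names are illustrative.

```python
import math


def ball_volume(r):
    """Volume of a ball of radius r."""
    return 4.0 / 3.0 * math.pi * r**3


def lens_volume(r1, r2, d):
    """Volume of the intersection of two balls whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0                          # disjoint or just touching
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))     # smaller ball fully inside
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def union_volume(r1, r2, d):
    """Inclusion-exclusion for two balls: V1 + V2 - V(intersection)."""
    return ball_volume(r1) + ball_volume(r2) - lens_volume(r1, r2, d)


if __name__ == "__main__":
    # Two atoms of radius 1.7 angstrom whose centers are 1.5 angstrom apart
    print(f"union volume = {union_volume(1.7, 1.7, 1.5):.3f} cubic angstrom")
```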
Affiliation(s)
- Jie Li
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, United States
| | - Paul Mach
- Graduate Group of Applied Mathematics, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
|
43
|
Kabasakal BV, Gae DD, Li J, Lagarias JC, Koehl P, Fisher AJ. His74 conservation in the bilin reductase PcyA family reflects an important role in protein-substrate structure and dynamics. Arch Biochem Biophys 2013; 537:233-42. [DOI: 10.1016/j.abb.2013.07.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/26/2013] [Accepted: 07/19/2013] [Indexed: 10/26/2022]
|
44
|
Smaoui MR, Poitevin F, Delarue M, Koehl P, Orland H, Waldispühl J. Computational assembly of polymorphic amyloid fibrils reveals stable aggregates. Biophys J 2013; 104:683-93. [PMID: 23442919 DOI: 10.1016/j.bpj.2012.12.037] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 07/12/2012] [Revised: 11/26/2012] [Accepted: 12/10/2012] [Indexed: 11/27/2022] Open
Abstract
Amyloid proteins aggregate into polymorphic fibrils that damage tissues of the brain, nerves, and heart. Experimental and computational studies have examined the structural basis and the nucleation of short fibrils, but the ability to predict and precisely quantify the stability of larger aggregates has remained elusive. We established a complete classification of fibril shapes and developed a tool called CreateFibril to build such complex, polymorphic, modular structures automatically. We applied stability landscapes, a technique we developed to reveal reliable fibril structural parameters, to assess fibril stability. CreateFibril constructed HET-s, Aβ, and amylin fibrils up to 17 nm in length, and utilized a novel dipolar solvent model that captured the effect of dipole-dipole interactions between water and very large molecular systems to assess their aqueous stability. Our results validate experimental data for HET-s and Aβ, and suggest novel (to our knowledge) findings for amylin. In particular, we predicted the correct structural parameters (rotation angles, packing distances, hydrogen bond lengths, and helical pitches) for the one and three predominant HET-s protofilaments. We reveal and structurally characterize all known Aβ polymorphic fibrils, including structures recently classified as wrapped fibrils. Finally, we elucidate the predominant amylin fibrils and assert that native amylin is more stable than its amyloid form. CreateFibril and a database of all stable polymorphic fibril models we tested, along with their structural energy landscapes, are available at http://amyloid.cs.mcgill.ca.
|
45
|
Mach P, Koehl P. Capturing protein sequence-structure specificity using computational sequence design. Proteins 2013; 81:1556-70. [DOI: 10.1002/prot.24307] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/04/2012] [Revised: 03/28/2013] [Accepted: 04/11/2013] [Indexed: 02/05/2023]
Affiliation(s)
- Paul Mach
- Department of Applied Mathematics; Genome Center; University of California; Davis 95616 California
| | - Patrice Koehl
- Department of Computer Science; Genome Center; University of California; Davis 95616 California
|
46
|
Smaoui M, Poitevin F, Delarue M, Koehl P, Orland H, Waldispuhl J. Mortal Kombat: modeling amyloid fibrils and health implications. FASEB J 2013. [DOI: 10.1096/fasebj.27.1_supplement.996.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
| | | | | | - Patrice Koehl
- Department of Computer Science, University of California Davis, Davis, CA
| | - Henri Orland
- Institut de Physique Théorique, CEA-Saclay, Gif/Yvette Cedex, France
|
47
|
Røgen P, Koehl P. Extracting knowledge from protein structure geometry. Proteins 2013; 81:841-51. [PMID: 23280479 DOI: 10.1002/prot.24242] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2012] [Revised: 11/28/2012] [Accepted: 12/08/2012] [Indexed: 11/06/2022]
Abstract
Protein structure prediction techniques proceed in two steps, namely the generation of many structural models for the protein of interest, followed by an evaluation of all these models to identify those that are native-like. In theory, the second step is easy, as native structures correspond to minima of their free energy surfaces. It is well known, however, that the situation is more complicated, as the current force fields used for molecular simulations fail to distinguish native states from misfolded structures. In an attempt to solve this problem, we follow an alternate approach and derive a new potential from geometric knowledge extracted from native and misfolded conformers of protein structures. This new potential, Metric Protein Potential (MPP), has two main features that are key to its success. Firstly, it is composite in that it includes local and nonlocal geometric information on proteins. At the short-range level, it captures and quantifies the mapping between the sequences and structures of short (7-mer) fragments of protein backbones through the introduction of a new local energy term. The local energy term is then augmented with a nonlocal residue-based pairwise potential and a solvent potential. Secondly, it is optimized to yield a maximized correlation between the energy of a structural model and its root mean square (RMS) deviation from the native structure of the corresponding protein. We have shown that MPP yields high correlation values between RMS and energy and that it is able to retrieve the native structure of a protein from a set of high-resolution decoys.
Affiliation(s)
- Peter Røgen
- Department of Mathematics, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark.
|
48
|
Fushing H, Wang H, VanderWaal K, McCowan B, Koehl P. Multi-scale clustering by building a robust and self correcting ultrametric topology on data points. PLoS One 2013; 8:e56259. [PMID: 23424653 PMCID: PMC3570468 DOI: 10.1371/journal.pone.0056259] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/11/2012] [Accepted: 01/07/2013] [Indexed: 11/19/2022] Open
Abstract
The advent of high-throughput technologies and the concurrent advances in information sciences have led to an explosion in size and complexity of the data sets collected in biological sciences. The biggest challenge today is to assimilate this wealth of information into a conceptual framework that will help us decipher biological functions. A large and complex collection of data, usually called a data cloud, naturally embeds multi-scale characteristics and features, generically termed geometry. Understanding this geometry is the foundation for extracting knowledge from data. We have developed a new methodology, called data cloud geometry-tree (DCG-tree), to resolve this challenge. This new procedure has two main features that are keys to its success. Firstly, it derives from the empirical similarity measurements a hierarchy of clustering configurations that captures the geometric structure of the data. This hierarchy is then transformed into an ultrametric space, which is then represented via an ultrametric tree or a Parisi matrix. Secondly, it has a built-in mechanism for self-correcting clustering membership across different tree levels. We have compared the trees generated with this new algorithm to equivalent trees derived with the standard Hierarchical Clustering method on simulated as well as real data clouds from fMRI brain connectivity studies, cancer genomics, giraffe social networks, and Lewis Carroll's Doublets network. In each of these cases, we have shown that the DCG trees are more robust and less sensitive to measurement errors, and that they provide a better quantification of the multi-scale geometric structures of the data. As such, DCG-tree is an effective tool for analyzing complex biological data sets.
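To make the notion of an ultrametric concrete, the sketch below builds the subdominant ultrametric of a distance matrix, which is the one induced by standard single-linkage hierarchical clustering, and checks the strong triangle inequality u(i,j) <= max(u(i,k), u(k,j)). This is not the DCG-tree algorithm, whose construction and self-correcting step are not specified in the abstract; it only shows the kind of object a hierarchy of clusters defines. All names and the toy data are illustrative.

```python
def subdominant_ultrametric(dist):
    """Minimax-path distances: u[i][j] is the smallest possible value of the
    largest hop on any path from i to j (single-linkage merge height)."""
    n = len(dist)
    u = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u, tol=1e-12):
    """Check the strong triangle inequality for every triple of points."""
    n = len(u)
    return all(u[i][j] <= max(u[i][k], u[k][j]) + tol
               for i in range(n) for j in range(n) for k in range(n))


if __name__ == "__main__":
    points = [0.0, 1.0, 3.0, 7.0]  # toy one-dimensional data cloud
    dist = [[abs(p - q) for q in points] for p in points]
    u = subdominant_ultrametric(dist)
    for row in u:
        print(row)
    print("input is ultrametric: ", is_ultrametric(dist))
    print("output is ultrametric:", is_ultrametric(u))
```

In the toy example the points merge at heights 1, 2, and 4, and those heights become the ultrametric distances between the corresponding clusters.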
Affiliation(s)
- Hsieh Fushing
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | - Hui Wang
- Department of Statistics, University of California Davis, Davis, California, United States of America
| | - Kimberly VanderWaal
- Animal Behavior Graduate Group, University of California Davis, Davis, California, United States of America
| | - Brenda McCowan
- Department of Population Health and Reproduction and California National Primate Research Center, University of California Davis, Davis, California, United States of America
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, California, United States of America
|
49
|
Tsui A, Fenton D, Vuong P, Hass J, Koehl P, Amenta N, Coeurjolly D, DeCarli C, Carmichael O. Globally Optimal Cortical Surface Matching with Exact Landmark Correspondence. Lecture Notes in Computer Science 2013; 23:487-98. [DOI: 10.1007/978-3-642-38868-2_41] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
|
50
|
Abstract
Background: Drug discovery typically starts with the identification of a potential target that is then tested and validated either through high-throughput screening against a library of drug compounds or by rational drug design. When the putative target is a protein, the latter approach requires knowledge of its structure. Finding the structure of a protein is, however, a difficult task. Significant progress has come from high-resolution techniques such as X-ray crystallography and NMR; there are, however, many proteins whose structures have not yet been solved. Computational techniques for structure prediction are viable alternatives to experimental techniques for these cases. However, the proper validation of the structural models they generate remains an issue. Findings: In this report, we focus on homology modeling techniques and introduce the H-factor, a new indicator for assessing the quality of protein structure models generated with these techniques. The H-factor is meant to mimic the R-factor used in X-ray crystallography. The method for computing the H-factor is fully described with a demonstration of its effectiveness on a test set of target proteins. Conclusions: We have developed a web service for computing the H-factor for models of a protein structure. This service is freely accessible at http://koehllab.genomecenter.ucdavis.edu/toolkit/h-factor.
Affiliation(s)
- Eric di Luccio
- Computer Science Department, University of California Davis, 451 East Health Sciences Drive, Room 4337, Genome Center, GBSF, Davis, CA, 95616, USA.
|