1
Müller R, Han JP, Chandrasekaran S, Bogdan P. Deep Learning for Reintegrating Biology. Integr Comp Biol 2021; 61:2276-2281. [PMID: 33881520 DOI: 10.1093/icb/icab015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
The goal of this vision paper is to investigate the possible role that advanced machine learning techniques, especially deep learning, could play in the reintegration of various biological disciplines. To achieve this goal, a series of operational, but admittedly very simplistic, conceptualizations have been introduced: Life has been taken as a multidimensional phenomenon that inhabits three physical dimensions (time, space, and scale) and biological research as establishing connections between different points in the domain of life. Each of these points hence denotes a position in time, space, and scale at which a life phenomenon of interest takes place. Using these conceptualizations, fragmentation of biology can be seen as the result of too few and especially too short-ranged connections. Reintegrating biology could then be accomplished by establishing more, longer-ranged connections. Deep learning methods appear to be very well suited for addressing this particular need at this particular time. Notwithstanding the numerous unsubstantiated claims regarding the capabilities of AI, deep learning networks represent a major advance in the ability to find complex relationships inside large data sets that would not have been accessible with traditional data analytic methods or to a human observer. In addition, ongoing advances in the automation of taking measurements from phenomena on all levels of biological organization continue to increase the number of large quantitative data sets that are available. These increasingly common data sets could serve as anchor points for making long-range connections by virtue of deep learning. However, connections within the domain of life are likely to be structured in a highly nonuniform fashion and hence it is necessary to develop methods, e.g., theoretical, computational, and experimental, to determine linkage of biological data sets most likely to provide useful insights on a biological problem using deep learning.
Finally, specific deep learning approaches and architectures should be developed to match the needs of reintegrating biology.
Affiliation(s)
- Rolf Müller
- Department of Mechanical Engineering, Virginia Tech, 1075 Life Science Circle, Blacksburg, Virginia 24061, USA
- Jin-Ping Han
- T.J. Watson Research Center, IBM, 1101 Kitchawan Road, Yorktown Heights, New York 10598, USA
- Sriram Chandrasekaran
- Department of Biomedical Engineering, University of Michigan, 1600 Huron Parkway, Ann Arbor, Michigan 48109, USA
- Paul Bogdan
- Department of Electrical and Computer Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, California 90089, USA
2
Ansari A, Kuznetsov SV. Is Hairpin Formation in Single-Stranded Polynucleotide Diffusion-Controlled? J Phys Chem B 2005; 109:12982-9. [PMID: 16852611 DOI: 10.1021/jp044838a] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
Abstract
An intriguing puzzle in biopolymer science is the observation that single-stranded DNA and RNA oligomers form hairpin structures on time scales of tens of microseconds, considerably slower than the estimated time for loop formation for a semiflexible polymer of similar length. To address the origin of the slow kinetics and to determine whether hairpin dynamics are diffusion-controlled, the effect of solvent viscosity ($\eta$) on hairpin kinetics was investigated using laser temperature-jump techniques. The viscosity was varied by addition of glycerol, which significantly destabilizes hairpins. A previous study on the viscosity dependence of hairpin dynamics, in which all the changes in the measured rates were attributed to a change in solvent viscosity, reported an apparent scaling of relaxation times ($\tau_r$) on $\eta$ as $\tau_r \sim \eta^{0.8}$. In this study, we demonstrate that if the effect of viscosity on the measured rates is not deconvoluted from the inevitable effect of change in stability, then separation of $\tau_r$ into opening ($\tau_o$) and closing ($\tau_c$) times yields erroneous behavior, with different values (and opposite signs) of the apparent scaling exponents, $\tau_o \sim \eta^{-0.4}$ and $\tau_c \sim \eta^{1.5}$. Under isostability conditions, obtained by varying the temperature to compensate for the destabilizing effect of glycerol, both $\tau_o$ and $\tau_c$ scale as $\sim \eta^{1.1 \pm 0.1}$. Thus, hairpin dynamics are strongly coupled to solvent viscosity, indicating that diffusion of the polynucleotide chain through the solvent is involved in the rate-determining step.
Affiliation(s)
- Anjum Ansari
- Department of Physics and Department of Bioengineering, University of Illinois at Chicago, 845 West Taylor Street, Chicago, Illinois 60607, USA.
3
Fernández A, Berry RS. Self-organization and mismatch tolerance in protein folding: General theory and an application. J Chem Phys 2000. [DOI: 10.1063/1.481076] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
4
Abstract
During protein folding, many of the events leading to secondary and tertiary structure occur in milliseconds or faster. Modern nuclear magnetic resonance and laser detection techniques, coupled with fast initiation of the folding reaction, are probing these events in great detail. Theory, ranging from analytical models to molecular dynamics calculations, is beginning to match up with experiment. As a result, timescales, from such elementary steps as the addition of a residue to a helix to strange kinetics of collapsing protein backbones, can now be measured and interpreted.
Affiliation(s)
- M Gruebele
- Department of Chemistry and Beckman Institute for Advanced Science and Technology, University of Illinois, Urbana, IL 61801, USA.
5
Tuchscherer G, Grell D, Mathieu M, Mutter M. Extending the concept of template-assembled synthetic proteins. J Pept Res 1999; 54:185-94. [PMID: 10517155 DOI: 10.1034/j.1399-3011.1999.00120.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.4] [Indexed: 11/23/2022]
Abstract
The creation of native-like macromolecules in copying nature's way represents a fascinating challenge in protein chemistry today. In the absence of a detailed knowledge of the complex folding pathway the ultimate goal in protein de novo design, the construction of artificial proteins with predetermined three-dimensional structure and tailor-made functions based on a defined, generally valid set of rules, appears to be still out of reach. With progress in synthesis strategies and biostructural characterization methods, topological templates have become a versatile tool for inducing and stabilizing secondary and tertiary structures, such as protein loops, beta-turns, alpha-helices, beta-sheets and a variety of folding motifs. In this article, we extend the concept of template-assembled synthetic proteins for the construction of protein-like topologies with multiply bridged, oligocyclic chain architectures termed locked-in tertiary folds that exhibit unique physicochemical and folding properties because of the highly confined conformational space. Furthermore, we show that some fundamental questions in protein assembly can be approached applying the template concept. Using covalent template trapping of self-associated peptide assemblies in aqueous solution the structural and physical forces guiding protein folding, supramolecular assembly and molecular recognition processes can be studied on a molecular level.
Affiliation(s)
- G Tuchscherer
- Institute of Organic Chemistry, University of Lausanne, Switzerland.
6
Fernández A, Salthú R, Cendra H. Discretized torsional dynamics and the folding of an RNA chain. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 1999; 60:2105-19. [PMID: 11970003 DOI: 10.1103/physreve.60.2105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 05/18/1998] [Revised: 11/30/1998] [Indexed: 04/18/2023]
Abstract
The aim of this work is to implement a discrete coarse codification of local torsional states of the RNA chain backbone in order to explore the long-time limit dynamics and ultimately obtain a coarse solution to the RNA folding problem. A discrete representation of the soft-mode dynamics is turned into an algorithm for a rough structure prediction. The algorithm itself is inherently parallel, as it evaluates concurrent folding possibilities by pattern recognition, but it may be implemented in a personal computer as a chain of perturbation-translation-renormalization cycles performed on a binary matrix of local topological constraints. This requires suitable representational tools and a periodic quenching of the dynamics for system renormalization. A binary coding of local topological constraints associated with each structural motif is introduced, with each local topological constraint corresponding to a local torsional state. This treatment enables us to adopt a computation time step far larger than hydrodynamic drag time scales. Accordingly, the solvent is no longer treated as a hydrodynamic drag medium. Instead we incorporate its capacity for forming local conformation-dependent dielectric domains. Each translation of the matrix of local topological constraints (LTM's) depends on the conformation-dependent local dielectric created by a confined solvent. Folding pathways are resolved as transitions between patterns of locally encoded structural signals which change within the 1 ns-100 ms time scale range. These coarse folding pathways are generated by a search at regular intervals for structural patterns in the LTM. Each pattern is recorded as a base-pairing pattern (BPP) matrix, a consensus-evaluation operation subject to a renormalization feedback loop. Since several mutually conflicting consensus evaluations might occur at a given time, the need arises for a probabilistic approach appropriate for an ensemble of RNA molecules. 
Thus, a statistical dynamics of consensus formation is determined by the time evolution of the base pairing probability matrix. These dynamics are generated for a functional RNA molecule, a representative of the so-called group I ribozymes, in order to test the model. The resulting ensemble of conformations is sharply peaked and the most probable structure features the predominance of all phylogenetically conserved intrachain helices tantamount to ribozyme function. Furthermore, the magnesium-aided cooperativity that leads to the shaping of the catalytic core is elucidated. Once the predictive folding algorithm has been implemented, the validity of the so-called "adiabatic approximation" is tested. This approximation requires that conformational microstates be lumped up into BPP's which are treated as quasiequilibrium states, while folding pathways are coarsely represented as sequences of BPP transitions. To test the validity of this adiabatic ansatz, a computation of the coarse Shannon information entropy $\sigma$ associated with the specific partition of conformation space into BPP's is performed taking into account the LTM evolution and contrasted with the adiabatic computation. The results reveal a subordination of torsional microstate dynamics to BPP transitions within time scales relevant to folding. This adiabatic entrainment in the long-time limit is thus identified as responsible for the expediency of the folding process.
Affiliation(s)
- A Fernández
- Instituto de Matemática, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas, Avenida Alem 1253, Bahía Blanca 8000, Argentina.
7
Fernández A. Folding a protein by discretizing its backbone torsional dynamics. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 1999; 59:5928-39. [PMID: 11969574 DOI: 10.1103/physreve.59.5928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Received: 06/26/1998] [Revised: 11/12/1998] [Indexed: 04/18/2023]
Abstract
The aim of this work is to provide a coarse codification of local conformational constraints associated with each folding motif of a peptide chain in order to obtain a rough solution to the protein folding problem. This is accomplished by implementing a discretized version of the soft-mode dynamics on a personal computer (PC). Our algorithm mimics a parallel process as it evaluates concurrent folding possibilities by pattern recognition. It may be implemented in a PC as a sequence of perturbation-translation-renormalization (p-t-r) cycles performed on a matrix of local topological constraints (LTM). This requires suitable representational tools and a periodic quenching of the dynamics required for renormalization. We introduce a description of the peptide chain based on a local discrete variable the values of which label the basins of attraction of the Ramachandran map for each residue. Thus, the local variable indicates the basin in which the torsional coordinates of each residue lie at a given time. In addition, a coding of local topological constraints associated with each secondary and tertiary structural motif is introduced. Our treatment enables us to adopt a computation time step of 81 ps, a value far larger than hydrodynamic drag time scales. Folding pathways are resolved as transitions between patterns of locally encoded structural signals that change within the 10 μs-100 ms time scale range. These coarse folding pathways are generated by the periodic search for structural patterns in the time-evolving LTM. Each pattern is recorded as a contact matrix, an operation subject to a renormalization feedback loop. The validity of our approach is tested vis-a-vis experimentally-probed folding pathways eventually generating tertiary interactions in proteins which recover their active structure under in vitro renaturation conditions.
As an illustration, we focus on determining significant folding intermediates and late kinetic bottlenecks that occur within the first 10 ms of the bovine pancreatic trypsin inhibitor renaturation process. The probed cooperativity and nucleation effects, as well as diffusion-collision stabilization of secondary structure are shown to result from the persistence of relatively stable patterns through successive (p-t-r) cycles, thus acting as seeding patterns for further growth or hierarchical development.
Affiliation(s)
- A Fernández
- Instituto de Matemática (INMABB), Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional del Sur, Avenida Alem 1253, Bahía Blanca 8000, Argentina
8
Abstract
We have calculated the free energy of a spherical model of a protein or part of a protein generated in the way of protein folding. Two spherical models are examined; one is a homogeneous model consisting of only one residue type--hydrophobic. The other is a heterogeneous model consisting of two residue types--strong hydrophobic and weak hydrophobic. Both models show a folding transition state, and the latter model reproduces the trend of the experimental folded-unfolded energy change. The heterogeneous model suggests that in the folding process of a protein of more than 70 residues, a specific region of the protein folds first to form a stable region, then the other residues follow the folding process. The energy landscape of folding of a small protein is approximately a funnel model, whereas a flatter energy landscape is suggested for larger proteins of more than 55-70 residues.
Affiliation(s)
- Y Fukunishi
- Department of Chemistry, Rutgers, the State University of New Jersey, Piscataway, USA.
9
Abstract
The folding pathway of apomyoglobin has been experimentally shown to have early kinetic intermediates involving the A, B, G, and H helices. The earliest detected kinetic events occur on a ns to μs time scale. We show that the early folding kinetics of apomyoglobin may be understood as the association of nascent helices through a network of diffusion-collision-coalescence steps G + H <--> GH + A <--> AGH + B <--> ABGH obtained by solving the diffusion-collision model in a chemical kinetics approximation. Our reproduction of the experimental results indicates that the model is a useful way to analyze folding data. One prediction from our fit is that the nascent A and H helices should be relatively more helix-like before coalescence than the other apomyoglobin helices.
Affiliation(s)
- R V Pappu
- Department of Biochemistry & Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
10
Abstract
We use two simple models and the energy landscape perspective to study protein folding kinetics. A major challenge has been to use the landscape perspective to interpret experimental data, which requires ensemble averaging over the microscopic trajectories usually observed in such models. Here, because of the simplicity of the model, this can be achieved. The kinetics of protein folding falls into two classes: multiple-exponential and two-state (single-exponential) kinetics. Experiments show that two-state relaxation times have "chevron plot" dependences on denaturant and non-Arrhenius dependences on temperature. We find that HP and HP+ models can account for these behaviors. The HP model often gives bumpy landscapes with many kinetic traps and multiple-exponential behavior, whereas the HP+ model gives more smooth funnels and two-state behavior. Multiple-exponential kinetics often involves fast collapse into kinetic traps and slower barrier climbing out of the traps. Two-state kinetics often involves entropic barriers where conformational searching limits the folding speed. Transition states and activation barriers need not define a single conformation; they can involve a broad ensemble of the conformations searched on the way to the native state. We find that unfolding is not always a direct reversal of the folding process.
Affiliation(s)
- H S Chan
- Department of Pharmaceutical Chemistry, University of California, San Francisco 94143-1204, USA.
11
Hagen SJ, Hofrichter J, Eaton WA. Rate of Intrachain Diffusion of Unfolded Cytochrome c. J Phys Chem B 1997. [DOI: 10.1021/jp9622997] [Citation(s) in RCA: 101] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Affiliation(s)
- Stephen J. Hagen
- Laboratory of Chemical Physics, Building 5, National Institutes of Health, Bethesda, Maryland 20892-0520
- James Hofrichter
- Laboratory of Chemical Physics, Building 5, National Institutes of Health, Bethesda, Maryland 20892-0520
- William A. Eaton
- Laboratory of Chemical Physics, Building 5, National Institutes of Health, Bethesda, Maryland 20892-0520
12
Abstract
A change in the perception of the protein folding problem has taken place recently. The nature of the change is outlined and the reasons for it are presented. An essential element is the recognition that a bias toward the native state over much of the effective energy surface may govern the folding process. This has replaced the random search paradigm of Levinthal and suggests that there are many ways of reaching the native state in a reasonable time so that a specific pathway does not have to be postulated. The change in perception is due primarily to the application of statistical mechanical models and lattice simulations to protein folding. Examples of lattice model results on protein folding are presented. It is pointed out that the new optimism about the protein folding problem must be complemented by more detailed studies to determine the structural and energetic factors that introduce the biases which make possible the folding of real proteins.
Affiliation(s)
- M Karplus
- Laboratoire de Chimie Biophysique, Institut Le Bel, Université Louis Pasteur, Strasbourg, France.
13
Munson M, Anderson KS, Regan L. Speeding up protein folding: mutations that increase the rate at which Rop folds and unfolds by over four orders of magnitude. Fold Des 1997; 2:77-87. [PMID: 9080201 DOI: 10.1016/s1359-0278(97)00008-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 1.8] [Indexed: 02/04/2023]
Abstract
BACKGROUND The dimeric four-helix-bundle protein Rop folds and unfolds extremely slowly. To understand the molecular basis for the slow kinetics, we have studied the folding and unfolding of wild-type Rop and a series of hydrophobic core mutants. RESULTS Mutation of the hydrophobic core creates stable, dimeric, and wild-type-like proteins with dramatically increased rates of both folding and unfolding. The increases in rates are dependent upon the number and position of repacked residues within the hydrophobic core. CONCLUSIONS Rop folds by a rapid collision of monomers to form a dimeric intermediate with substantial helical content, followed by a slow rearrangement to the final native structure. Rop unfolding is a single extremely slow kinetic phase. The slow steps of both folding and unfolding are dramatically increased by hydrophobic core replacements, suggesting that their main effect is to substantially decrease the energy of the transition state.
Affiliation(s)
- M Munson
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
14
Hagen SJ, Hofrichter J, Szabo A, Eaton WA. Diffusion-limited contact formation in unfolded cytochrome c: estimating the maximum rate of protein folding. Proc Natl Acad Sci U S A 1996; 93:11615-7. [PMID: 8876184 PMCID: PMC38106 DOI: 10.1073/pnas.93.21.11615] [Citation(s) in RCA: 322] [Impact Index Per Article: 11.5] [Indexed: 02/02/2023] Open
Abstract
How fast can a protein fold? The rate of polypeptide collapse to a compact state sets an upper limit to the rate of folding. Collapse may in turn be limited by the rate of intrachain diffusion. To address this question, we have determined the rate at which two regions of an unfolded protein are brought into contact by diffusion. Our nanosecond-resolved spectroscopy shows that under strongly denaturing conditions, regions of unfolded cytochrome c separated by approximately 50 residues diffuse together in 35-40 microseconds. This result leads to an estimate of approximately $(1\ \mu\mathrm{s})^{-1}$ as the upper limit for the rate of protein folding.
Affiliation(s)
- S J Hagen
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520, USA
15
Affiliation(s)
- J A McCammon
- Department of Chemistry, University of California at San Diego, La Jolla 92093-0365, USA