1
Zhang N, Sood D, Guo SC, Chen N, Antoszewski A, Marianchuk T, Dey S, Xiao Y, Hong L, Peng X, Baxa M, Partch C, Wang LP, Sosnick TR, Dinner AR, LiWang A. Temperature-dependent fold-switching mechanism of the circadian clock protein KaiB. Proc Natl Acad Sci U S A 2024; 121:e2412327121. [PMID: 39671178 DOI: 10.1073/pnas.2412327121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 10/24/2024] [Indexed: 12/14/2024] Open
Abstract
The oscillator of the cyanobacterial circadian clock relies on the ability of the KaiB protein to switch reversibly between a stable ground-state fold (gsKaiB) and an unstable fold-switched fold (fsKaiB). Rare fold-switching events by KaiB provide a critical delay in the negative feedback loop of this posttranslational oscillator. In this study, we experimentally and computationally investigate the temperature dependence of fold switching and its mechanism. We demonstrate that the stability of gsKaiB increases with temperature compared to fsKaiB and that the Q10 value for the gsKaiB → fsKaiB transition is nearly three times smaller than that for the reverse transition in a construct optimized for NMR studies. Simulations and native-state hydrogen-deuterium exchange NMR experiments suggest that fold switching can involve both partially and completely unfolded intermediates. The simulations predict that the transition state for fold switching coincides with isomerization of conserved prolines in the most rapidly exchanging region, and we confirm experimentally that proline isomerization is a rate-limiting step for fold switching. We explore the implications of our results for temperature compensation, a hallmark of circadian clocks, through a kinetic model.
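For readers unfamiliar with the Q10 coefficient mentioned in this abstract, its standard definition is sketched below. The symbols $k_1$, $k_2$, $T_1$ and $T_2$ are illustrative and are not taken from the paper: $k_1$ and $k_2$ denote the rate constants of one transition measured at temperatures $T_1$ and $T_2$ (in °C or K).

```latex
Q_{10} = \left( \frac{k_2}{k_1} \right)^{\frac{10}{T_2 - T_1}}
```

Under this definition, a Q10 of 1 corresponds to a rate that does not change with temperature, and a Q10 of 2 to a rate that doubles for every 10-degree increase.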
Affiliation(s)
- Ning Zhang
  - Department of Chemistry and Biochemistry, University of California, Merced, CA 95343
- Damini Sood
  - Department of Chemistry and Biochemistry, University of California, Merced, CA 95343
- Spencer C Guo
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, IL 60637
- Nanhao Chen
  - Department of Chemistry, University of California, Davis, CA 95616
- Adam Antoszewski
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, IL 60637
- Tegan Marianchuk
  - Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637
- Supratim Dey
  - Department of Chemistry and Biochemistry, University of California, Merced, CA 95343
- Yunxian Xiao
  - Department of Chemistry and Biochemistry, University of California, Merced, CA 95343
- Lu Hong
  - Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637
- Xiangda Peng
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637
- Michael Baxa
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637
- Carrie Partch
  - Department of Chemistry and Biochemistry, University of California, Santa Cruz, CA 95064
- Lee-Ping Wang
  - Department of Chemistry, University of California, Davis, CA 95616
- Tobin R Sosnick
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637
- Aaron R Dinner
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, IL 60637
- Andy LiWang
  - Department of Chemistry and Biochemistry, University of California, Merced, CA 95343
  - Center for Cellular and Biomolecular Machines, University of California, Merced, CA 95343
2
Harteveld Z, Van Hall-Beauvais A, Morozova I, Southern J, Goverde C, Georgeon S, Rosset S, Defferrard M, Loukas A, Vandergheynst P, Bronstein MM, Correia BE. Exploring "dark-matter" protein folds using deep learning. Cell Syst 2024; 15:898-910.e5. [PMID: 39383860 DOI: 10.1016/j.cels.2024.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 06/13/2024] [Accepted: 09/16/2024] [Indexed: 10/11/2024]
Abstract
De novo protein design explores uncharted sequence and structure space to generate novel proteins not sampled by evolution. A main challenge in de novo design involves crafting "designable" structural templates to guide the sequence searches toward adopting target structures. We present a convolutional variational autoencoder that learns patterns of protein structure, dubbed Genesis. We coupled Genesis with trRosetta to design sequences for a set of protein folds and found that Genesis is capable of reconstructing native-like distance and angle distributions for five native folds and three novel, the so-called "dark-matter" folds as a demonstration of generalizability. We used a high-throughput assay to characterize the stability of the designs through protease resistance, obtaining encouraging success rates for folded proteins. Genesis enables exploration of the protein fold space within minutes, unrestricted by protein topologies. Our approach addresses the backbone designability problem, showing that small neural networks can efficiently learn structural patterns in proteins. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Zander Harteveld
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Alexandra Van Hall-Beauvais
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Irina Morozova
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Casper Goverde
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Stéphane Rosset
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Andreas Loukas
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Prescient Design, gRED, Roche, Basel, Switzerland
- Bruno E Correia
  - École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
3
Draizen EJ, Veretnik S, Mura C, Bourne PE. Deep generative models of protein structure uncover distant relationships across a continuous fold space. Nat Commun 2024; 15:8094. [PMID: 39294145 PMCID: PMC11410806 DOI: 10.1038/s41467-024-52020-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 08/23/2024] [Indexed: 09/20/2024] Open
Abstract
Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand protein structure, function and evolution. For instance, is there an optimal granularity in viewing protein structural similarities (e.g., architecture, topology or some other level)? Similarly, the discrete/continuous dichotomy of fold space is central, but remains unresolved. Discrete views of fold space bin similar folds into distinct, non-overlapping groups; unfortunately, such binning can miss remote relationships. While hierarchical systems like CATH are indispensable resources, less heuristic and more conceptually flexible approaches could enable more nuanced explorations of fold space. Building upon an Urfold model of protein structure, here we present a deep generative modeling framework, termed DeepUrfold, for analyzing protein relationships at scale. DeepUrfold's learned embeddings occupy high-dimensional latent spaces that can be distilled for a given protein in terms of an amalgamated representation uniting sequence, structure and biophysical properties. This approach is structure-guided, versus being purely structure-based, and DeepUrfold learns representations that, in a sense, define superfamilies. Deploying DeepUrfold with CATH reveals evolutionarily-remote relationships that evade existing methodologies, and suggests a mostly-continuous view of fold space, a view that extends beyond simple geometric similarity, towards the realm of integrated sequence ↔ structure ↔ function properties.
Affiliation(s)
- Eli J Draizen
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Stella Veretnik
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
- Cameron Mura
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Philip E Bourne
  - School of Data Science, University of Virginia, Charlottesville, VA, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
4
Zhang N, Sood D, Guo SC, Chen N, Antoszewski A, Marianchuk T, Chavan A, Dey S, Xiao Y, Hong L, Peng X, Baxa M, Partch C, Wang LP, Sosnick TR, Dinner AR, LiWang A. Temperature-Dependent Fold-Switching Mechanism of the Circadian Clock Protein KaiB. bioRxiv 2024:2024.05.21.594594. [PMID: 38826295 PMCID: PMC11142059 DOI: 10.1101/2024.05.21.594594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The oscillator of the cyanobacterial circadian clock relies on the ability of the KaiB protein to switch reversibly between a stable ground-state fold (gsKaiB) and an unstable fold-switched fold (fsKaiB). Rare fold-switching events by KaiB provide a critical delay in the negative feedback loop of this post-translational oscillator. In this study, we experimentally and computationally investigate the temperature dependence of fold switching and its mechanism. We demonstrate that the stability of gsKaiB increases with temperature compared to fsKaiB and that the Q10 value for the gsKaiB → fsKaiB transition is nearly three times smaller than that for the reverse transition. Simulations and native-state hydrogen-deuterium exchange NMR experiments suggest that fold switching can involve both subglobally and near-globally unfolded intermediates. The simulations predict that the transition state for fold switching coincides with isomerization of conserved prolines in the most rapidly exchanging region, and we confirm experimentally that proline isomerization is a rate-limiting step for fold switching. We explore the implications of our results for temperature compensation, a hallmark of circadian clocks, through a kinetic model.
5
Greener JG, Jamali K. Fast protein structure searching using structure graph embeddings. Bioinformatics Advances 2024; 5:vbaf042. [PMID: 40196750 PMCID: PMC11974391 DOI: 10.1093/bioadv/vbaf042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/11/2025] [Accepted: 03/03/2025] [Indexed: 04/09/2025]
Abstract
Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation, and protein classification. Fast and accurate methods to search with structures will be essential to make use of the vast databases that have recently become available, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein domains. Availability and implementation: The method, called Progres, is available as software at https://github.com/greener-group/progres and as a web server at https://progres.mrc-lmb.cam.ac.uk. It has accuracy comparable to the best current methods and can search the AlphaFold database TED domains in a 10th of a second per query on CPU.
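The idea of searching a database of low-dimensional domain embeddings can be illustrated with a minimal sketch. This is not the Progres API: the function names, the toy three-dimensional vectors, and the choice of cosine similarity as the ranking score are all assumptions made for illustration only.

```python
import math


def normalize(vec):
    """Return a unit-length copy of vec (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query, database, top_k=3):
    """Rank database entries by cosine similarity to the query embedding.

    database maps a domain name to its embedding (a list of floats).
    Returns up to top_k (name, score) pairs, best match first.
    """
    q = normalize(query)
    scored = []
    for name, emb in database.items():
        e = normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, name))
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy embeddings, invented for this example.
toy_db = {
    "domA": [1.0, 0.0, 0.0],
    "domB": [0.9, 0.1, 0.0],
    "domC": [0.0, 1.0, 0.0],
}
hits = search([1.0, 0.05, 0.0], toy_db, top_k=2)
```

A brute-force scan like this is linear in the database size; a real system searching very large databases would presumably precompute normalized embeddings and use vectorized or indexed lookups.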
Affiliation(s)
- Joe G Greener
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
- Kiarash Jamali
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
6
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024] Open
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Kirill E. Medvedev
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
7
Bordin N, Lau AM, Orengo C. Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins. Mol Cell 2023; 83:3950-3952. [PMID: 37977115 DOI: 10.1016/j.molcel.2023.10.039] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/27/2023] [Revised: 10/27/2023] [Accepted: 10/27/2023] [Indexed: 11/19/2023]
Abstract
Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al. and Durairaj et al. uncovered fascinating new protein functions and structural features previously unknown.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Andy M Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
8
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but there is a great diversity in disorder, from regions that are (and stay) fully disordered to conditional order. This diversity is still difficult to decipher even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from the only information of its amino acid sequence and based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing the separation between foldable segments from databases of disorder (DisProt) and order (SCOPe [soluble domains] and OPM [transmembrane domains]). It allows to specify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions) and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the sequences of the proteomes of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes, at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
  - Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
9
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Christian Dallago
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; VantAI, 151 W 42nd Street, New York, NY 10036, USA
- Michael Heinzinger
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Stephanie Kim
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Maria Littmann
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
  - Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
10
Chou HH, Hsu CT, Hsu CW, Yao KH, Wang HC, Hsieh SY. Novel Algorithm for Improved Protein Classification Using Graph Similarity. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3135-3143. [PMID: 34748498 DOI: 10.1109/tcbb.2021.3125836] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Considerable sequence data are produced in genome annotation projects that relate to molecular levels, structural similarities, and molecular and biological functions. In structural genomics, the most essential task involves resolving protein structures efficiently with hardware or software, understanding these structures, and assigning their biological functions. Understanding the characteristics and functions of proteins enables the exploration of the molecular mechanisms of life. In this paper, we examine the problems of protein classification. Because they perform similar biological functions, proteins in the same family usually share similar structural characteristics. We employed this premise in designing a classification algorithm. In this algorithm, auxiliary graphs are used to represent proteins, with every amino acid in a protein mapped to a vertex in a graph. Moreover, the links between amino acids correspond to the edges between the vertices. The proposed algorithm classifies proteins according to the similarities in their graphical structures. The proposed algorithm is efficient and accurate in distinguishing proteins from different families and outperformed related algorithms experimentally.
11
Uzoeto HO, Cosmas S, Ajima JN, Arazu AV, Didiugwu CM, Ekpo DE, Ibiang GO, Durojaye OA. Computer-aided molecular modeling and structural analysis of the human centromere protein–HIKM complex. Beni-Suef University Journal of Basic and Applied Sciences 2022. [DOI: 10.1186/s43088-022-00285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background
Protein–peptide and protein–protein interactions play an essential role in different functional and structural cellular organizational aspects. While Cryo-EM and X-ray crystallography generate the most complete structural characterization, most biological interactions exist in biomolecular complexes that are neither compliant nor responsive to direct experimental analysis. The development of computational docking approaches is therefore necessary. This starts from component protein structures to the prediction of their complexes, preferentially with precision close to complex structures generated by X-ray crystallography.
Results
To guarantee faithful chromosomal segregation, there must be a proper assembling of the kinetochore (a protein complex with multiple subunits) at the centromere during the process of cell division. As an important member of the inner kinetochore, defects in any of the subunits making up the CENP-HIKM complex lead to kinetochore dysfunction and an eventual chromosomal mis-segregation and cell death. Previous studies in an attempt to understand the assembly and mechanism devised by the CENP-HIKM in promoting the functionality of the kinetochore have reconstituted the protein complex from different organisms including fungi and yeast. Here, we present a detailed computational model of the physical interactions that exist between each component of the human CENP-HIKM, while validating each modeled structure using orthologs with existing crystal structures from the protein data bank.
Conclusions
Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.
12
Torgasheva NA, Diatlova EA, Grin IR, Endutkin AV, Mechetin GV, Vokhtantsev IP, Yudkina AV, Zharkov DO. Noncatalytic Domains in DNA Glycosylases. Int J Mol Sci 2022; 23:ijms23137286. [PMID: 35806289 PMCID: PMC9266487 DOI: 10.3390/ijms23137286] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2022] [Revised: 06/28/2022] [Accepted: 06/29/2022] [Indexed: 02/04/2023] Open
Abstract
Many proteins consist of two or more structural domains: separate parts that have a defined structure and function. For example, in enzymes, the catalytic activity is often localized in a core fragment, while other domains or disordered parts of the same protein participate in a number of regulatory processes. This situation is often observed in many DNA glycosylases, the proteins that remove damaged nucleobases thus initiating base excision DNA repair. This review covers the present knowledge about the functions and evolution of such noncatalytic parts in DNA glycosylases, mostly concerned with the human enzymes but also considering some unique members of this group coming from plants and prokaryotes.
Affiliation(s)
- Natalia A. Torgasheva
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
- Evgeniia A. Diatlova
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, 2 Pirogova Street, 630090 Novosibirsk, Russia
- Inga R. Grin
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
- Anton V. Endutkin
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
- Grigory V. Mechetin
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
- Ivan P. Vokhtantsev
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, 2 Pirogova Street, 630090 Novosibirsk, Russia
- Anna V. Yudkina
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
- Dmitry O. Zharkov
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, 8 Lavrentieva Avenue, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, 2 Pirogova Street, 630090 Novosibirsk, Russia
13
Srivastava J, Balaji PV. Clues to reaction specificity in PLP-dependent fold type I aminotransferases of monosaccharide biosynthesis. Proteins 2022; 90:1247-1258. [DOI: 10.1002/prot.26305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022]
Affiliation(s)
- Jaya Srivastava
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Petety V. Balaji
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
14
Villegas-Morcillo A, Gomez AM, Sanchez V. An analysis of protein language model embeddings for fold prediction. Brief Bioinform 2022; 23:6571527. [PMID: 35443054 DOI: 10.1093/bib/bbac142] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/01/2022] [Revised: 03/21/2022] [Accepted: 03/28/2022] [Indexed: 11/13/2022] Open
Abstract
The identification of the protein fold class is a challenging problem in structural biology. Recent computational methods for fold prediction leverage deep learning techniques to extract protein fold-representative embeddings mainly using evolutionary information in the form of multiple sequence alignment (MSA) as input source. In contrast, protein language models (LM) have reshaped the field thanks to their ability to learn efficient protein representations (protein-LM embeddings) from purely sequential information in a self-supervised manner. In this paper, we analyze a framework for protein fold prediction using pre-trained protein-LM embeddings as input to several fine-tuning neural network models, which are supervisedly trained with fold labels. In particular, we compare the performance of six protein-LM embeddings: the long short-term memory-based UniRep and SeqVec, and the transformer-based ESM-1b, ESM-MSA, ProtBERT and ProtT5; as well as three neural networks: Multi-Layer Perceptron, ResCNN-BGRU (RBG) and Light-Attention (LAT). We separately evaluated the pairwise fold recognition (PFR) and direct fold classification (DFC) tasks on well-known benchmark datasets. The results indicate that the combination of transformer-based embeddings, particularly those obtained at amino acid level, with the RBG and LAT fine-tuning models performs remarkably well in both tasks. To further increase prediction accuracy, we propose several ensemble strategies for PFR and DFC, which provide a significant performance boost over the current state-of-the-art results. All this suggests that moving from traditional protein representations to protein-LM embeddings is a very promising approach to protein fold-related tasks.
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
| | - Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
| | - Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Granada, Spain
| |
|
15
|
Lin E, Lin CH, Lane HY. De Novo Peptide and Protein Design Using Generative Adversarial Networks: An Update. J Chem Inf Model 2022; 62:761-774. [DOI: 10.1021/acs.jcim.1c01361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Affiliation(s)
- Eugene Lin
- Department of Biostatistics, University of Washington, Seattle, Washington 98195, United States
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, United States
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
| | - Chieh-Hsin Lin
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung 83301, Taiwan
- School of Medicine, Chang Gung University, Taoyuan 33302, Taiwan
| | - Hsien-Yuan Lane
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Department of Psychiatry, China Medical University Hospital, Taichung 40447, Taiwan
- Brain Disease Research Center, China Medical University Hospital, Taichung 40447, Taiwan
- Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung 41354, Taiwan
| |
|
16
|
Li DD, Wang JL, Liu Y, Li YZ, Zhang Z. Expanded analyses of the functional correlations within structural classifications of glycoside hydrolases. Comput Struct Biotechnol J 2021; 19:5931-5942. [PMID: 34849197 PMCID: PMC8602953 DOI: 10.1016/j.csbj.2021.10.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 10/30/2021] [Accepted: 10/30/2021] [Indexed: 01/01/2023] Open
Abstract
Glycoside hydrolases (GHs) are greatly diverse in sequences and functions, but systematic studies of GH relationships based on structural information are lacking. Here, we report that GHs have multiple evolutionary origins and are structurally derived from 27 homologous superfamilies and 16 folds, but GHs are highly biased to distribute in a few superfamilies and folds. Six of these superfamilies are widely encoded by archaea, bacteria, and eukaryotes, indicating that they may be the most ancient in origin. Most superfamilies vary in enzyme function, and some, such as the superfamilies of (β/α)8-barrel and (α/α)6-barrel structures, exhibit extreme functional diversity; this is highly positively correlated with sequence diversity. More than one-third of glycosidase activities show a phenomenon of convergent evolution, especially the degradation functions of GHs on polysaccharides. The GHs of most superfamilies have relatively narrow environmental distributions, normally with the highest abundance in host-associated environments and a distribution preference for moderate low-temperature and acidic environments. Overall, our expanded analysis facilitates an understanding of complex GH sequence-structure-function relationships and may guide our screening and engineering of GHs.
Affiliation(s)
- Dan-Dan Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Jin-Lan Wang
- National Administration of Health Data, Jinan 250002, China
| | - Ya Liu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Yue-Zhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Zheng Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Suzhou Research Institute, Shandong University, Suzhou 215123, China
| |
|
17
|
Villegas-Morcillo A, Gomez AM, Morales-Cordovilla JA, Sanchez V. Protein Fold Recognition From Sequences Using Convolutional and Recurrent Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2848-2854. [PMID: 32750896 DOI: 10.1109/tcbb.2020.3012732] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
The identification of a protein fold type from its amino acid sequence provides important insights about the protein 3D structure. In this paper, we propose a deep learning architecture that can process protein residue-level features to address the protein fold recognition task. Our neural network model combines 1D-convolutional layers with gated recurrent unit (GRU) layers. The GRU cells, as recurrent layers, cope with the processing issues associated with the highly variable protein sequence lengths and so extract a fold-related embedding of fixed size for each protein domain. These embeddings are then used to perform the pairwise fold recognition task, which is based on transferring the fold type of the most similar template structure. We compare our model with several state-of-the-art template-based and deep learning-based methods. The evaluation results over the well-known LINDAHL and SCOP_TEST sets, along with a proposed LINDAHL test set updated to SCOP 1.75, show that our embeddings perform significantly better than these methods, especially at the fold level. Supplementary material is available from the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2020.3012732; source code and trained models are available at http://sigmat.ugr.es/~amelia/CNN-GRU-RF+/.
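The pairwise fold recognition step described in this abstract (assign the query the fold of its most similar template) can be sketched as follows. This is only an illustration: the abstract does not state the similarity measure, so cosine similarity is an assumption here, and the names `cosine_similarity`, `predict_fold`, the toy embeddings, and the fold labels are all hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_fold(query, templates):
    """Return the fold label of the template most similar to the query.

    templates is a list of (embedding, fold_label) pairs.
    Cosine similarity is an assumed scoring function, not taken from the abstract.
    """
    _, best_label = max(templates, key=lambda t: cosine_similarity(query, t[0]))
    return best_label

# Toy fixed-size embeddings standing in for the network's output (hypothetical values).
templates = [
    ([1.0, 0.0, 0.0], "fold_A"),
    ([0.0, 1.0, 0.0], "fold_B"),
    ([0.0, 0.0, 1.0], "fold_C"),
]
query = [0.1, 0.9, 0.2]
predicted = predict_fold(query, templates)
```

Because every domain is mapped to an embedding of the same size, the comparison works regardless of the original sequence lengths.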
|
18
|
Affiliation(s)
- Andy LiWang
- University of California, Merced, California, USA
| | - Lauren L Porter
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA
| | | |
|
19
|
Villegas-Morcillo A, Sanchez V, Gomez AM. FoldHSphere: deep hyperspherical embeddings for protein fold recognition. BMC Bioinformatics 2021; 22:490. [PMID: 34641786 PMCID: PMC8507389 DOI: 10.1186/s12859-021-04419-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Accepted: 09/29/2021] [Indexed: 12/01/2022] Open
Abstract
Background: Current state-of-the-art deep learning approaches for protein fold recognition learn protein embeddings that improve prediction performance at the fold level. However, there still exists a performance gap between the fold level and the (relatively easier) family level, suggesting that it might be possible to learn an embedding space that better represents the protein folds. Results: In this paper, we propose the FoldHSphere method to learn a better fold embedding space through a two-stage training procedure. We first obtain prototype vectors for each fold class that are maximally separated in hyperspherical space. We then train a neural network by minimizing the angular large margin cosine loss to learn protein embeddings clustered around the corresponding hyperspherical fold prototypes. Our network architectures, ResCNN-GRU and ResCNN-BGRU, process the input protein sequences by applying several residual-convolutional blocks followed by a gated recurrent unit-based recurrent layer. Evaluation results on the LINDAHL dataset indicate that the use of our hyperspherical embeddings effectively bridges the performance gap at the family and fold levels. Furthermore, our FoldHSpherePro ensemble method yields an accuracy of 81.3% at the fold level, outperforming all the state-of-the-art methods. Conclusions: Our methodology is efficient in learning discriminative and fold-representative embeddings for the protein domains. The proposed hyperspherical embeddings are effective at identifying the protein fold class by pairwise comparison, even when amino acid sequence similarities are low. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04419-7.
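As a rough illustration of the second training stage named in this abstract, the sketch below evaluates a large margin cosine loss for one embedding against fixed class prototypes. It follows the commonly used form of that loss (subtract a margin from the target-class cosine, scale, then apply softmax cross-entropy); the function names, the `scale` and `margin` values, and the two-dimensional prototypes are hypothetical and are not taken from the paper.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def lmcl_loss(embedding, prototypes, target, scale=30.0, margin=0.35):
    """Large margin cosine loss for a single example.

    prototypes: one fixed vector per fold class (assumed already well separated).
    target: index of the true class.
    scale and margin are illustrative values, not the paper's settings.
    """
    e = normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, normalize(p))) for p in prototypes]
    # The margin is subtracted only from the cosine of the true class.
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

# Two prototypes that are maximally separated on the unit circle (toy example).
prototypes = [[1.0, 0.0], [-1.0, 0.0]]
loss_near = lmcl_loss([0.9, 0.1], prototypes, target=0)   # embedding close to its prototype
loss_far = lmcl_loss([-0.9, 0.1], prototypes, target=0)   # embedding close to the wrong prototype
```

With `margin=0` the expression reduces to ordinary softmax cross-entropy over scaled cosines; the margin forces embeddings to sit closer to their own prototype before the loss becomes small.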
Affiliation(s)
- Amelia Villegas-Morcillo
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain.
| | - Victoria Sanchez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
| | - Angel M Gomez
- Department of Signal Theory, Telematics and Communications, University of Granada, Periodista Daniel Saucedo Aranda, 18071, Granada, Spain
| |
|
20
|
LiWang PJ, Wang LP, LiWang A. Resurrected Ancestors Reveal Origins of Metamorphism in XCL1. Trends Biochem Sci 2021; 46:433-434. [PMID: 33752957 DOI: 10.1016/j.tibs.2021.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Revised: 03/06/2021] [Accepted: 03/09/2021] [Indexed: 10/21/2022]
Abstract
In a recent study, Dishman et al. resurrected ancestors of the metamorphic chemokine, XCL1, inferred through phylogenetics, and found that metamorphism arose in the XCL1 lineage ~150 million years ago. A zigzagging evolutionary path suggests that the metamorphic properties are adaptive and reveals three design principles that could be used for technological applications.
Affiliation(s)
- Patricia J LiWang
- Department of Molecular and Cell Biology, University of California, Merced, CA 95343, USA.
| | - Lee-Ping Wang
- Department of Chemistry, University of California, Davis, CA 95616, USA.
| | - Andy LiWang
- Department of Chemistry and Biochemistry, University of California, Merced, CA 95343, USA.
| |
|
21
|
Wang CK, Craik DJ. Linking molecular evolution to molecular grafting. J Biol Chem 2021; 296:100425. [PMID: 33600801 PMCID: PMC8005815 DOI: 10.1016/j.jbc.2021.100425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2020] [Revised: 02/09/2021] [Accepted: 02/13/2021] [Indexed: 12/01/2022] Open
Abstract
Molecular grafting is a strategy for the engineering of molecular scaffolds into new functional agents, such as next-generation therapeutics. Despite its wide use, studies so far have focused almost exclusively on demonstrating its utility rather than understanding the factors that lead to either poor or successful grafting outcomes. Here, we examine protein evolution and identify parallels between the natural process of protein functional diversification and the artificial process of molecular grafting. We discuss features of natural proteins that are correlated with innovability (the capacity to acquire new functions) and describe their implications for molecular grafting scaffolds. Disulfide-rich peptides are used as exemplars because they are particularly promising scaffolds onto which new functions can be grafted. This article provides a perspective on why some scaffolds are more suitable for grafting than others, identifying opportunities for improving molecular grafting.
Affiliation(s)
- Conan K Wang
- Institute for Molecular Bioscience and Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Queensland, Brisbane, Queensland, Australia.
| | - David J Craik
- Institute for Molecular Bioscience and Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Queensland, Brisbane, Queensland, Australia
| |
|
22
|
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India.
| |
|
23
|
Karimi M, Zhu S, Cao Y, Shen Y. De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks. J Chem Inf Model 2020; 60:5667-5681. [PMID: 32945673 PMCID: PMC7775287 DOI: 10.1021/acs.jcim.0c00593] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
Abstract
Although massive data is quickly accumulating on protein sequence and structure, there is a small and limited number of protein architectural types (or structural folds). This study is addressing the following question: how well could one reveal underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response to the question, we have developed novel deep generative models, namely, semisupervised gcWGAN (guided, conditional, Wasserstein Generative Adversarial Networks). To overcome training difficulties and improve design qualities, we build our models on conditional Wasserstein GAN (WGAN) that uses Wasserstein distance in the loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into WGAN as a loss to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds compared to a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one not even part of basis folds, gcWGAN designs have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign), through generating design seeds and tailoring design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data. 
Data, source codes, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
| | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
| | - Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas 77843, United States
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, Texas 77840, United States
| |
|
24
|
Leone L, Chino M, Nastri F, Maglio O, Pavone V, Lombardi A. Mimochrome, a metalloporphyrin‐based catalytic Swiss knife†. Biotechnol Appl Biochem 2020; 67:495-515. [DOI: 10.1002/bab.1985] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2020] [Accepted: 07/09/2020] [Indexed: 12/20/2022]
Affiliation(s)
- Linda Leone
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
| | - Marco Chino
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
| | - Flavia Nastri
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
| | - Ornella Maglio
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
- IBB ‐ National Research Council Napoli Italy
| | - Vincenzo Pavone
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
| | - Angela Lombardi
- Department of Chemical Sciences University of Napoli “Federico II” Napoli Italy
| |
|
25
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Chloé Quignot
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Raphael Guerois
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| |
|
26
|
Beeby M, Ferreira JL, Tripp P, Albers SV, Mitchell DR. Propulsive nanomachines: the convergent evolution of archaella, flagella and cilia. FEMS Microbiol Rev 2020; 44:253-304. [DOI: 10.1093/femsre/fuaa006] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Accepted: 03/06/2020] [Indexed: 02/06/2023] Open
Abstract
Echoing the repeated convergent evolution of flight and vision in large eukaryotes, propulsive swimming motility has evolved independently in microbes in each of the three domains of life. Filamentous appendages – archaella in Archaea, flagella in Bacteria and cilia in Eukaryotes – wave, whip or rotate to propel microbes, overcoming diffusion and enabling colonization of new environments. The implementations of the three propulsive nanomachines are distinct, however: archaella and flagella rotate, while cilia beat or wave; flagella and cilia assemble at their tips, while archaella assemble at their base; archaella and cilia use ATP for motility, while flagella use ion-motive force. These underlying differences reflect the tinkering required to evolve a molecular machine, in which pre-existing machines in the appropriate contexts were iteratively co-opted for new functions and whose origins are reflected in their resultant mechanisms. Contemporary homologies suggest that archaella evolved from a non-rotary pilus, flagella from a non-rotary appendage or secretion system, and cilia from a passive sensory structure. Here, we review the structure, assembly, mechanism and homologies of the three distinct solutions as a foundation to better understand how propulsive nanomachines evolved three times independently and to highlight principles of molecular evolution.
Affiliation(s)
- Morgan Beeby
- Department of Life Sciences, Frankland Road, Imperial College of London, London, SW7 2AZ, UK
| | - Josie L Ferreira
- Department of Life Sciences, Frankland Road, Imperial College of London, London, SW7 2AZ, UK
| | - Patrick Tripp
- Molecular Biology of Archaea, Institute of Biology, University of Freiburg, Schaenzlestrasse 1, 79211 Freiburg, Germany
| | - Sonja-Verena Albers
- Molecular Biology of Archaea, Institute of Biology, University of Freiburg, Schaenzlestrasse 1, 79211 Freiburg, Germany
| | - David R Mitchell
- Department of Cell and Developmental Biology, SUNY Upstate Medical University, 750 E. Adams St., Syracuse, NY 13210, USA
| |
|
27
|
Reading Targeted DNA Damage in the Active Demethylation Pathway: Role of Accessory Domains of Eukaryotic AP Endonucleases and Thymine-DNA Glycosylases. J Mol Biol 2020:S0022-2836(19)30720-X. [DOI: 10.1016/j.jmb.2019.12.020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2019] [Revised: 11/24/2019] [Accepted: 12/05/2019] [Indexed: 01/07/2023]
|
28
|
Heo L, Feig M. High-accuracy protein structures by combining machine-learning with physics-based refinement. Proteins 2019; 88:637-642. [PMID: 31693199 DOI: 10.1002/prot.25847] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2019] [Revised: 10/05/2019] [Accepted: 11/03/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available but without using such information directly. Here we show that combining machine-learning based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
| | - Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
| |
|
29
|
Verma R, Pandit SB. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS One 2019; 14:e0220336. [PMID: 31374091 PMCID: PMC6677297 DOI: 10.1371/journal.pone.0220336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Accepted: 07/12/2019] [Indexed: 12/22/2022] Open
Abstract
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through a physical interaction at domain-domain interfaces (DDIs). With a motivation to understand evolution of interfaces, we have investigated similarities among DDIs. Even though interfaces of protein-protein interactions (PPIs) have been previously studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. For studying the structural landscape of DDIs, we have used iAlign to structurally align intra-chain domain interfaces of domains. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that ~88% of these could identify a structural matching interface having similar C-alpha geometry and contact pattern even though the aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher than the average random IS-score (0.207), suggesting domain interfaces are not random. The structural space of DDIs is highly connected, as ~84% of all possible directed edges among interfaces are found to have a path length of at most 8 when the IS-score threshold is 0.26. At this threshold, ~83% of interfaces form the largest strongly connected component. This suggests that the structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found in PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that ~86% could find a statistically significant match to an intra-chain interface with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to limited possible ways of packing secondary structures. In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins.
Affiliation(s)
- Rivi Verma
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
| | - Shashi Bhushan Pandit
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
| |
|
30
|
Baiesi M, Orlandini E, Seno F, Trovato A. Sequence and structural patterns detected in entangled proteins reveal the importance of co-translational folding. Sci Rep 2019; 9:8426. [PMID: 31182755 PMCID: PMC6557820 DOI: 10.1038/s41598-019-44928-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Accepted: 05/23/2019] [Indexed: 11/09/2022] Open
Abstract
Proteins must fold quickly to acquire their biologically functional three-dimensional native structures. Hence, these are mainly stabilized by local contacts, while intricate topologies such as knots are rare. Here, we reveal the existence of specific patterns adopted by protein sequences and structures to deal with backbone self-entanglement. A large scale analysis of the Protein Data Bank shows that loops significantly intertwined with another chain portion are typically closed by weakly bound amino acids. Why is this energetic frustration maintained? A possible picture is that entangled loops are formed only toward the end of the folding process to avoid kinetic traps. Consistently, these loops are more frequently found to be wrapped around a portion of the chain on their N-terminal side, the one translated earlier at the ribosome. Finally, these motifs are less abundant in natural native states than in simulated protein-like structures, yet they appear in 32% of proteins, which in some cases display an amazingly complex intertwining.
Affiliation(s)
- Marco Baiesi
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
- INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
| | - Enzo Orlandini
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
- INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
| | - Flavio Seno
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy.
- INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy.
| | - Antonio Trovato
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
- INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
| |
|
31
|
Sequence Pattern for Supersecondary Structure of Sandwich-Like Proteins. Methods Mol Biol 2019. [PMID: 30945226 DOI: 10.1007/978-1-4939-9161-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
The goal is to define sequence characteristics of beta-sandwich proteins that are unique for the beta-sandwich supersecondary structure (SSS). Finding the conserved residues that are critical for protein structure can often be accomplished with homology methods, but these methods are not always adequate, as residues with a similar structural role do not always occupy the same position as determined by sequence alignment. In this paper, we show how to identify residues that play the same structural role in different proteins of the same SSS, even when these residue positions cannot be aligned with sequence alignment methods. The SSS characteristics are (a) a set of positions in each strand that are involved in the formation of a hydrophobic core, residue content, and correlations of residues at these key positions, (b) maximum allowable number of "low-frequency residues" for each strand, (c) minimum allowed number of "high-frequency" residues for each loop, and (d) minimum and maximum lengths of each loop. These sequence characteristics are referred to as the "sequence pattern" for their respective SSS. The high specificity and sensitivity for a particular SSS are confirmed by applying this pattern to all protein structures in the SCOP data bank. We present here the pattern for one of the most common SSS of beta-sandwich proteins.
|
32
|
Catazaro J, Caprez A, Swanson D, Powers R. Functional Evolution of Proteins. Proteins 2019; 87:492-501. [PMID: 30714210 DOI: 10.1002/prot.25670] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/16/2018] [Revised: 11/02/2018] [Accepted: 01/31/2019] [Indexed: 11/12/2022]
Abstract
The functional evolution of proteins advances through gene duplication followed by functional drift, whereas molecular evolution occurs through random mutational events. Over time, protein active-site structures or functional epitopes remain highly conserved, which enables relationships to be inferred between distant orthologs or paralogs. In this study, we present the first functional clustering and evolutionary analysis of the RCSB Protein Data Bank (RCSB PDB) based on similarities between active-site structures. All of the ligand-bound proteins within the RCSB PDB were scored using our Comparison of Protein Active-site Structures (CPASS) software and database (http://cpass.unl.edu/). Principal component analysis was then used to identify 4431 representative structures to construct a phylogenetic tree based on the CPASS comparative scores (http://itol.embl.de/shared/jcatazaro). The resulting phylogenetic tree identified a sequential, step-wise evolution of protein active-sites and provides novel insights into the emergence of protein function or changes in substrate specificity based on subtle changes in geometry and amino acid composition.
Affiliation(s)
- Jonathan Catazaro
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska
| | - Adam Caprez
- Holland Computing Center, Office of Research, University of Nebraska-Lincoln, Lincoln, Nebraska
| | - David Swanson
- Holland Computing Center, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska
| | - Robert Powers
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska.,Department of Chemistry, Nebraska Center for Integrated Biomolecular Communication, Lincoln, Nebraska
| |
|
33
|
Experimental accuracy in protein structure refinement via molecular dynamics simulations. Proc Natl Acad Sci U S A 2018; 115:13276-13281. [PMID: 30530696 DOI: 10.1073/pnas.1811364115] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Indexed: 12/22/2022] Open
Abstract
Refinement is the last step in protein structure prediction pipelines to convert approximate homology models to experimental accuracy. Protocols based on molecular dynamics (MD) simulations have shown promise, but current methods are limited to moderate levels of consistent refinement. To explore the energy landscape between homology models and native structures and analyze the challenges of MD-based refinement, eight test cases were studied via extensive simulations followed by Markov state modeling. In all cases, native states were found very close to the experimental structures and at the lowest free energies, but refinement was hindered by a rough energy landscape. Transitions from the homology model to the native states require the crossing of significant kinetic barriers on at least microsecond time scales. A significant energetic driving force toward the native state was lacking until its immediate vicinity, and there was significant sampling of off-pathway states competing for productive refinement. The role of recent force field improvements is discussed and transition paths are analyzed in detail to inform which key transitions have to be overcome to achieve successful refinement.
|
34
|
Navigating Among Known Structures in Protein Space. Methods Mol Biol 2018. [PMID: 30298400 DOI: 10.1007/978-1-4939-8736-8_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Present-day protein space is the result of 3.7 billion years of evolution, constrained by the underlying physicochemical qualities of the proteins. It is difficult to differentiate between evolutionary traces and effects of physicochemical constraints. Nonetheless, as a rule of thumb, instances of structural reuse, or focusing on structural similarity, are likely attributable to physicochemical constraints, whereas sequence reuse, or focusing on sequence similarity, may be more indicative of evolutionary relationships. Both types of relationships have been studied and can provide meaningful insights into protein biophysics and evolution, which in turn can lead to better algorithms for protein search, annotation, and maybe even design. In broad strokes, studies of protein space vary in the entities they represent, the similarity measure comparing these entities, and the representation used. The entities can be, for example, protein chains, domains, supra-domains, or smaller protein sub-parts denoted themes. The measures of similarity between the entities can be based on sequence, structure, function, or any combination of these. The representation can be global, encompassing the whole space, or local, focusing on a particular region surrounding protein(s) of interest. Global representations include lists of grouped proteins, protein networks, and maps. Networks are the abstraction that is derived most directly from the similarity data: each node is the protein entity (e.g., a domain), and edges connect similar domains. Selecting the entities, the similarity measure, and the abstraction are three intertwined decisions: the similarity measures allow us to identify the entities, and the selection of entities influences what is a meaningful similarity measure. Similarly, we seek entities that are related to each other in a way for which a simple representation describes their relationships succinctly and accurately.
This chapter will cover studies that rely on different entities, similarity measures, and a range of representations to better understand protein structure space. Scholars may use publicly available navigators offering a global representation, and in particular the hierarchical classifications SCOP, CATH, and ECOD, or a local representation, which encompass structural alignment algorithms. Alternatively, scholars can configure their own navigator using existing tools. To demonstrate this DIY (do it yourself) approach for navigating in protein space, we investigate substrate-binding proteins. By presenting sequence similarities among this large and diverse protein family as a network, we can infer that one member (pdb ID 4ntl; of yet unknown function) may bind methionine and suggest a putative binding mechanism.
|
35
|
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
The growth rate of the protein sequence universe dramatically exceeds the speed of expansion of the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structure determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
| | - Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.,Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
| | - Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA.,Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
| |
|
36
|
Abstract
The vast, mostly unknown protein universe can be explored by analyzing protein sequences as a string of domains. A broader coverage can be achieved when these domains, the essential blocks in protein evolution, are detected using sequence profiles. Using clustering to collapse redundant profiles into unique function words (UFWs), we find that over the years 2009–2016, the number of UFWs saturates while the number of sequences matched by a combination of two or more UFWs grows exponentially. Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
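The clustering step described in this abstract (collapsing redundant profiles into disjoint cliques, each replaced by one representative) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy "take the largest remaining clique" strategy, the function names, and the toy profile names are all assumptions made for illustration, and the abstract does not say how representatives were chosen.

```python
def maximal_cliques(adj, nodes):
    """Yield every maximal clique of the subgraph induced by `nodes`
    (plain Bron-Kerbosch recursion, no pivoting; fine for small graphs)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(frozenset(), set(nodes), set())


def disjoint_clique_clusters(profiles, redundant_pairs):
    """Greedy sketch (an assumption, not the published method): repeatedly
    remove the largest clique of mutually redundant profiles, so every
    pair inside a cluster is linked (complete linkage)."""
    adj = {p: set() for p in profiles}
    for a, b in redundant_pairs:
        adj[a].add(b)
        adj[b].add(a)
    remaining = set(profiles)
    clusters = []
    while remaining:
        # Largest clique first; ties broken alphabetically for determinism.
        best = min(maximal_cliques(adj, remaining),
                   key=lambda c: (-len(c), sorted(c)))
        clusters.append(sorted(best))
        remaining -= best
    return clusters


# Hypothetical toy input: profiles A, B, C all match the same region.
toy_profiles = ["A", "B", "C", "D", "E"]
toy_pairs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
for members in disjoint_clique_clusters(toy_profiles, toy_pairs):
    # The first member stands in as the cluster's representative "word".
    print(members[0], members)
```

In this toy case D is redundant only with C, so once the A, B, C clique is removed D remains as its own single-member cluster, which is the behavior complete linkage implies.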
|
37
|
C L B, S Nair A. Benchmark Dataset for Whole Genome Sequence Compression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1228-1236. [PMID: 27214907 DOI: 10.1109/tcbb.2016.2568186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset and proposes a benchmark dataset built using a multistage sampling procedure. Considering the genome sequences of organisms available in the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of using three established tools on the newly compiled dataset and shows that their strengths and weaknesses are evident only with a comparison based on the scientifically compiled benchmark dataset. AVAILABILITY: The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
|
38
|
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 01/13/2023] Open
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
|
39
|
Ahrens JB, Nunez-Castilla J, Siltberg-Liberles J. Evolution of intrinsic disorder in eukaryotic proteins. Cell Mol Life Sci 2017; 74:3163-3174. [PMID: 28597295 PMCID: PMC11107722 DOI: 10.1007/s00018-017-2559-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 05/17/2017] [Accepted: 06/01/2017] [Indexed: 12/23/2022]
Abstract
Conformational flexibility conferred through regions of intrinsic structural disorder allows proteins to behave as dynamic molecules. While it is well known that intrinsically disordered regions can undergo disorder-to-order transitions in real time as part of their function, we are also beginning to learn more about the dynamics of disorder-to-order transitions along evolutionary time-scales. Intrinsically disordered regions endow proteins with functional promiscuity, which is further enhanced by the ability of some of these regions to undergo real-time disorder-to-order transitions. Disorder content affects gene retention after whole genome duplication, but it is not necessarily conserved. Altered patterns of disorder resulting from evolutionary disorder-to-order transitions indicate that disorder evolves to modify function through refining stability, regulation, and interactions. Here, we review the evolution of intrinsically disordered regions in eukaryotic proteins. We discuss the interplay between secondary structure and disorder on evolutionary time-scales, the importance of disorder for eukaryotic proteome expansion and functional divergence, and the evolutionary dynamics of disorder.
Affiliation(s)
- Joseph B Ahrens
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
| | - Janelle Nunez-Castilla
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA
| | - Jessica Siltberg-Liberles
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, FL, 33199, USA.
| |
|
40
|
Levy Y. Protein Assembly and Building Blocks: Beyond the Limits of the LEGO Brick Metaphor. Biochemistry 2017; 56:5040-5048. [PMID: 28809494 DOI: 10.1021/acs.biochem.7b00666] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/18/2022]
Abstract
Proteins, like other biomolecules, have a modular and hierarchical structure. Various building blocks are used to construct proteins of high structural complexity and diverse functionality. In multidomain proteins, for example, domains are fused to each other in different combinations to achieve different functions. Although the LEGO brick metaphor is justified as a means of simplifying the complexity of three-dimensional protein structures, several fundamental properties (such as allostery or the induced-fit mechanism) make deviation from it necessary to respect the plasticity, softness, and cross-talk that are essential to protein function. In this work, we illustrate recently reported protein behavior in multidomain proteins that deviates from the LEGO brick analogy. While earlier studies showed that a protein domain is often unaffected by being fused to another domain or becomes more stable following the formation of a new interface between the tethered domains, destabilization due to tethering has been reported for several systems. We illustrate that tethering may sometimes result in a multidomain protein behaving as "less than the sum of its parts". We survey these cases for which structure additivity does not guarantee thermodynamic additivity. Protein destabilization due to fusion to other domains may be linked in some cases to biological function and should be taken into account when designing large assemblies.
Affiliation(s)
- Yaakov Levy
- Department of Structural Biology, Weizmann Institute of Science , Rehovot 76100, Israel
| |
|
41
|
Ghosh A, Ostrander JS, Zanni MT. Watching Proteins Wiggle: Mapping Structures with Two-Dimensional Infrared Spectroscopy. Chem Rev 2017; 117:10726-10759. [PMID: 28060489 PMCID: PMC5500453 DOI: 10.1021/acs.chemrev.6b00582] [Citation(s) in RCA: 192] [Impact Index Per Article: 24.0] [Indexed: 11/30/2022]
Abstract
Proteins exhibit structural fluctuations over decades of time scales. From picosecond side chain motions to aggregates that form over the course of minutes, characterizing protein structure over these vast lengths of time is important to understanding protein function. In the past 15 years, two-dimensional infrared spectroscopy (2D IR) has been established as a versatile tool that can uniquely probe protein structures on many time scales. In this review, we present some of the basic principles behind 2D IR and show how they have impacted, and can impact, the field of protein biophysics. We highlight experiments in which 2D IR spectroscopy has provided structural and dynamical data that would be difficult to obtain with more standard structural biology techniques. We also highlight technological developments in 2D IR that continue to expand the scope of scientific problems that can be accessed in the biomedical sciences.
Affiliation(s)
| | - Joshua S. Ostrander
- Department of Chemistry, University of Wisconsin—Madison, Madison, Wisconsin 53706, United States
| | - Martin T. Zanni
- Department of Chemistry, University of Wisconsin—Madison, Madison, Wisconsin 53706, United States
| |
|
42
|
Olivares-Quiroz L. Protein folding and unfolding pathways: The role of energy barriers, configurational entropy and internal energy: Comment on "There and back again: Two views on the protein folding puzzle" by Alexei V. Finkelstein et al. Phys Life Rev 2017; 21:75-76. [PMID: 28602717 DOI: 10.1016/j.plrev.2017.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2017] [Accepted: 06/02/2017] [Indexed: 12/01/2022]
Affiliation(s)
- L Olivares-Quiroz
- Physics and Complex Systems Department, Universidad Autonoma de la Ciudad de Mexico Campus SLT, Calle Prolongación San Isidro No. 151, Colonia San Lorenzo Tezonco, Delegación Iztapalapa, Ciudad de México, C.P. 09790, Mexico.
| |
|
43
|
Repurposing proteins for new bioinorganic functions. Essays Biochem 2017; 61:245-258. [DOI: 10.1042/ebc20160068] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/01/2016] [Revised: 01/17/2017] [Accepted: 01/23/2017] [Indexed: 02/06/2023]
Abstract
Inspired by the remarkable sophistication and complexity of natural metalloproteins, the field of protein design and engineering has traditionally sought to understand and recapitulate the design principles that underlie the interplay between metals and protein scaffolds. Yet, some recent efforts in the field demonstrate that it is possible to create new metalloproteins with structural, functional and physico-chemical properties that transcend evolutionary boundaries. This essay aims to highlight some of these efforts and draw attention to the ever-expanding scope of bioinorganic chemistry and its new connections to synthetic biology, biotechnology, supramolecular chemistry and materials engineering.
|
44
|
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
45
|
|
46
|
Abstract
Proteins are the workhorses of the cell and, over billions of years, they have evolved an amazing plethora of extremely diverse and versatile structures with equally diverse functions. Evolutionary emergence of new proteins and transitions between existing ones are believed to be rare or even impossible. However, recent advances in comparative genomics have repeatedly called some 10%-30% of all genes without any detectable similarity to existing proteins. Even after careful scrutiny, some of those orphan genes contain protein coding reading frames with detectable transcription and translation. Thus some proteins seem to have emerged from previously non-coding 'dark genomic matter'. These 'de novo' proteins tend to be disordered, fast evolving, weakly expressed but also rapidly assuming novel and physiologically important functions. Here we review mechanisms by which 'de novo' proteins might be created, under which circumstances they may become fixed and why they are elusive. We propose a 'grow slow and moult' model in which first a reading frame is extended, coding for an initially disordered and non-globular appendage which, over time, becomes more structured and may also become associated with other proteins.
|
47
|
Serrano P, Dutta SK, Proudfoot A, Mohanty B, Susac L, Martin B, Geralt M, Jaroszewski L, Godzik A, Elsliger M, Wilson IA, Wüthrich K. NMR in structural genomics to increase structural coverage of the protein universe: Delivered by Prof. Kurt Wüthrich on 7 July 2013 at the 38th FEBS Congress in St. Petersburg, Russia. FEBS J 2016; 283:3870-3881. [PMID: 27154589 DOI: 10.1111/febs.13751] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/23/2016] [Revised: 04/12/2016] [Accepted: 05/04/2016] [Indexed: 12/12/2022]
Abstract
For more than a decade, the Joint Center for Structural Genomics (JCSG; www.jcsg.org) worked toward increased three-dimensional structure coverage of the protein universe. This coordinated quest was one of the main goals of the four high-throughput (HT) structure determination centers of the Protein Structure Initiative (PSI; www.nigms.nih.gov/Research/specificareas/PSI). To achieve the goals of the PSI, the JCSG made use of the complementarity of structure determination by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to increase and diversify the range of targets entering the HT structure determination pipeline. The overall strategy, for both techniques, was to determine atomic resolution structures for representatives of large protein families, as defined by the Pfam database, which had no structural coverage and could make significant contributions to biological and biomedical research. Furthermore, the experimental structures could be leveraged by homology modeling to further expand the structural coverage of the protein universe and increase biological insights. Here, we describe what could be achieved by this structural genomics approach, using as an illustration the contributions from 20 NMR structure determinations out of a total of 98 JCSG NMR structures, which were selected because they are the first three-dimensional structure representations of the respective Pfam protein families. The information from this small sample is representative for the overall results from crystal and NMR structure determination in the JCSG. There are five new folds, which were classified as domains of unknown functions (DUF), three of the proteins could be functionally annotated based on three-dimensional structure similarity with previously characterized proteins, and 12 proteins showed only limited similarity with previous deposits in the Protein Data Bank (PDB) and were classified as DUFs.
Affiliation(s)
- Pedro Serrano
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA.,Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Samit K Dutta
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA.,Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Andrew Proudfoot
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Biswaranjan Mohanty
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lukas Susac
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Bryan Martin
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Michael Geralt
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lukasz Jaroszewski
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Adam Godzik
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Marc Elsliger
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Ian A Wilson
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
- Kurt Wüthrich
- Joint Center for Structural Genomics, The Scripps Research Institute, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, CA, USA
|
48
|
Using natural sequences and modularity to design common and novel protein topologies. Curr Opin Struct Biol 2016; 38:26-36. [PMID: 27270240 DOI: 10.1016/j.sbi.2016.05.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/01/2016] [Revised: 05/13/2016] [Accepted: 05/18/2016] [Indexed: 02/07/2023]
Abstract
Protein design is still a challenging undertaking, often requiring multiple attempts or iterations for success. Typically, the source of failure is unclear, and scoring metrics appear similar between successful and failed cases. Nevertheless, the use of sequence statistics, modularity and symmetry from natural proteins, combined with computational design both at the coarse-grained and atomistic levels is propelling a new wave of design efforts to success. Here we highlight recent examples of design, showing how the wealth of natural protein sequence and topology data may be leveraged to reduce the search space and increase the likelihood of achieving desired outcomes.
|
49
|
Xu J, Zhang J. Impact of structure space continuity on protein fold classification. Sci Rep 2016; 6:23263. [PMID: 27006112 PMCID: PMC4804218 DOI: 10.1038/srep23263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/24/2015] [Accepted: 03/03/2016] [Indexed: 11/09/2022] Open
Abstract
Protein structure classification hierarchically clusters domain structures based on structure and/or sequence similarities and plays important roles in the study of protein structure-function relationship and protein evolution. Among many classifications, SCOP and CATH are widely viewed as the gold standards. Fold classification is of special interest because this is the lowest level of classification that does not depend on protein sequence similarity. The current fold classifications such as those in SCOP and CATH are controversial because they implicitly assume that folds are discrete islands in the structure space, whereas increasing evidence suggests significant similarities among folds and supports a continuous fold space. Although this problem is widely recognized, its impact on fold classification has not been quantitatively evaluated. Here we develop a likelihood method to classify a domain into the existing folds of CATH or SCOP using both query-fold structure similarities and within-fold structure heterogeneities. The new classification differs from the original classification for 3.4-12% of domains, depending on factors such as the structure similarity score and original classification scheme used. Because these factors differ for different biological purposes, our results indicate that the importance of considering structure space continuity in fold classification depends on the specific question asked.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
|
50
|
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720; Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
- Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
|