1
Monti M, Fiorentino J, Miltiadis-Vrachnos D, Bini G, Cotrufo T, Sanchez de Groot N, Armaos A, Tartaglia GG. catGRANULE 2.0: accurate predictions of liquid-liquid phase separating proteins at single amino acid resolution. Genome Biol 2025; 26:33. [PMID: 39979996 PMCID: PMC11843755 DOI: 10.1186/s13059-025-03497-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 02/06/2025] [Indexed: 02/22/2025] Open
Abstract
Liquid-liquid phase separation (LLPS) enables the formation of membraneless organelles, essential for cellular organization and implicated in diseases. We introduce catGRANULE 2.0 ROBOT, an algorithm integrating physicochemical properties and AlphaFold-derived structural features to predict LLPS at single-amino-acid resolution. The method achieves high performance and reliably evaluates mutation effects on LLPS propensity, providing detailed predictions of how specific mutations enhance or inhibit phase separation. Supported by experimental validations, including microscopy data, it predicts LLPS across diverse organisms and cellular compartments, offering valuable insights into LLPS mechanisms and mutational impacts. The tool is freely available at https://tools.tartaglialab.com/catgranule2 and https://doi.org/10.5281/zenodo.14205831.
Affiliation(s)
- Michele Monti
  - Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
- Jonathan Fiorentino
  - Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
- Dimitrios Miltiadis-Vrachnos
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
  - Department of Biology and Biotechnologies, University of Rome Sapienza, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Giorgio Bini
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
  - Physics Department, University of Genoa, Via Dodecaneso 33, 16146, Genoa, Italy
- Tiziana Cotrufo
  - Departament de Biologia Cellular, Fisiologia i Immunologia, Universitat de Barcelona, Avenida Diagonal 643, 08028, Barcelona, Spain
- Natalia Sanchez de Groot
  - Department of Biochemistry and Molecular Biology, Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), 08193, Barcelona, Spain
- Alexandros Armaos
  - Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano- & NeuroScience, Fondazione Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161, Rome, Italy.
  - RNA Systems Biology Lab, Centre for Human Technologies, Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy.
2
Ahmed Z, Shahzadi K, Jin Y, Li R, Momanyi BM, Zulfiqar H, Ning L, Lin H. Identification of RNA‐dependent liquid‐liquid phase separation proteins using an artificial intelligence strategy. Proteomics 2024; 24:e2400044. [PMID: 38824664 DOI: 10.1002/pmic.202400044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024]
Abstract
RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
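Two of the sequence encodings named in this abstract, amino acid composition and the composition of k-spaced amino acid pairs, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 20-letter alphabet ordering, and the choice to normalize by the number of valid pairs are assumptions made for illustration only.

```python
from itertools import product

# Assumed ordering of the 20 standard residues (alphabetical one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard residue, for a protein sequence."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def k_spaced_pair_composition(seq, k):
    """Return 400 fractions for residue pairs separated by exactly k residues.

    Pairs containing a non-standard residue are skipped (an assumption of
    this sketch), so the fractions sum to 1 over the valid pairs.
    """
    seq = seq.upper()
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

Concatenating such vectors for several values of `k` gives a fixed-length feature vector per protein, which is the kind of input a feature-selection step and a random forest classifier can then work on.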
Affiliation(s)
- Zahoor Ahmed
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Kiran Shahzadi
  - Department of Biotechnology, Women University of Azad Jammu and Kashmir Bagh, Bagh, Azad Kashmir, Pakistan
- Yanting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Rui Li
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Biffon Manyura Momanyi
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Lin Ning
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Hao Lin
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
3
Frank M, Ni P, Jensen M, Gerstein MB. Leveraging a large language model to predict protein phase transition: A physical, multiscale, and interpretable approach. Proc Natl Acad Sci U S A 2024; 121:e2320510121. [PMID: 39110734 PMCID: PMC11331094 DOI: 10.1073/pnas.2320510121] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/24/2023] [Accepted: 07/03/2024] [Indexed: 08/21/2024] Open
Abstract
Protein phase transitions (PPTs) from the soluble state to a dense liquid phase (forming droplets via liquid-liquid phase separation) or to solid aggregates (such as amyloids) play key roles in pathological processes associated with age-related diseases such as Alzheimer's disease. Several computational frameworks are capable of separately predicting the formation of droplets or amyloid aggregates based on protein sequences, yet none have tackled the prediction of both within a unified framework. Recently, large language models (LLMs) have exhibited great success in protein structure prediction; however, they have not yet been used for PPTs. Here, we fine-tune an LLM for predicting PPTs and demonstrate its usage in evaluating how sequence variants affect PPTs, an operation useful for protein design. In addition, we show its superior performance compared to suitable classical benchmarks. Due to the "black-box" nature of the LLM, we also employ a classical random forest model along with biophysical features to facilitate interpretation. Finally, focusing on Alzheimer's disease-related proteins, we demonstrate that greater aggregation is associated with reduced gene expression in Alzheimer's disease, suggesting a natural defense mechanism.
Affiliation(s)
- Mor Frank
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510
- Pengyu Ni
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510
- Matthew Jensen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510
- Mark B. Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06510
  - Department of Computer Science, Yale University, New Haven, CT 06511
  - Department of Statistics and Data Science, Yale University, New Haven, CT 06511
4
Chin KY, Ishida S, Sasaki Y, Terayama K. Predicting condensate formation of protein and RNA under various environmental conditions. BMC Bioinformatics 2024; 25:143. [PMID: 38566033 PMCID: PMC10988968 DOI: 10.1186/s12859-024-05764-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 03/26/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND: Liquid-liquid phase separation (LLPS) by biomolecules plays a central role in various biological phenomena and has garnered significant attention. The behavior of LLPS is strongly influenced by the characteristics of RNAs and environmental factors such as pH and temperature, as well as the properties of proteins. Recently, several databases recording LLPS-related biomolecules have been established, and prediction models of LLPS-related phenomena have been explored using these databases. However, a prediction model that concurrently considers proteins, RNAs, and experimental conditions has not been developed due to the limited information available from individual experiments in public databases. RESULTS: To address this challenge, we have constructed a new dataset, RNAPSEC, which treats each experiment as a data point. This dataset was assembled by manually collecting data from the published literature. Utilizing RNAPSEC, we developed two prediction models that consider a protein, RNA, and experimental conditions. The first model can predict the LLPS behavior of a protein and RNA under given experimental conditions. The second model can predict the required conditions for a given protein and RNA to undergo LLPS. CONCLUSIONS: RNAPSEC and these prediction models are expected to accelerate our understanding of the roles of proteins, RNAs, and environmental factors in LLPS.
Affiliation(s)
- Ka Yin Chin
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Shoichi Ishida
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Yukio Sasaki
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Kei Terayama
  - Graduate School of Medical Life Science, Yokohama City University, 1-7-29, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.
  - RIKEN Center for Advanced Intelligence Project, 1-4-1, Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan.
  - MDX Research Center for Element Strategy, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa, 226-8501, Japan.
5
Deng B, Wan G. Technologies for studying phase-separated biomolecular condensates. Advanced Biotechnology 2024; 2:10. [PMID: 39883284 PMCID: PMC11740866 DOI: 10.1007/s44307-024-00020-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/23/2024] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 01/31/2025]
Abstract
Biomolecular condensates, also referred to as membrane-less organelles, function as fundamental organizational units within cells. These structures primarily form through liquid-liquid phase separation, a process in which proteins and nucleic acids segregate from the surrounding milieu to assemble into micron-scale structures. By concentrating functionally related proteins and nucleic acids, these biomolecular condensates regulate a myriad of essential cellular processes. To study these significant and intricate organelles, a range of technologies have been either adapted or developed. In this review, we provide an overview of the most utilized technologies in this rapidly evolving field. These include methods used to identify new condensates, explore their components, investigate their properties and spatiotemporal regulation, and understand the organizational principles governing these condensates. We also discuss potential challenges and review current advancements in applying the principles of biomolecular condensates to the development of new technologies, such as those in synthetic biology.
Affiliation(s)
- Boyuan Deng
  - Guangdong Provincial Key Laboratory of Pharmaceutical Functional Genes, MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, GuangZhou, GuangDong, China
- Gang Wan
  - Guangdong Provincial Key Laboratory of Pharmaceutical Functional Genes, MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, GuangZhou, GuangDong, China.
6
Liao S, Zhang Y, Han X, Wang T, Wang X, Yan Q, Li Q, Qi Y, Zhang Z. A sequence-based model for identifying proteins undergoing liquid-liquid phase separation/forming fibril aggregates via machine learning. Protein Sci 2024; 33:e4927. [PMID: 38380794 PMCID: PMC10880426 DOI: 10.1002/pro.4927] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/15/2023] [Revised: 01/27/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024]
Abstract
Liquid-liquid phase separation (LLPS) and the formation of solid aggregates (also referred to as amyloid aggregates) by proteins have gained significant attention in recent years due to their associations with various physiological and pathological processes in living organisms. The differences and connections between proteins undergoing LLPS and those forming amyloid fibrils have not yet been systematically investigated at the sequence level. In this research, we aim to address this gap by comparing the two types of proteins across 36 features using currently available data. The statistical comparison indicates that 24 of the 36 selected features differ significantly between the two protein groups. An LLPS-Fibrils binary classification model built on these 24 features using random forest identifies the fraction of intrinsically disordered residues (FIDR) as the most crucial feature. In the subsequent three-class LLPS-Fibrils-Background classification model built on the same screened features, the compositions of cysteine and leucine contribute more than the other features. Through feature ablation analysis, we finally constructed a model, FLFB (Feature-based LLPS-Fibrils-Background protein predictor), using six refined features, with an average area under the receiver operating characteristic curve of 0.83. This work indicates that proteins undergoing LLPS or forming amyloid fibrils can be identified using sequence features and a machine learning model.
Affiliation(s)
- Shaofeng Liao
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yujun Zhang
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xinchen Han
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Tinglan Wang
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Xi Wang
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qinglin Yan
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Qian Li
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yifei Qi
  - School of Pharmacy, Fudan University, Shanghai, China
- Zhuqing Zhang
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
7
Saar KL, Qian D, Good LL, Morgunov AS, Collepardo-Guevara R, Best RB, Knowles TPJ. Theoretical and Data-Driven Approaches for Biomolecular Condensates. Chem Rev 2023; 123:8988-9009. [PMID: 37171907 PMCID: PMC10375482 DOI: 10.1021/acs.chemrev.2c00586] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/23/2022] [Indexed: 05/14/2023]
Abstract
Biomolecular condensation processes are increasingly recognized as a fundamental mechanism that living cells use to organize biomolecules in time and space. These processes can lead to the formation of membraneless organelles that enable cells to perform distinct biochemical processes in controlled local environments, thereby supplying them with an additional degree of spatial control relative to that achieved by membrane-bound organelles. This fundamental importance of biomolecular condensation has motivated a quest to discover and understand the molecular mechanisms and determinants that drive and control this process. Within this molecular viewpoint, computational methods can provide a unique angle to studying biomolecular condensation processes by contributing the resolution and scale that are challenging to reach with experimental techniques alone. In this Review, we focus on three types of dry-lab approaches: theoretical methods, physics-driven simulations and data-driven machine learning methods. We review recent progress in using these tools for probing biomolecular condensation across all three fields and outline the key advantages and limitations of each of the approaches. We further discuss some of the key outstanding challenges that we foresee the community addressing next in order to develop a more complete picture of the molecular driving forces behind biomolecular condensation processes and their biological roles in health and disease.
Affiliation(s)
- Kadi L. Saar
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
  - Transition Bio Ltd., Cambridge, United Kingdom
- Daoyuan Qian
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Lydia L. Good
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Alexey S. Morgunov
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Rosana Collepardo-Guevara
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Robert B. Best
  - Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Tuomas P. J. Knowles
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge CB3 0HE, United Kingdom
8
Nadendla K, Simpson GG, Becher J, Journeaux T, Cabeza-Cabrerizo M, Bernardes GJL. Strategies for Conditional Regulation of Proteins. JACS Au 2023; 3:344-357. [PMID: 36873677 PMCID: PMC9975842 DOI: 10.1021/jacsau.2c00654] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/01/2022] [Revised: 01/09/2023] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
Design of the next-generation of therapeutics, biosensors, and molecular tools for basic research requires that we bring protein activity under control. Each protein has unique properties, and therefore, it is critical to tailor the current techniques to develop new regulatory methods and regulate new proteins of interest (POIs). This perspective gives an overview of the widely used stimuli and synthetic and natural methods for conditional regulation of proteins.
Affiliation(s)
- Karthik Nadendla
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
- Grant G. Simpson
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
- Julie Becher
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
- Toby Journeaux
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
- Mar Cabeza-Cabrerizo
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
- Gonçalo J. L. Bernardes
  - Yusuf Hamied Department of Chemistry, University of Cambridge, CB2 1EW, Cambridge, U.K.
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
9
Chew PY, Reinhardt A. Phase diagrams-Why they matter and how to predict them. J Chem Phys 2023; 158:030902. [PMID: 36681642 DOI: 10.1063/5.0131028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022] Open
Abstract
Understanding the thermodynamic stability and metastability of materials can help us to, for example, gauge whether crystalline polymorphs in pharmaceutical formulations are likely to be durable. It can also help us to design experimental routes to novel phases with potentially interesting properties. In this Perspective, we provide an overview of how thermodynamic phase behavior can be quantified both in computer simulations and machine-learning approaches to determine phase diagrams, as well as combinations of the two. We review the basic workflow of free-energy computations for condensed phases, including some practical implementation advice, ranging from the Frenkel-Ladd approach to thermodynamic integration and to direct-coexistence simulations. We illustrate the applications of such methods on a range of systems from materials chemistry to biological phase separation. Finally, we outline some challenges, questions, and practical applications of phase-diagram determination which we believe are likely to be possible to address in the near future using such state-of-the-art free-energy calculations, which may provide fundamental insight into separation processes using multicomponent solvents.
Affiliation(s)
- Pin Yu Chew
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Aleks Reinhardt
  - Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
10
Roca-Martinez J, Lazar T, Gavalda-Garcia J, Bickel D, Pancsa R, Dixit B, Tzavella K, Ramasamy P, Sanchez-Fornaris M, Grau I, Vranken WF. Challenges in describing the conformation and dynamics of proteins with ambiguous behavior. Front Mol Biosci 2022; 9:959956. [PMID: 35992270 PMCID: PMC9382080 DOI: 10.3389/fmolb.2022.959956] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/02/2022] [Accepted: 06/27/2022] [Indexed: 11/13/2022] Open
Abstract
Traditionally, our understanding of how proteins operate and how evolution shapes them is based on two main data sources: the overall protein fold and the protein amino acid sequence. However, a significant part of the proteome shows highly dynamic and/or structurally ambiguous behavior, which cannot be correctly represented by the traditional fixed set of static coordinates. Representing such protein behaviors remains challenging and necessarily involves a complex interpretation of conformational states, including probabilistic descriptions. Relating protein dynamics and multiple conformations to their function as well as their physiological context (e.g., post-translational modifications and subcellular localization), therefore, remains elusive for much of the proteome, with studies to investigate the effect of protein dynamics relying heavily on computational models. We here investigate the possibility of delineating three classes of protein conformational behavior: order, disorder, and ambiguity. These definitions are explored based on three different datasets, using interpretable machine learning from a set of features, from AlphaFold2 to sequence-based predictions, to understand the overlap and differences between these datasets. This forms the basis for a discussion on the current limitations in describing the behavior of dynamic and ambiguous proteins.
Affiliation(s)
- Joel Roca-Martinez
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Tamas Lazar
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - VIB-VUB Center for Structural Biology, Brussels, Belgium
- Jose Gavalda-Garcia
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- David Bickel
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Rita Pancsa
  - Research Centre for Natural Sciences, Institute of Enzymology, Budapest, Hungary
- Bhawna Dixit
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - IBiTech-Biommeda, Universiteit Gent, Gent, Belgium
- Konstantina Tzavella
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
- Pathmanaban Ramasamy
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - VIB-UGent Center for Medical Biotechnology, Universiteit Gent, Gent, Belgium
- Maite Sanchez-Fornaris
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
  - Department of Computer Sciences, University of Camagüey, Camagüey, Cuba
- Isel Grau
  - Information Systems, Eindhoven University of Technology, Eindhoven, Netherlands
- Wim F. Vranken
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, Belgium
11
Orlando G, Raimondi D, Duran-Romaña R, Moreau Y, Schymkowitz J, Rousseau F. PyUUL provides an interface between biological structures and deep learning algorithms. Nat Commun 2022; 13:961. [PMID: 35181656 PMCID: PMC8857184 DOI: 10.1038/s41467-022-28327-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2021] [Accepted: 01/18/2022] [Indexed: 11/09/2022] Open
Abstract
Structural bioinformatics suffers from the lack of interfaces connecting biological structures and machine learning methods, making the application of modern neural network architectures impractical. This negatively affects the development of structure-based bioinformatics methods, causing a bottleneck in biological research. Here we present PyUUL (https://pyuul.readthedocs.io/), a library to translate biological structures into 3D tensors, allowing an out-of-the-box application of state-of-the-art deep learning algorithms. The library converts biological macromolecules to data structures typical of computer vision, such as voxels and point clouds, for which extensive machine learning research has been performed. Moreover, PyUUL allows out-of-the-box GPU and sparse calculation. Finally, we demonstrate how PyUUL can be used by researchers to address some typical bioinformatics problems, such as structure recognition and docking.
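The idea of turning atomic coordinates into a voxel representation can be sketched in a few lines of plain Python. This is a generic illustration and does not use or reproduce PyUUL's API: the function name `voxelize`, the `resolution` parameter, and the sparse dictionary output are all assumptions of this sketch.

```python
def voxelize(coords, resolution=1.0):
    """Map 3D points to a sparse occupancy grid.

    coords: list of (x, y, z) tuples, e.g. atom positions.
    resolution: edge length of one cubic voxel, in the same units as coords.
    Returns a dict {(i, j, k): count} holding only occupied voxels, with
    indices measured from the minimum corner of the point set.
    """
    if not coords:
        return {}
    # Shift the grid origin to the minimum corner so indices start at 0.
    mins = [min(c[axis] for c in coords) for axis in range(3)]
    grid = {}
    for x, y, z in coords:
        idx = (
            int((x - mins[0]) // resolution),
            int((y - mins[1]) // resolution),
            int((z - mins[2]) // resolution),
        )
        grid[idx] = grid.get(idx, 0) + 1
    return grid
```

Storing only occupied voxels keeps memory proportional to the number of atoms rather than to the bounding-box volume, which is one common motivation for sparse representations of molecular structures.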
Affiliation(s)
- Gabriele Orlando
  - Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
- Ramon Duran-Romaña
  - Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
- Joost Schymkowitz
  - Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium.
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium.
- Frederic Rousseau
  - Switch Laboratory, VIB-KU Leuven Center for Brain and Disease Research, Herestraat 49, 3000, Leuven, Belgium.
  - Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000, Leuven, Belgium.
12
Wang X, Zhou X, Yan Q, Liao S, Tang W, Xu P, Gao Y, Li Q, Dou Z, Yang W, Huang B, Li J, Zhang Z. LLPSDB v2.0: an updated database of proteins undergoing liquid-liquid phase separation in vitro. Bioinformatics 2022; 38:2010-2014. [PMID: 35025997 PMCID: PMC8963276 DOI: 10.1093/bioinformatics/btac026] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/18/2021] [Revised: 12/17/2021] [Accepted: 01/11/2022] [Indexed: 11/14/2022] Open
Abstract
Summary: Emerging evidence suggests that liquid–liquid phase separation (LLPS) of proteins plays a vital role both in a wide range of biological processes and in related diseases. Whether a protein undergoes phase separation is determined not only by the chemical and physical properties of the biomolecules themselves, but also by environmental conditions such as temperature, ionic strength, pH, and the volume excluded by other macromolecules. Our group recently developed a web-accessible database, LLPSDB, in which all proteins involved in LLPS in vitro, together with the corresponding experimental conditions, were curated comprehensively from the published literature. With the rapid increase in investigations of biomolecular LLPS and the growing popularity of LLPSDB, we updated the database and developed a new version, LLPSDB v2.0. Compared with the previously released version, more than twice as much data has been curated, and a new class, ‘Ambiguous system’, has been added. In addition, the web interface has been improved; for example, users can search the database by selecting the option ‘phase separation status’ alone or combined with other options. We anticipate that this updated database will serve as a more comprehensive and helpful resource for users. Availability and implementation: LLPSDB v2.0 is freely available at: http://bio-comp.org.cn/llpsdbv2. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xi Wang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Xiang Zhou
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Qinglin Yan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Shaofeng Liao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Wenqin Tang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peiyu Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Yangzhenyu Gao
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Qian Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhihui Dou
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Weishan Yang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Beifang Huang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jinhong Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhuqing Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
|
13
|
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead, they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interaction, and while they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. We here discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods that aim to predict transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs and in the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and how new methods are needed to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
|