1
Del Conte A, Camagni GF, Clementel D, Minervini G, Monzon AM, Ferrari C, Piovesan D, Tosatto SCE. RING 4.0: faster residue interaction networks with novel interaction types across over 35,000 different chemical structures. Nucleic Acids Res 2024:gkae337. [PMID: 38686797 DOI: 10.1093/nar/gkae337] [Received: 03/08/2024] [Revised: 04/09/2024] [Accepted: 04/19/2024] [Indexed: 05/02/2024] Open
Abstract
Residue interaction networks (RINs) are a valuable approach for representing contacts in protein structures. RINs have been widely used in various research areas, including the analysis of mutation effects, domain-domain communication, catalytic activity, and molecular dynamics simulations. The RING server is a powerful tool to calculate non-covalent molecular interactions based on geometrical parameters, providing high-quality and reliable results. Here, we introduce RING 4.0, which includes significant enhancements for identifying both covalent and non-covalent bonds in protein structures. It now encompasses seven different interaction types, with the addition of π-hydrogen, halogen bonds and metal ion coordination sites. The definitions of all available bond types have also been refined and RING can now process the complete PDB chemical component dictionary (over 35000 different molecules) which provides atom names and covalent connectivity information for all known ligands. Optimization of the software has improved execution time by an order of magnitude. The RING web server has been redesigned to provide a more engaging and interactive user experience, incorporating new visualization tools. Users can now visualize all types of interactions simultaneously in the structure viewer and network component. The web server, including extensive help and tutorials, is available from URL: https://ring.biocomputingup.it/.
Affiliation(s)
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Giorgia F Camagni
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Clementel
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Carlo Ferrari
  - Department of Information Engineering, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy

2
Quaglia F, Chasapi A, Nugnes MV, Aspromonte MC, Leonardi E, Piovesan D, Tosatto SCE. Best practices for the manual curation of intrinsically disordered proteins in DisProt. Database (Oxford) 2024; 2024:baae009. [PMID: 38507044 PMCID: PMC10953794 DOI: 10.1093/database/baae009] [Received: 10/19/2023] [Revised: 12/18/2023] [Accepted: 02/03/2024] [Indexed: 03/22/2024]
Abstract
The DisProt database is a resource containing manually curated data on experimentally validated intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) from the literature. Developed in 2005, its primary goal was to collect structural and functional information into proteins that lack a fixed three-dimensional structure. Today, DisProt has evolved into a major repository that not only collects experimental data but also contributes to our understanding of the IDPs/IDRs roles in various biological processes, such as autophagy or the life cycle mechanisms in viruses or their involvement in diseases (such as cancer and neurodevelopmental disorders). DisProt offers detailed information on the structural states of IDPs/IDRs, including state transitions, interactions and their functions, all provided as curated annotations. One of the central activities of DisProt is the meticulous curation of experimental data from the literature. For this reason, to ensure that every expert and volunteer curator possesses the requisite knowledge for data evaluation, collection and integration, training courses and curation materials are available. However, biocuration guidelines concur on the importance of developing robust guidelines that not only provide critical information about data consistency but also ensure data acquisition. This guideline aims to provide both biocurators and external users with best practices for manually curating IDPs and IDRs in DisProt. It describes every step of the literature curation process and provides use cases of IDP curation within DisProt. Database URL: https://disprot.org/.
Affiliation(s)
- Federica Quaglia
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Via Giovanni Amendola, 122/O, Bari 70126, Italy
  - Department of Biomedical Sciences, University of Padova, Via Ugo Bassi, 58/B, Padova 35131, Italy
- Anastasia Chasapi
  - Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, 6th km Harilaou - Thermis 57001 Thermi, Thessalonica 57001, Greece
- Maria Victoria Nugnes
  - Department of Biomedical Sciences, University of Padova, Via Ugo Bassi, 58/B, Padova 35131, Italy
- Emanuela Leonardi
  - Department of Biomedical Sciences, University of Padova, Via Ugo Bassi, 58/B, Padova 35131, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Via Ugo Bassi, 58/B, Padova 35131, Italy
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, Via Ugo Bassi, 58/B, Padova 35131, Italy

3
Bellanda M, Damulewicz M, Zambelli B, Costanzi E, Gregoris F, Mammi S, Tosatto SCE, Costa R, Minervini G, Mazzotta GM. A PDZ scaffolding/CaM-mediated pathway in Cryptochrome signaling. Protein Sci 2024; 33:e4914. [PMID: 38358255 PMCID: PMC10868427 DOI: 10.1002/pro.4914] [Received: 08/24/2023] [Revised: 12/12/2023] [Accepted: 01/13/2024] [Indexed: 02/16/2024]
Abstract
Cryptochromes are cardinal constituents of the circadian clock, which orchestrates daily physiological rhythms in living organisms. A growing body of evidence points to their participation in pathways that have not traditionally been associated with circadian clock regulation, implying that cryptochromes may be subject to modulation by multiple signaling mechanisms. In this study, we demonstrate that human CRY2 (hCRY2) forms a complex with the large, modular scaffolding protein known as Multi-PDZ Domain Protein 1 (MUPP1). This interaction is facilitated by the calcium-binding protein Calmodulin (CaM) in a calcium-dependent manner. Our findings suggest a novel cooperative mechanism for the regulation of mammalian cryptochromes, mediated by calcium ions (Ca2+) and CaM. We propose that this Ca2+/CaM-mediated signaling pathway may be an evolutionarily conserved mechanism that has been maintained from Drosophila to mammals, most likely in relation to its potential role in the broader context of cryptochrome function and regulation. Further, the understanding of cryptochrome interactions with other proteins and signaling pathways could lead to a better definition of its role within the intricate network of molecular interactions that govern circadian rhythms.
Affiliation(s)
- Milena Damulewicz
  - Department of Cell Biology and Imaging, Jagiellonian University, Kraków, Poland
- Barbara Zambelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Elisa Costanzi
  - Department of Chemical Sciences, University of Padova, Padova, Italy
- Stefano Mammi
  - Department of Chemical Sciences, University of Padova, Padova, Italy
- Rodolfo Costa
  - Department of Biology, University of Padova, Padova, Italy
  - Institute of Neuroscience, National Research Council of Italy (CNR), Padova, Italy
  - Chronobiology Section, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK

4
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.

5
Aspromonte MC, Nugnes MV, Quaglia F, Bouharoua A, Tosatto SCE, Piovesan D. DisProt in 2024: improving function annotation of intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D434-D441. [PMID: 37904585 PMCID: PMC10767923 DOI: 10.1093/nar/gkad928] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 11/01/2023] Open
Abstract
DisProt (URL: https://disprot.org) is the gold standard database for intrinsically disordered proteins and regions, providing valuable information about their functions. The latest version of DisProt brings significant advancements, including a broader representation of functions and an enhanced curation process. These improvements aim to increase both the quality of annotations and their coverage at the sequence level. Higher coverage has been achieved by adopting additional evidence codes. Quality of annotations has been improved by systematically applying Minimum Information About Disorder Experiments (MIADE) principles and reporting all the details of the experimental setup that could potentially influence the structural state of a protein. The DisProt database now includes new thematic datasets and has expanded the adoption of Gene Ontology terms, resulting in an extensive functional repertoire which is automatically propagated to UniProtKB. Finally, we show that DisProt's curated annotations strongly correlate with disorder predictions inferred from AlphaFold2 pLDDT (predicted Local Distance Difference Test) confidence scores. This comparison highlights the utility of DisProt in explaining apparent uncertainty of certain well-defined predicted structures, which often correspond to folding-upon-binding fragments. Overall, DisProt serves as a comprehensive resource, combining experimental evidence of disorder information to enhance our understanding of intrinsically disordered proteins and their functional implications.
Affiliation(s)
- Federica Quaglia
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Adel Bouharoua
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy

6
Ghafouri H, Lazar T, Del Conte A, Tenorio Ku LG, Tompa P, Tosatto SCE, Monzon AM. PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D536-D544. [PMID: 37904608 PMCID: PMC10767937 DOI: 10.1093/nar/gkad947] [Received: 09/15/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 11/01/2023] Open
Abstract
The Protein Ensemble Database (PED) (URL: https://proteinensemble.org) is the primary resource for depositing structural ensembles of intrinsically disordered proteins. This updated version of PED reflects advancements in the field, denoting a continual expansion with a total of 461 entries and 538 ensembles, including those generated without explicit experimental data through novel machine learning (ML) techniques. With this significant increment in the number of ensembles, a few yet-unprecedented new entries entered the database, including those also determined or refined by electron paramagnetic resonance or circular dichroism data. In addition, PED was enriched with several new features, including a novel deposition service, improved user interface, new database cross-referencing options and integration with the 3D-Beacons network, all representing efforts to improve the FAIRness of the database. Foreseeably, PED will keep growing in size and expanding with new types of ensembles generated by accurate and fast ML-based generative models and coarse-grained simulations. Therefore, among future efforts, priority will be given to further develop the database to be compatible with ensembles modeled at a coarse-grained level.
Affiliation(s)
- Tamas Lazar
  - VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
  - Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Peter Tompa
  - VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
  - Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
  - Institute of Enzymology, Research Centre for Natural Sciences (RCNS), Budapest, Hungary

7
Monzon AM, Arrías PN, Elofsson A, Mier P, Andrade-Navarro MA, Bevilacqua M, Clementel D, Bateman A, Hirsh L, Fornasari MS, Parisi G, Piovesan D, Kajava AV, Tosatto SCE. A STRP-ed definition of Structured Tandem Repeats in Proteins. J Struct Biol 2023; 215:108023. [PMID: 37652396 DOI: 10.1016/j.jsb.2023.108023] [Received: 04/29/2023] [Revised: 07/31/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Tandem Repeat Proteins (TRPs) are a class of proteins with repetitive amino acid sequences that have been studied extensively for over two decades. Different features at the level of sequence, structure, function and evolution have been attributed to them by various authors. And yet many of their salient features appear only when looking at specific subclasses of protein tandem repeats. Here, we attempt to rationalize the existing knowledge on Tandem Repeat Proteins (TRPs) by pointing out several dichotomies. The emerging picture is more nuanced than generally assumed and allows us to draw some boundaries of what is not a "proper" TRP. We conclude with an operational definition of a specific subset, which we have denominated STRPs (Structural Tandem Repeat Proteins), which separates a subclass of tandem repeats with distinctive features from several other less well-defined types of repeats. We believe that this definition will help researchers in the field to better characterize the biological meaning of this large yet largely understudied group of proteins.
Affiliation(s)
- Alexander Miguel Monzon
  - Dept. of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Paula Nazarena Arrías
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Arne Elofsson
  - Dept. of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Tomtebodavägen 23, 171 21 Solna, Sweden
- Pablo Mier
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Martina Bevilacqua
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Clementel
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Layla Hirsh
  - Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
- Maria Silvina Fornasari
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Gustavo Parisi
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Damiano Piovesan
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Dept. of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy

8
Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins 2023; 91:1925-1934. [PMID: 37621223 DOI: 10.1002/prot.26582] [Received: 04/21/2023] [Revised: 06/22/2023] [Accepted: 08/08/2023] [Indexed: 08/26/2023]
Abstract
Protein intrinsic disorder (ID) is a complex and context-dependent phenomenon that covers a continuum between fully disordered states and folded states with long dynamic regions. The lack of a ground truth that fits all ID flavors and the potential for order-to-disorder transitions depending on specific conditions makes ID prediction challenging. The CAID2 challenge aimed to evaluate the performance of different prediction methods across different benchmarks, leveraging the annotation provided by the DisProt database, which stores the coordinates of ID regions when there is experimental evidence in the literature. The CAID2 challenge demonstrated varying performance of different prediction methods across different benchmarks, highlighting the need for continued development of more versatile and efficient prediction software. Depending on the application, researchers may need to balance performance with execution time when selecting a predictor. Methods based on AlphaFold2 seem to be good ID predictors but they are better at detecting absence of order rather than ID regions as defined in DisProt. The CAID2 predictors can be freely used through the CAID Prediction Portal, and CAID has been integrated into OpenEBench, which will become the official platform for running future CAID challenges.
Affiliation(s)
- Alessio Del Conte
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Mahta Mehdiabadi
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Adel Bouhraoua
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, Padova, Italy

9
Arrías PN, Monzon AM, Clementel D, Mozaffari S, Piovesan D, Kajava AV, Tosatto SCE. The repetitive structure of DNA clamps: An overlooked protein tandem repeat. J Struct Biol 2023; 215:108001. [PMID: 37467824 DOI: 10.1016/j.jsb.2023.108001] [Received: 04/28/2023] [Revised: 07/12/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Structured tandem repeats proteins (STRPs) are a specific kind of tandem repeat proteins characterized by a modular and repetitive three-dimensional structure arrangement. The majority of STRPs adopt solenoid structures, but with the increasing availability of experimental structures and high-quality predicted structural models, more STRP folds can be characterized. Here, we describe "Box repeats", an overlooked STRP fold present in the DNA sliding clamp processivity factors, which has eluded classification although structural data has been available since the late 1990s. Each Box repeat is a βαβββ module of about 60 residues, which forms a class V "beads-on-a-string" type STRP. The number of repeats present in processivity factors is organism dependent. Monomers of PCNA proteins in both Archaea and Eukarya have 4 repeats, while the monomers of bacterial beta-sliding clamps have 6 repeats. This new repeat fold has been added to the RepeatsDB database, which now provides structural annotation for 66 Box repeat proteins belonging to different organisms, including viruses.
Affiliation(s)
- Paula Nazarena Arrías
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Alexander Miguel Monzon
  - Department of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Damiano Clementel
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Soroush Mozaffari
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Damiano Piovesan
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padova, via U. Bassi 58/b, 35121 Padova, Italy

10
Mészáros B, Hatos A, Palopoli N, Quaglia F, Salladini E, Van Roey K, Arthanari H, Dosztányi Z, Felli IC, Fischer PD, Hoch JC, Jeffries CM, Longhi S, Maiani E, Orchard S, Pancsa R, Papaleo E, Pierattelli R, Piovesan D, Pritisanac I, Tenorio L, Viennet T, Tompa P, Vranken W, Tosatto SCE, Davey NE. Minimum information guidelines for experiments structurally characterizing intrinsically disordered protein regions. Nat Methods 2023; 20:1291-1303. [PMID: 37400558 DOI: 10.1038/s41592-023-01915-x] [Received: 07/12/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023]
Abstract
An unambiguous description of an experiment, and the subsequent biological observation, is vital for accurate data interpretation. Minimum information guidelines define the fundamental complement of data that can support an unambiguous conclusion based on experimental observations. We present the Minimum Information About Disorder Experiments (MIADE) guidelines to define the parameters required for the wider scientific community to understand the findings of an experiment studying the structural properties of intrinsically disordered regions (IDRs). MIADE guidelines provide recommendations for data producers to describe the results of their experiments at source, for curators to annotate experimental data to community resources and for database developers maintaining community resources to disseminate the data. The MIADE guidelines will improve the interpretability of experimental results for data consumers, facilitate direct data submission, simplify data curation, improve data exchange among repositories and standardize the dissemination of the key metadata on an IDR experiment by IDR data sources.
Collapse
Affiliation(s)
- Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Department of Structural Biology and Center for Data Driven Discovery, St Jude Children's Research Hospital, Memphis, TN, USA
- András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Oncology, Lausanne University Hospital, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires, Argentina
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Kim Van Roey
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Haribabu Arthanari
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Isabella C Felli
- Department of Chemistry 'Ugo Schiff' and Magnetic Resonance Center, University of Florence, Sesto Fiorentino (Florence), Italy
- Patrick D Fischer
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, USA
- Cy M Jeffries
- European Molecular Biology Laboratory (EMBL), Hamburg Unit, c/o Deutsches Elektronen-Synchrotron, Hamburg, Germany
- Sonia Longhi
- Laboratory Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), Marseille, France
- Emiliano Maiani
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- UniCamillus - Saint Camillus International University of Health and Medical Sciences, Rome, Italy
- Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, UK
- Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Roberta Pierattelli
- Department of Chemistry 'Ugo Schiff' and Magnetic Resonance Center, University of Florence, Sesto Fiorentino (Florence), Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Iva Pritisanac
- Hospital for Sick Children, Toronto, Ontario, Canada
- Medical University of Graz, Graz, Austria
- Luiggi Tenorio
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Thibault Viennet
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Norman E Davey
- Division of Cancer Biology, Institute of Cancer Research, Chester Beatty Laboratories, Chelsea, London, UK
|
11
|
Aspromonte MC, Conte AD, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE, Leonardi E. CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs). Res Sq 2023:rs.3.rs-3209168. [PMID: 37577579 PMCID: PMC10418555 DOI: 10.21203/rs.3.rs-3209168/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
In the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to provide an opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams and 30 models had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described for each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As was done for CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for developing the prediction methods.
Affiliation(s)
- Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Qi Li
- CUHK Shenzhen Research Institute, Shenzhen
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Samuele Bovo
- Department of Agricultural and Food Sciences, University of Bologna
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Sumyyah Toonsi
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Maxat Kulmanov
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Amanda Williams
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Su Xian
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Wesley Surento
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Uma Sunderam
- Innovation Labs, Tata Consultancy Services, Hyderabad
|
12
|
Del Conte A, Bouhraoua A, Mehdiabadi M, Clementel D, Monzon AM, Tosatto SCE, Piovesan D. CAID prediction portal: a comprehensive service for predicting intrinsic disorder and binding regions in proteins. Nucleic Acids Res 2023:7184153. [PMID: 37246642 DOI: 10.1093/nar/gkad430] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/22/2023] [Revised: 04/26/2023] [Accepted: 05/10/2023] [Indexed: 05/30/2023] Open
Abstract
Intrinsic disorder (ID) in proteins is well-established in structural biology, with increasing evidence for its involvement in essential biological processes. As measuring dynamic ID behavior experimentally on a large scale remains difficult, scores of published ID predictors have tried to fill this gap. Unfortunately, their heterogeneity makes it difficult to compare performance, confounding biologists wanting to make an informed choice. To address this issue, the Critical Assessment of protein Intrinsic Disorder (CAID) benchmarks predictors for ID and binding regions as a community blind-test in a standardized computing environment. Here we present the CAID Prediction Portal, a web server executing all CAID methods on user-defined sequences. The server generates standardized output and facilitates comparison between methods, producing a consensus prediction highlighting high-confidence ID regions. The website contains extensive documentation explaining the meaning of different CAID statistics and providing a brief description of all methods. Predictor output is visualized in an interactive feature viewer and made available for download in a single table, with the option to recover previous sessions via a private dashboard. The CAID Prediction Portal is a valuable resource for researchers interested in studying ID in proteins. The server is available at the URL: https://caid.idpcentral.org.
Affiliation(s)
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
- Adel Bouhraoua
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
- Mahta Mehdiabadi
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
- Damiano Clementel
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
- Alexander Miguel Monzon
- Department of Information Engineering, University of Padova, via Giovanni Gradenigo 6/B, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58b, 35121 Padova, Italy
|
13
|
Martínez-Pérez E, Pajkos M, Tosatto SCE, Gibson TJ, Dosztanyi Z, Marino-Buslje C. Pipeline for transferring annotations between proteins beyond globular domains. Protein Sci 2023:e4655. [PMID: 37167423 DOI: 10.1002/pro.4655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 04/09/2023] [Accepted: 05/09/2023] [Indexed: 05/13/2023]
Abstract
BACKGROUND: DisProt is the primary repository of Intrinsically Disordered Proteins (IDPs). This database is manually curated and its annotations have strong experimental support. Currently, DisProt contains a relatively small number of proteins, highlighting the importance of transferring annotations regarding verified disorder state and corresponding functions to homologous proteins in other species, thereby providing highly valuable information for better understanding their biological roles. While the principles and practicalities of homology transfer are well-established for globular proteins, these are largely lacking for disordered proteins. METHODS: We used DisProt to evaluate the transferability of the annotation terms to orthologous proteins. For each protein, we looked for its orthologs, on the assumption that they have a similar function. Then, for each protein and its orthologs, we built multiple sequence alignments (MSAs). Disordered sequences are fast evolving and can be hard to align; we therefore implemented alignment quality control steps to ensure robust alignments before mapping the annotations. RESULTS: We have designed a pipeline to obtain good quality MSAs and to transfer annotations from any protein to its orthologs. Applying the pipeline to DisProt proteins, from the 1,731 entries with 5,623 annotations we can reach 97,555 orthologs and transfer a total of 301,190 terms by homology. We also provide a web server for consulting the results for DisProt proteins and executing the pipeline for any other protein. The server Homology Transfer IDP (HoTIDP) is accessible at http://hotidp.leloir.org.ar.
Affiliation(s)
- Elizabeth Martínez-Pérez
- Bioinformatics Unit, Fundación Instituto Leloir/IIBBA, Buenos Aires, Argentina
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Mátyás Pajkos
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest, Hungary
- Toby J Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Zsuzsanna Dosztanyi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest, Hungary
|
14
|
Del Conte A, Monzon AM, Clementel D, Camagni GF, Minervini G, Tosatto SCE, Piovesan D. RING-PyMOL: residue interaction networks of structural ensembles and molecular dynamics. Bioinformatics 2023; 39:7133739. [PMID: 37079739 PMCID: PMC10159649 DOI: 10.1093/bioinformatics/btad260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 03/21/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023] Open
Abstract
RING-PyMOL is a plugin for PyMOL providing a set of analysis tools for structural ensembles and molecular dynamics (MD) simulations. RING-PyMOL combines residue interaction networks, as provided by the RING software, with structural clustering to enhance the analysis and visualization of conformational complexity. It combines precise calculation of non-covalent interactions with the power of PyMOL to manipulate and visualize protein structures. The plugin identifies and highlights correlating contacts and interaction patterns that can explain structural allostery, active sites and structural heterogeneity connected with molecular function. It is easy to use and extremely fast, processing and rendering hundreds of models and long trajectories in seconds. RING-PyMOL generates a number of interactive plots and output files for use with external tools. The underlying RING software has been improved extensively: it is ten times faster, can process mmCIF files and also identifies typed interactions for nucleic acids. AVAILABILITY AND IMPLEMENTATION: https://github.com/BioComputingUP/ring-pymol.
Affiliation(s)
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
- Damiano Clementel
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
- Giorgia F Camagni
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
- Giovanni Minervini
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padova, 35121, Italy
|
15
|
Hatos A, Teixeira JM, Barrera-Vilarmau S, Horvath A, Tosatto SCE, Vendruscolo M, Fuxreiter M. FuzPred: a web server for the sequence-based prediction of the context-dependent binding modes of proteins. Nucleic Acids Res 2023:7092913. [PMID: 36987846 DOI: 10.1093/nar/gkad214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 03/01/2023] [Accepted: 03/27/2023] [Indexed: 03/30/2023] Open
Abstract
Proteins form complex interactions in the cellular environment to carry out their functions. They exhibit a wide range of binding modes depending on the cellular conditions, which result in a variety of ordered or disordered assemblies. To help rationalise the binding behavior of proteins, the FuzPred server predicts their sequence-based binding modes without specifying their binding partners. The binding mode defines whether the bound state is formed through a disorder-to-order transition resulting in a well-defined conformation, or through a disorder-to-disorder transition where the binding partners remain conformationally heterogeneous. To account for the context-dependent nature of the binding modes, the FuzPred method also estimates the multiplicity of binding modes, the likelihood of sampling multiple binding modes. Protein regions with a high multiplicity of binding modes may serve as regulatory sites or hot-spots for structural transitions in the assembly. To facilitate the interpretation of the predictions, protein regions with different interaction behaviors can be visualised on protein structures generated by AlphaFold. The FuzPred web server (https://fuzpred.bio.unipd.it) thus offers insights into the structural and dynamical changes of proteins upon interactions and contributes to development of structure-function relationships under a variety of cellular conditions.
Affiliation(s)
- Andras Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Oncology, Lausanne University Hospital, Lausanne 1011, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Swiss Cancer Center Leman, Lausanne 1011, Switzerland
- João Mc Teixeira
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Attila Horvath
- John Curtin School of Medical Research, The Australian National University, Acton, Australia
- Michele Vendruscolo
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, UK
- Monika Fuxreiter
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Physics and Astronomy, University of Padova, Padova, Italy
|
16
|
Camagni GF, Minervini G, Tosatto SCE. Structural Characterization of Hypoxia Inducible Factor α-Prolyl Hydroxylase Domain 2 Interaction through MD Simulations. Int J Mol Sci 2023; 24:ijms24054710. [PMID: 36902141 PMCID: PMC10003257 DOI: 10.3390/ijms24054710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 02/23/2023] [Accepted: 02/25/2023] [Indexed: 03/05/2023] Open
Abstract
The Prolyl Hydroxylases (PHDs) are an enzymatic family that regulates cell oxygen-sensing. PHDs hydroxylate hypoxia-inducible transcription factors α (HIFs-α), driving their proteasomal degradation. Hypoxia inhibits PHD activity, inducing HIFs-α stabilization and cell adaptation to hypoxia. As a hallmark of cancer, hypoxia promotes neo-angiogenesis and cell proliferation. PHD isoforms are thought to have a variable impact on tumor progression. All isoforms hydroxylate HIF-α (HIF-1,2,3α) with different affinities. However, what determines these differences and how they pair with tumor growth is poorly understood. Here, molecular dynamics simulations were used to characterize the PHD2 binding properties in complexes with HIF-1α and HIF-2α. In parallel, conservation analysis and binding free energy calculations were performed to better understand PHD2 substrate affinity. Our data suggest a direct association between the PHD2 C-terminus and HIF-2α that is not observed in the PHD2/HIF-1α complex. Furthermore, our results indicate that phosphorylation of a PHD2 residue, Thr405, causes a variation in binding energy, despite the fact that this PTM has only a limited structural impact on PHD2/HIFs-α complexes. Collectively, our findings suggest that the PHD2 C-terminus may act as a molecular regulator of PHD's activity.
|
17
|
Deutsch EW, Vizcaíno JA, Jones AR, Binz PA, Lam H, Klein J, Bittremieux W, Perez-Riverol Y, Tabb DL, Walzer M, Ricard-Blum S, Hermjakob H, Neumann S, Mak TD, Kawano S, Mendoza L, Van Den Bossche T, Gabriels R, Bandeira N, Carver J, Pullman B, Sun Z, Hoffmann N, Shofstahl J, Zhu Y, Licata L, Quaglia F, Tosatto SCE, Orchard SE. Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. J Proteome Res 2023; 22:287-301. [PMID: 36626722 PMCID: PMC9903322 DOI: 10.1021/acs.jproteome.2c00637] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/07/2022] [Indexed: 01/11/2023]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Pierre-Alain Binz
- Clinical Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7602, South Africa
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sylvie Ricard-Blum
- Univ. Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Steffen Neumann
- Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Benjamin Pullman
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Luana Licata
- Fondazione Human Technopole, 20157 Milan, Italy
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Sandra E. Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|
18
|
Paysan-Lafosse T, Blum M, Chuguransky S, Grego T, Pinto BL, Salazar G, Bileschi M, Bork P, Bridge A, Colwell L, Gough J, Haft D, Letunić I, Marchler-Bauer A, Mi H, Natale D, Orengo C, Pandurangan A, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu C, Bateman A. InterPro in 2022. Nucleic Acids Res 2023; 51:D418-D427. [PMID: 36350672 PMCID: PMC9825450 DOI: 10.1093/nar/gkac993] [Citation(s) in RCA: 478] [Impact Index Per Article: 478.0] [Received: 09/14/2022] [Revised: 10/12/2022] [Accepted: 10/28/2022] [Indexed: 11/11/2022] Open
Abstract
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.
Affiliation(s)
- Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Beatriz Lázaro Pinto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea
- Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
- Alan Bridge
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Lucy Colwell
- Google Research, Brain team, Cambridge, MA, USA
- Department of Chemistry, University of Cambridge, Cambridge, UK
- Julian Gough
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
- Daniel H Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Ivica Letunić
- Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Christine A Orengo
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- Arun P Pandurangan
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
- Department of Biochemistry, Sanger Building, University of Cambridge, Cambridge, UK
- Catherine Rivoire
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Christian J A Sigrist
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Cathy H Wu
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Center for Bioinformatics and Computational Biology and Protein Information Resource, University of Delaware, Newark, DE 19711, USA
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

19
Piovesan D, Monzon AM, Tosatto SCE. Intrinsic Protein Disorder and Conditional Folding in AlphaFoldDB. Protein Sci 2022; 31:e4466. [PMID: 36210722 PMCID: PMC9601767 DOI: 10.1002/pro.4466] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/15/2022] [Revised: 09/29/2022] [Accepted: 10/05/2022] [Indexed: 11/23/2022]
Abstract
Intrinsically disordered regions (IDRs) defying the traditional protein structure–function paradigm have been difficult to analyze. The availability of accurate structure predictions on a large scale in AlphaFoldDB offers a fresh perspective on IDR prediction. Here, we establish three baselines for IDR prediction from AlphaFoldDB models based on the recent CAID dataset. Surprisingly, AlphaFoldDB is highly competitive for predicting both IDRs and conditionally folded binding regions, demonstrating the plasticity of the disorder to structure continuum.
Affiliation(s)
- Alexander Miguel Monzon
- Dept. of Biomedical Sciences, University of Padova, Italy
- Dept. of Information Engineering, University of Padova, Italy

20
Pradelli F, Minervini G, Tosatto SCE. Mocafe: a comprehensive Python library for simulating cancer development with Phase Field Models. Bioinformatics 2022; 38:4440-4441. [PMID: 35876789 DOI: 10.1093/bioinformatics/btac521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 06/07/2022] [Accepted: 07/22/2022] [Indexed: 12/24/2022] Open
Abstract
SUMMARY: Mathematical models are effective in studying cancer development at different scales, from metabolism to tissue. Phase Field Models (PFMs) have been shown to accurately reproduce cancer growth and related phenomena, including expression of relevant molecules, extracellular matrix remodeling and angiogenesis. However, implementations of such models are rarely published, reducing access to these techniques. To close this gap, we developed Mocafe, a modular open-source Python package that implements some of the most important PFMs reported in the literature. Mocafe is designed to handle both PFMs based purely on differential equations and hybrid agent-based PFMs. Moreover, Mocafe is meant to be extensible, allowing the inclusion of new models in future releases. AVAILABILITY AND IMPLEMENTATION: Mocafe is a Python package based on FEniCS, a popular computing platform for solving partial differential equations. The source code, extensive documentation and demos are provided on GitHub at URL: https://github.com/BioComputingUP/mocafe. An archive of the package is also available on Zenodo at https://doi.org/10.5281/zenodo.6366052.
Affiliation(s)
- Franco Pradelli
- Department of Biomedical Sciences, University of Padova, 35121 Padova, Italy
- Giovanni Minervini
- Department of Biomedical Sciences, University of Padova, 35121 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, 35121 Padova, Italy

21
Quaglia F, Salladini E, Carraro M, Minervini G, Tosatto SCE, Le Mercier P. SARS-CoV-2 variants preferentially emerge at intrinsically disordered protein sites helping immune evasion. FEBS J 2022; 289:4240-4250. [PMID: 35108439 PMCID: PMC9542094 DOI: 10.1111/febs.16379] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/26/2021] [Revised: 01/21/2022] [Accepted: 01/31/2022] [Indexed: 12/13/2022]
Abstract
The SARS‐CoV‐2 pandemic is maintained by the emergence of successive variants, highlighting the flexibility of the protein sequences of the virus. We show that experimentally determined intrinsically disordered regions (IDRs) are abundant in the SARS‐CoV‐2 viral proteins, making up to 28% of disorder content for the S1 subunit of spike and up to 51% for the nucleoprotein, with the vast majority of mutations occurring in the 13 major variants mapped to these IDRs. Strikingly, antigenic sites are enriched in IDRs, in the receptor‐binding domain (RBD) and in the N‐terminal domain (NTD), suggesting a key role of structural flexibility in the antigenicity of the SARS‐CoV‐2 protein surface. Mutations occurring in the S1 subunit and nucleoprotein (N) IDRs are critical for immune evasion and antibody escape, suggesting potential additional implications for vaccines and monoclonal therapeutic strategies. Overall, this suggests the presence of variable regions on S1 and N protein surfaces, which confer sequence and antigenic flexibility to the virus without altering its protein functions.
Affiliation(s)
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Italy
- Marco Carraro
- Department of Biomedical Sciences, University of Padova, Italy
- Philippe Le Mercier
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland

22
Quaglia F, Hatos A, Salladini E, Piovesan D, Tosatto SCE. Exploring Manually Curated Annotations of Intrinsically Disordered Proteins with DisProt. Curr Protoc 2022; 2:e484. [PMID: 35789137 DOI: 10.1002/cpz1.484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
DisProt is the major repository of manually curated data for intrinsically disordered proteins collected from the literature. Although lacking a stable three-dimensional structure under physiological conditions, intrinsically disordered proteins carry out a plethora of biological functions, some of them directly arising from their flexible nature. A growing number of scientific studies have been published during the last few decades to shed light on their unstructured state, their binding modes, and their functions. DisProt makes use of a team of expert biocurators to provide up-to-date annotations of intrinsically disordered proteins from the literature, making them available to the scientific community. Here we present a comprehensive description of how to use DisProt in different contexts and provide a detailed explanation of how to explore and interpret manually curated annotations of intrinsically disordered proteins. We describe how to search DisProt annotations, both using the web interface and the API for programmatic access. Finally, we explain how to visualize and interpret a DisProt entry, the SARS-CoV-2 Nucleoprotein, characterized by the presence of unstructured N-terminal and C-terminal regions and a flexible linker. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Performing a search in DisProt. Support Protocol 1: Downloading options. Support Protocol 2: Programmatic access with DisProt REST API. Basic Protocol 2: Exploring the DisProt Ontology page. Basic Protocol 3: Visualizing and interpreting DisProt entries: the SARS-CoV-2 Nucleoprotein use case.
Affiliation(s)
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy

23
Hatos A, Tosatto SCE, Vendruscolo M, Fuxreiter M. FuzDrop on AlphaFold: visualizing the sequence-dependent propensity of liquid-liquid phase separation and aggregation of proteins. Nucleic Acids Res 2022; 50:W337-W344. [PMID: 35610022 PMCID: PMC9252777 DOI: 10.1093/nar/gkac386] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 03/25/2022] [Revised: 04/20/2022] [Accepted: 05/19/2022] [Indexed: 11/15/2022] Open
Abstract
Many proteins perform their functions within membraneless organelles, where they form a liquid-like condensed state, also known as droplet state. The FuzDrop method predicts the probability of spontaneous liquid-liquid phase separation of proteins and provides a sequence-based score to identify the regions that promote this process. Furthermore, the FuzDrop method estimates the propensity of conversion of proteins to the amyloid state, and identifies aggregation hot-spots, which can drive the irreversible maturation of the liquid-like droplet state. These predictions can also identify mutations that can induce formation of amyloid aggregates, including those implicated in human diseases. To facilitate the interpretation of the predictions, the droplet-promoting and aggregation-promoting regions can be visualized on protein structures generated by AlphaFold. The FuzDrop server (https://fuzdrop.bio.unipd.it) thus offers insights into the complex behavior of proteins in their condensed states and facilitates the understanding of the functional relationships of proteins.
Affiliation(s)
- Andras Hatos
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
- Michele Vendruscolo
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Monika Fuxreiter
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy

24
Clementel D, Del Conte A, Monzon AM, Camagni GF, Minervini G, Piovesan D, Tosatto SCE. RING 3.0: fast generation of probabilistic residue interaction networks from structural ensembles. Nucleic Acids Res 2022; 50:W651-W656. [PMID: 35554554 PMCID: PMC9252747 DOI: 10.1093/nar/gkac365] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 03/23/2022] [Revised: 04/15/2022] [Accepted: 04/30/2022] [Indexed: 12/18/2022] Open
Abstract
Residue interaction networks (RINs) are used to represent residue contacts in protein structures. Thanks to the advances in network theory, RINs have been proved effective as an alternative to coordinate data in the analysis of complex systems. The RING server calculates high quality and reliable non-covalent molecular interactions based on geometrical parameters. Here, we present the new RING 3.0 version extending the previous functionality in several ways. The underlying software library has been re-engineered to improve speed by an order of magnitude. RING now also supports the mmCIF format and provides typed interactions for the entire PDB chemical component dictionary, including nucleic acids. Moreover, RING now employs probabilistic graphs, where multiple conformations (e.g. NMR or molecular dynamics ensembles) are mapped as weighted edges, opening up new ways to analyze structural data. The web interface has been expanded to include a simultaneous view of the RIN alongside a structure viewer, with both synchronized and clickable. Contact evolution across models (or time) is displayed as a heatmap and can help in the discovery of correlating interaction patterns. The web server, together with an extensive help and tutorial, is available from URL: https://ring.biocomputingup.it/.
Affiliation(s)
- Damiano Clementel
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Giorgia F Camagni
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Giovanni Minervini
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Padova 35131, Italy

25
Piovesan D, Monzon AM, Quaglia F, Tosatto SCE. Databases for intrinsically disordered proteins. Acta Crystallogr D Struct Biol 2022; 78:144-151. [PMID: 35102880 PMCID: PMC8805306 DOI: 10.1107/s2059798321012109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2021] [Accepted: 11/12/2021] [Indexed: 11/28/2022] Open
Abstract
Intrinsically disordered regions (IDRs) lacking a fixed three-dimensional protein structure are widespread and play a central role in cell regulation. Only a small fraction of IDRs have been functionally characterized, with heterogeneous experimental evidence that is largely buried in the literature. Predictions of IDRs are still difficult to estimate and are poorly characterized. Here, an overview of the publicly available knowledge about IDRs is reported, including manually curated resources, deposition databases and prediction repositories. The types, scopes and availability of the various resources are analyzed, and their complementarity and overlap are highlighted. The volume of information included and the relevance to the field of structural biology are compared.
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR–IBIOM), Bari, Italy

26
Bevilacqua M, Paladin L, Tosatto SCE, Piovesan D. ProSeqViewer: an interactive, responsive and efficient TypeScript library for visualization of sequences and alignments in web applications. Bioinformatics 2022; 38:1129-1130. [PMID: 34788797 DOI: 10.1093/bioinformatics/btab764] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/12/2021] [Revised: 10/13/2021] [Accepted: 11/10/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY: Biological data is ever-increasing in amount and complexity. The mapping of this data to biological entities such as nucleotide and amino acid sequences supports biological data analysis, classification and prediction. Sequence alignments and comparison allow the transfer of knowledge to evolutionary-related entities, the mapping of functional domains, and the identification of binding and modification sites. To support these types of studies, we developed ProSeqViewer, a tool to visualize annotation on single sequences and multiple sequence alignments. This state-of-the-art multifunctional library was developed as a modular component to be integrated into static or dynamic web resources and support intuitive visualization of sequence features. ProSeqViewer is extremely lightweight, fast, interactive, dynamic, responsive and works at any screen size. It generates pure HTML which is compatible with any browser and operating system. ProSeqViewer can exchange events with other visualization components and is already used by multiple biological databases. AVAILABILITY AND IMPLEMENTATION: ProSeqViewer is an open-source TypeScript library compatible with state-of-the-art website environments. The source code and extensive documentation including use cases are available from the URL: https://github.com/BioComputingUP/ProSeqViewer.
Affiliation(s)
- Martina Bevilacqua
- Department of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Lisanna Paladin
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35121 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, 35121 Padova, Italy

27
Varadi M, Anyango S, Armstrong D, Berrisford J, Choudhary P, Deshpande M, Nadzirin N, Nair SS, Pravda L, Tanweer A, Al-Lazikani B, Andreini C, Barton GJ, Bednar D, Berka K, Blundell T, Brock KP, Carazo JM, Damborsky J, David A, Dey S, Dunbrack R, Recio JF, Fraternali F, Gibson T, Helmer-Citterich M, Hoksza D, Hopf T, Jakubec D, Kannan N, Krivak R, Kumar M, Levy ED, London N, Macias JR, Srivatsan MM, Marks DS, Martens L, McGowan SA, McGreig JE, Modi V, Parra RG, Pepe G, Piovesan D, Prilusky J, Putignano V, Radusky LG, Ramasamy P, Rausch AO, Reuter N, Rodriguez LA, Rollins NJ, Rosato A, Rubach P, Serrano L, Singh G, Skoda P, Sorzano COS, Stourac J, Sulkowska JI, Svobodova R, Tichshenko N, Tosatto SCE, Vranken W, Wass MN, Xue D, Zaidman D, Thornton J, Sternberg M, Orengo C, Velankar S. PDBe-KB: collaboratively defining the biological context of structural data. Nucleic Acids Res 2022; 50:D534-D542. [PMID: 34755867 PMCID: PMC8728252 DOI: 10.1093/nar/gkab988] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 09/14/2021] [Revised: 10/01/2021] [Accepted: 10/14/2021] [Indexed: 12/15/2022] Open
Abstract
The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.

28
Quaglia F, Mészáros B, Salladini E, Hatos A, Pancsa R, Chemes LB, Pajkos M, Lazar T, Peña-Díaz S, Santos J, Ács V, Farahi N, Fichó E, Aspromonte M, Bassot C, Chasapi A, Davey N, Davidović R, Dobson L, Elofsson A, Erdős G, Gaudet P, Giglio M, Glavina J, Iserte J, Iglesias V, Kálmán Z, Lambrughi M, Leonardi E, Longhi S, Macedo-Ribeiro S, Maiani E, Marchetti J, Marino-Buslje C, Mészáros A, Monzon A, Minervini G, Nadendla S, Nilsson JF, Novotný M, Ouzounis C, Palopoli N, Papaleo E, Pereira P, Pozzati G, Promponas V, Pujols J, Rocha AS, Salas M, Sawicki LR, Schad E, Shenoy A, Szaniszló T, Tsirigos K, Veljkovic N, Parisi G, Ventura S, Dosztányi Z, Tompa P, Tosatto SCE, Piovesan D. DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Res 2022; 50:D480-D487. [PMID: 34850135 PMCID: PMC8728214 DOI: 10.1093/nar/gkab1082] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Received: 09/15/2021] [Revised: 10/15/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.
Affiliation(s)
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- Lucía B Chemes
- Instituto de Investigaciones Biotecnológicas (IIBiO-CONICET), Universidad Nacional de San Martín, Av. 25 de Mayo y Francia, CP1650 Buenos Aires, Argentina
- Mátyás Pajkos
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Tamas Lazar
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Samuel Peña-Díaz
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Jaime Santos
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Veronika Ács
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- Nazanin Farahi
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Erzsébet Fichó
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- Cytocast Kft., Vecsés, Hungary
- Maria Cristina Aspromonte
- Department of Woman and Child Health, University of Padova, Padova, Italy
- Pediatric Research Institute, Città della Speranza, Padova, Italy
- Claudio Bassot
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
- Anastasia Chasapi
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thermi, Thessalonica 57001, Greece
- Norman E Davey
- Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Rd, Chelsea, London, UK
- Radoslav Davidović
- Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
- Laszlo Dobson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- Arne Elofsson
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
- Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Pascale Gaudet
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine 670 W. Baltimore St., Baltimore, MD 21201, USA
- Juliana Glavina
- Instituto de Investigaciones Biotecnológicas (IIBiO-CONICET), Universidad Nacional de San Martín, Av. 25 de Mayo y Francia, CP1650 Buenos Aires, Argentina
- Javier Iserte
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
- Valentín Iglesias
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Zsófia Kálmán
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083 Budapest, Hungary
- Matteo Lambrughi
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Emanuela Leonardi
- Department of Woman and Child Health, University of Padova, Padova, Italy
- Pediatric Research Institute, Città della Speranza, Padova, Italy
- Sonia Longhi
- Lab. Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), 163 Avenue de Luminy, Case 932, 13288, Marseille, France
- Sandra Macedo-Ribeiro
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, 4200-135 Porto, Portugal
- Emiliano Maiani
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
- Attila Mészáros
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine 670 W. Baltimore St., Baltimore, MD 21201, USA
- Juliet F Nilsson
- Lab. Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), 163 Avenue de Luminy, Case 932, 13288, Marseille, France
- Marian Novotný
- Dep. of Cell Biology, Faculty of Science, Vinicna 7, 128 43, Prague, Czech Republic
- Christos A Ouzounis
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thermi, Thessalonica 57001, Greece
- Biological Computation & Computational Biology Group, Artificial Intelligence & Information Analysis Lab, Department of Computer Science, Aristotle University of Thessalonica, Thessalonica 54124, Greece
- Nicolás Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
- Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Pedro José Barbosa Pereira
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, 4200-135 Porto, Portugal
| | - Gabriele Pozzati
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Jordi Pujols
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | - Martin Salas
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Luciana Rodriguez Sawicki
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Eva Schad
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
| | - Aditi Shenoy
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Tamás Szaniszló
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Konstantinos D Tsirigos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
| | - Nevena Veljkovic
- Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Salvador Ventura
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- ICREA, Barcelona, Spain
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | | | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| |
|
29
|
Nadendla S, Jackson R, Munro J, Quaglia F, Mészáros B, Olley D, Hobbs ET, Goralski SM, Chibucos M, Mungall CJ, Tosatto SCE, Erill I, Giglio MG. ECO: the Evidence and Conclusion Ontology, an update for 2022. Nucleic Acids Res 2022; 50:D1515-D1521. [PMID: 34986598 PMCID: PMC8728134 DOI: 10.1093/nar/gkab1025] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/12/2022] Open
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best-practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.
Affiliation(s)
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Rebecca Jackson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - James Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Dustin Olley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Stephen M Goralski
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Marcus Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Christopher John Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, USA
| | | | - Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Michelle G Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| |
|
30
|
Balatti GE, Barletta GP, Parisi G, Tosatto SCE, Bellanda M, Fernandez-Alberti S. Intrinsically Disordered Region Modulates Ligand Binding in Glutaredoxin 1 from Trypanosoma Brucei. J Phys Chem B 2021; 125:13366-13375. [PMID: 34870419 DOI: 10.1021/acs.jpcb.1c07035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/01/2023]
Abstract
Glutaredoxins are small proteins that share a common well-conserved thioredoxin-fold and participate in a wide variety of biological processes. Among them, class II Grx are redox-inactive proteins involved in iron-sulfur (Fe-S) metabolism. In the present work, we report different structural and dynamics aspects of 1CGrx1 from the pathogenic parasite Trypanosoma brucei that differentiate it from other orthologues by the presence of a parasite-specific unstructured N-terminal extension whose role has not been fully elucidated yet. Previous nuclear magnetic resonance (NMR) studies revealed significant differences with respect to the mutant lacking the disordered tail. Herein, we have performed atomistic molecular dynamics simulations that, complementary to NMR studies, confirm the intrinsically disordered nature of the N-terminal extension. Moreover, we confirm the main role of these residues in modulating the conformational dynamics of the glutathione-binding pocket. We observe that the N-terminal extension modifies the ligand cavity stiffening it by specific interactions that ultimately modulate its intrinsic flexibility, which may modify its role in the storage and/or transfer of preformed iron-sulfur clusters. These unique structural and dynamics aspects of Trypanosoma brucei 1CGrx1 differentiate it from other orthologues and could have functional relevance. In this way, our results encourage the study of other similar protein folding families with intrinsically disordered regions whose functional roles are still unrevealed and the screening of potential 1CGrx1 inhibitors as antitrypanosomal drug candidates.
Affiliation(s)
- Galo E Balatti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes/CONICET, B1876BXD Bernal, Argentina
| | - G Patricio Barletta
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes/CONICET, B1876BXD Bernal, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes/CONICET, B1876BXD Bernal, Argentina
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 35131 Padua, Italy
| | - Massimo Bellanda
- Department of Chemical Sciences, University of Padova, via Marzolo 1, 35131 Padua, Italy
| | | |
|
31
|
Hatos A, Monzon AM, Tosatto SCE, Piovesan D, Fuxreiter M. FuzDB: a new phase in understanding fuzzy interactions. Nucleic Acids Res 2021; 50:D509-D517. [PMID: 34791357 PMCID: PMC8728163 DOI: 10.1093/nar/gkab1060] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 10/27/2021] [Indexed: 11/14/2022] Open
Abstract
Fuzzy interactions are specific, variable contacts between proteins and other biomolecules (proteins, DNA, RNA, small molecules) formed in accordance with the cellular context. Fuzzy interactions have recently been demonstrated to regulate biomolecular condensates generated by liquid-liquid phase separation. The FuzDB v4.0 database (https://fuzdb.org) assembles experimentally identified examples of fuzzy interactions, where disordered regions mediate functionally important, context-dependent contacts between the partners in stoichiometric and higher-order assemblies. The new version of FuzDB establishes cross-links with databases on structure (PDB, BMRB, PED), function (ELM, UniProt) and biomolecular condensates (PhaSepDB, PhaSePro, LLPSDB). FuzDB v4.0 is a source for deciphering the molecular basis of complex cellular interaction behaviors, including those in protein droplets.
Affiliation(s)
- Andras Hatos
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Alexander Miguel Monzon
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Monika Fuxreiter
- Department of Biomedical Sciences, University of Padova, via Ugo Bassi 58/B, 35131 Padova, Italy
- Department of Biochemistry and Molecular Biology, University of Debrecen, Nagyerdei krt 98, 4010 Debrecen, Hungary
| |
|
32
|
Mier P, Paladin L, Tamana S, Petrosian S, Hajdu-Soltész B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernadó P, Gáspári Z, Ouzounis CA, Promponas VJ, Kajava AV, Hancock JM, Tosatto SCE, Dosztanyi Z, Andrade-Navarro MA. Disentangling the complexity of low complexity proteins. Brief Bioinform 2021; 21:458-472. [PMID: 30698641 PMCID: PMC7299295 DOI: 10.1093/bib/bbz007] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 11/12/2018] [Revised: 12/19/2018] [Accepted: 01/07/2019] [Indexed: 12/31/2022] Open
Abstract
There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.
Short abstract: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
| | - Lisanna Paladin
- Department of Biomedical Science, University of Padova, Padova, Italy
| | - Stella Tamana
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Sophia Petrosian
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Borbála Hajdu-Soltész
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Annika Urbanek
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Aleksandra Gruca
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Dariusz Plewczynski
- Center of New Technologies, University of Warsaw, Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | | | - Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Zoltán Gáspári
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
| | - Christos A Ouzounis
- Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Andrey V Kajava
- Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Universite de Montpellier, Montpellier, France
- Institute of Bioengineering, University ITMO, St. Petersburg, Russia
| | - John M Hancock
- Earlham Institute, Norwich, UK
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK
| | - Silvio C E Tosatto
- Department of Biomedical Science, University of Padova, Padova, Italy
- CNR Institute of Neuroscience, Padova, Italy
| | - Zsuzsanna Dosztanyi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany
| |
|
33
|
Quaglia F, Lazar T, Hatos A, Tompa P, Piovesan D, Tosatto SCE. Exploring Curated Conformational Ensembles of Intrinsically Disordered Proteins in the Protein Ensemble Database. Curr Protoc 2021; 1:e192. [PMID: 34252246 DOI: 10.1002/cpz1.192] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
The Protein Ensemble Database (PED; https://proteinensemble.org/) is the major repository of conformational ensembles of intrinsically disordered proteins (IDPs). Conformational ensembles of IDPs are primarily provided by their authors or occasionally collected from literature, and are subsequently deposited in PED along with the corresponding structured, manually curated metadata. The modeling of conformational ensembles usually relies on experimental data from small-angle X-ray scattering (SAXS), fluorescence resonance energy transfer (FRET), NMR spectroscopy, and molecular dynamics (MD) simulations, or a combination of these techniques. The growing number of scientific studies based on these data, along with the astounding and swift progress in the field of protein intrinsic disorder, has required a significant update and upgrade of PED, first published in 2014. To this end, the database was entirely renewed in 2020 and now has a dedicated team of biocurators providing manually curated descriptions of the methods and conditions applied to generate the conformational ensembles and for checking consistency of the data. Here, we present a detailed description on how to explore PED with its protein pages and experimental pages, and how to interpret entries of conformational ensembles. We describe how to efficiently search conformational ensembles deposited in PED by means of its web interface and API. We demonstrate how to make sense of the PED protein page and its associated experimental entry pages with reference to the yeast Sic1 use case. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
- Basic Protocol 1: Performing a search in PED
- Support Protocol 1: Programmatic access with the PED API
- Basic Protocol 2: Interpreting the protein page and the experimental entry page: the Sic1 use case
- Support Protocol 2: Downloading options
- Support Protocol 3: Understanding the validation report: the Sic1 use case
- Basic Protocol 3: Submitting new conformational ensembles to PED
- Basic Protocol 4: Providing feedback in PED
Affiliation(s)
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
| | - Tamas Lazar
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB-VUB Center for Structural Biology, Brussels, Belgium
| | - András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Peter Tompa
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | | |
|
34
|
Hatos A, Quaglia F, Piovesan D, Tosatto SCE. APICURON: a database to credit and acknowledge the work of biocurators. Database (Oxford) 2021; 2021:baab019. [PMID: 33882120 PMCID: PMC8060004 DOI: 10.1093/database/baab019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/12/2021] [Accepted: 04/12/2021] [Indexed: 11/14/2022]
Abstract
APICURON is an open and freely accessible resource that tracks and credits the work of biocurators across multiple participating knowledgebases. Biocuration is essential to extract knowledge from research data and make it available in a structured and standardized way to the scientific community. However, processing biological data, mainly from literature, requires a huge effort that is difficult to attribute and quantify. APICURON collects biocuration events from third-party resources and aggregates this information, spotlighting biocurator contributions. APICURON promotes biocurator engagement implementing gamification concepts like badges, medals and leaderboards and at the same time provides a monitoring service for registered resources and for biocurators themselves. APICURON adopts a data model that is flexible enough to represent and track the majority of biocuration activities. Biocurators are identified through their Open Researcher and Contributor ID. The definition of curation events, scoring systems and rules for assigning badges and medals are resource-specific and easily customizable. Registered resources can transfer curation activities on the fly through a secure and robust Application Programming Interface (API). Here, we show how simple and effective it is to connect a resource to APICURON, describing the DisProt database of intrinsically disordered proteins as a use case. We believe APICURON will provide biological knowledgebases with a service to recognize and credit the effort of their biocurators, monitor their activity and promote curator engagement. Database URL: https://apicuron.org.
Affiliation(s)
- András Hatos
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
| | - Federica Quaglia
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
| |
|
35
|
Monzon AM, Bonato P, Necci M, Tosatto SCE, Piovesan D. FLIPPER: Predicting and Characterizing Linear Interacting Peptides in the Protein Data Bank. J Mol Biol 2021; 433:166900. [PMID: 33647288 DOI: 10.1016/j.jmb.2021.166900] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/03/2020] [Revised: 02/22/2021] [Accepted: 02/22/2021] [Indexed: 12/31/2022]
Abstract
A large fraction of peptides or protein regions are disordered in isolation and fold upon binding. These regions, also called MoRFs, SLiMs or LIPs, are often associated with signaling and regulation processes. However, despite their importance, only a limited number of examples are available in public databases and their automatic detection at the proteome level is problematic. Here we present FLIPPER, an automatic method for the detection of structurally linear sub-regions or peptides that interact with another chain in a protein complex. FLIPPER is a random forest classification that takes the protein structure as input and provides the propensity of each amino acid to be part of a LIP region. Models are built taking into consideration structural features such as intra- and inter-chain contacts, secondary structure, solvent accessibility in both bound and unbound state, structural linearity and chain length. FLIPPER is accurate when evaluated on non-redundant independent datasets, 99% precision and 99% sensitivity on PixelDB-25 and 87% precision and 88% sensitivity on DIBS-25. Finally, we used FLIPPER to process the entire Protein Data Bank and identified different classes of LIPs based on different binding modes and partner molecules. We provide a detailed description of these LIP categories and show that a large fraction of these regions are not detected by disorder predictors. All FLIPPER predictions are integrated in the MobiDB 4.0 database.
Affiliation(s)
| | - Paolo Bonato
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy.
| | - Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| |
|
36
|
Lazar T, Martínez-Pérez E, Quaglia F, Hatos A, Chemes L, Iserte JA, Méndez NA, Garrone NA, Saldaño T, Marchetti J, Rueda A, Bernadó P, Blackledge M, Cordeiro TN, Fagerberg E, Forman-Kay JD, Fornasari M, Gibson TJ, Gomes GNW, Gradinaru C, Head-Gordon T, Jensen MR, Lemke E, Longhi S, Marino-Buslje C, Minervini G, Mittag T, Monzon A, Pappu RV, Parisi G, Ricard-Blum S, Ruff KM, Salladini E, Skepö M, Svergun D, Vallet S, Varadi M, Tompa P, Tosatto SCE, Piovesan D. PED in 2021: a major update of the protein ensemble database for intrinsically disordered proteins. Nucleic Acids Res 2021; 49:D404-D411. [PMID: 33305318 PMCID: PMC7778965 DOI: 10.1093/nar/gkaa1021] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 09/14/2020] [Revised: 10/13/2020] [Accepted: 12/08/2020] [Indexed: 12/21/2022] Open
Abstract
The Protein Ensemble Database (PED) (https://proteinensemble.org), which holds structural ensembles of intrinsically disordered proteins (IDPs), has been significantly updated and upgraded since its last release in 2016. The new version, PED 4.0, has been completely redesigned and reimplemented with cutting-edge technology and now holds about six times more data (162 versus 24 entries and 242 versus 60 structural ensembles) and a broader representation of state of the art ensemble generation methods than the previous version. The database has a completely renewed graphical interface with an interactive feature viewer for region-based annotations, and provides a series of descriptors of the qualitative and quantitative properties of the ensembles. High quality of the data is guaranteed by a new submission process, which combines both automatic and manual evaluation steps. A team of biocurators integrate structured metadata describing the ensemble generation methodology, experimental constraints and conditions. A new search engine allows the user to build advanced queries and search all entry fields including cross-references to IDP-related resources such as DisProt, MobiDB, BMRB and SASBDB. We expect that the renewed PED will be useful for researchers interested in the atomic-level understanding of IDP function, and promote the rational, structure-based design of IDP-targeting drugs.
Affiliation(s)
- Tamas Lazar
- VIB-VUB Center for Structural Biology, Flanders Institute for Biotechnology, Brussels 1050, Belgium
- Structural Biology Brussels, Bioengineering Sciences Department, Vrije Universiteit Brussel, Brussels 1050, Belgium
| | - Elizabeth Martínez-Pérez
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Federica Quaglia
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
| | - András Hatos
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
| | - Lucía B Chemes
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
| | - Javier A Iserte
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
| | - Nicolás A Méndez
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
| | - Nicolás A Garrone
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
| | - Tadeo E Saldaño
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
| | - Julia Marchetti
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
| | - Ana Julia Velez Rueda
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
| | - Pau Bernadó
- Centre de Biochimie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier 34090, France
| | | | - Tiago N Cordeiro
- Centre de Biochimie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier 34090, France
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras 2780-157, Portugal
| | - Eric Fagerberg
- Theoretical Chemistry, Lund University, Lund, POB 124, SE-221 00, Sweden
| | - Julie D Forman-Kay
- Molecular Medicine Program, Hospital for Sick Children, Toronto, M5G 1X8, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, M5S 1A8, Ontario, Canada
| | - Maria S Fornasari
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
| | - Toby J Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Gregory-Neal W Gomes
- Department of Physics, University of Toronto, Toronto, M5S 1A7, Ontario, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, L5L 1C6, Ontario, Canada
| | - Claudiu C Gradinaru
- Department of Physics, University of Toronto, Toronto, M5S 1A7, Ontario, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, L5L 1C6, Ontario, Canada
| | - Teresa Head-Gordon
- Departments of Chemistry, Bioengineering, Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
| | | | - Edward A Lemke
- Biocentre, Johannes Gutenberg-University Mainz, Mainz 55128, Germany
- Institute of Molecular Biology, Mainz 55128, Germany
| | - Sonia Longhi
- Aix-Marseille University, CNRS, Architecture et Fonction des Macromolécules Biologiques (AFMB), Marseille 13288, France
| | | | | | - Tanja Mittag
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | | | - Rohit V Pappu
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, MO 63130, USA
| | - Gustavo Parisi
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
| | - Sylvie Ricard-Blum
- Univ Lyon, University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, Villeurbanne, 69629 Lyon Cedex 07, France
| | - Kiersten M Ruff
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, MO 63130, USA
| | - Edoardo Salladini
- Aix-Marseille University, CNRS, Architecture et Fonction des Macromolécules Biologiques (AFMB), Marseille 13288, France
| | - Marie Skepö
- Theoretical Chemistry, Lund University, Lund, POB 124, SE-221 00, Sweden
- LINXS - Lund Institute of Advanced Neutron and X-ray Science, Lund 223 70, Sweden
| | - Dmitri Svergun
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
| | - Sylvain D Vallet
- Univ Lyon, University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, Villeurbanne, 69629 Lyon Cedex 07, France
| | - Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Peter Tompa
- To whom correspondence should be addressed. Tel +32 473 785386;
| | - Silvio C E Tosatto
- Correspondence may also be addressed to Silvio C. E. Tosatto. Tel: +39 049 827 6269;
| | - Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
| |
|
37
|
Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 2021; 49:D344-D354. [PMID: 33156333 PMCID: PMC7778928 DOI: 10.1093/nar/gkaa977] [Citation(s) in RCA: 1044] [Impact Index Per Article: 348.0] [Received: 09/14/2020] [Revised: 10/08/2020] [Accepted: 10/23/2020] [Indexed: 01/22/2023] Open
Abstract
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
Affiliation(s)
- Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Swaathi Kandasaamy
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Lorna Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Lowri Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Alan Bridge
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Julian Gough
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
| | - Daniel H Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Ivica Letunic
- Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Christine A Orengo
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
| | - Arun P Pandurangan
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
| | - Catherine Rivoire
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Christian J A Sigrist
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
| | - Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Cathy H Wu
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| |
|
38
|
Paladin L, Bevilacqua M, Errigo S, Piovesan D, Mičetić I, Necci M, Monzon AM, Fabre ML, Lopez JL, Nilsson JF, Rios J, Menna PL, Cabrera M, Buitron MG, Kulik MG, Fernandez-Alberti S, Fornasari MS, Parisi G, Lagares A, Hirsh L, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res 2021; 49:D452-D457. [PMID: 33237313 PMCID: PMC7778985 DOI: 10.1093/nar/gkaa1097] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/15/2020] [Revised: 10/17/2020] [Accepted: 11/19/2020] [Indexed: 11/21/2022] Open
Abstract
The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
Affiliation(s)
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Martina Bevilacqua
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Sara Errigo
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Ivan Mičetić
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | - Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| | | | - Maria Laura Fabre
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
| | - Jose Luis Lopez
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
| | - Juliet F Nilsson
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
| | - Javier Rios
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Pablo Lorenzano Menna
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Maia Cabrera
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Martin Gonzalez Buitron
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Mariane Gonçalves Kulik
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Sebastian Fernandez-Alberti
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Maria Silvina Fornasari
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Gustavo Parisi
- Dept. of Science and Technology, National University of Quilmes, Roque Sáenz Peña 352, Bernal, Buenos Aires, Argentina
| | - Antonio Lagares
- IBBM-CONICET, Dept. of Biological Sciences, La Plata National University, 49 y 115, 1900 La Plata, Argentina
| | - Layla Hirsh
- Dept. of Engineering, Faculty of Science and Engineering, Pontifical Catholic University of Peru, Av. Universitaria 1801 San Miguel, Lima 32, Lima, Peru
| | - Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University of Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237, CNRS, Univ. Montpellier, Montpellier, France
| | - Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
| |
|
39
|
Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Res 2021; 49:D412-D419. [PMID: 33125078 PMCID: PMC7779014 DOI: 10.1093/nar/gkaa913] [Citation(s) in RCA: 2297] [Impact Index Per Article: 765.7] [Received: 09/11/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 12/19/2022] Open
Abstract
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.
Affiliation(s)
- Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lowri Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Erik L L Sonnhammer
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| |
|
40
|
Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD, Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Res 2021. [PMID: 33125078 DOI: 10.6019/tol.pfam_fams-t.2018.00001.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/14/2023] Open
|
41
|
Necci M, Piovesan D, Clementel D, Dosztányi Z, Tosatto SCE. MobiDB-lite 3.0: fast consensus annotation of intrinsic disorder flavours in proteins. Bioinformatics 2020; 36:5533-5534. [PMID: 33325498 DOI: 10.1093/bioinformatics/btaa1045] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/05/2020] [Revised: 11/03/2020] [Accepted: 12/07/2020] [Indexed: 12/31/2022] Open
Abstract
MOTIVATION: The earlier version of MobiDB-lite is currently used in large-scale proteome annotation platforms to detect intrinsic disorder. However, new theoretical models allow for the classification of intrinsically disordered regions into subtypes from sequence features associated with specific polymeric properties or compositional bias. RESULTS: MobiDB-lite 3.0 maintains its previous speed and performance but also provides a finer classification of disorder by identifying regions with characteristics of polyampholytes, positive or negative polyelectrolytes, low-complexity regions, or regions enriched in cysteine, proline, glycine or polar residues. Sub-regions are abundantly detected in IDRs of the human proteome. The new version of MobiDB-lite represents a new step for the proteome-level analysis of protein disorder. AVAILABILITY: Both the MobiDB-lite 3.0 source code and a docker container are available from the GitHub repository: https://github.com/BioComputingUP/MobiDB-lite.
Affiliation(s)
- Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35121 Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35121 Padova, Italy
| | - Damiano Clementel
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35121 Padova, Italy
| | - Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, ELTE Eötvös Loránd University, Pázmány Péter sétány 1/c, Budapest, Hungary
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35121 Padova, Italy
| |
|
42
|
Palopoli N, Marchetti J, Monzon AM, Zea DJ, Tosatto SCE, Fornasari MS, Parisi G. Intrinsically Disordered Protein Ensembles Shape Evolutionary Rates Revealing Conformational Patterns. J Mol Biol 2020; 433:166751. [PMID: 33310020 DOI: 10.1016/j.jmb.2020.166751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2020] [Revised: 12/01/2020] [Accepted: 12/05/2020] [Indexed: 10/22/2022]
Abstract
Intrinsically disordered proteins (IDPs) lack stable tertiary structure under physiological conditions. The unique composition and complex dynamical behaviour of IDPs make them a challenge for structural biology and molecular evolution studies. Using NMR ensembles, we found that IDPs evolve under strong site-specific evolutionary rate heterogeneity, mainly originating from different constraints derived from their inter-residue contacts. Evolutionary rate profiles correlate with the experimentally observed conformational diversity of the protein, allowing the description of different conformational patterns possibly related to their structure-function relationships. The correlation between evolutionary rates and contact information improves when structural information is taken not from any individual conformer or the whole ensemble, but from combining a limited number of conformers. Our results suggest that residue contacts in disordered regions constrain evolutionary rates to conserve the dynamic behaviour of the ensemble and that evolutionary rates can be used as a proxy for the conformational diversity of IDPs.
Affiliation(s)
- Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | | | - Diego J Zea
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | | | - Maria S Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
| |
|
43
|
Quaglia F, Hatos A, Piovesan D, Tosatto SCE. Exploring Manually Curated Annotations of Intrinsically Disordered Proteins with DisProt. Curr Protoc Bioinformatics 2020; 72:e107. [PMID: 33017101 DOI: 10.1002/cpbi.107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
DisProt is the major repository of manually curated data for intrinsically disordered proteins collected from the literature. Although lacking a stable tertiary structure under physiological conditions, intrinsically disordered proteins carry out a plethora of biological functions, some of them directly arising from their flexible nature. A growing number of scientific studies have been published during the last few decades in an effort to shed light on their unstructured state, their binding modes, and their functions. DisProt makes use of a team of expert biocurators to provide up-to-date annotations of intrinsically disordered proteins from the literature, making them available to the scientific community. Here we present a comprehensive description of how to use DisProt in different contexts and provide a detailed explanation of how to explore and interpret manually curated annotations of intrinsically disordered proteins. We describe how to search DisProt annotations, using both the web interface and the API for programmatic access. Finally, we explain how to visualize and interpret a DisProt entry, p53, a widely studied protein characterized by the presence of unstructured N-terminal and C-terminal regions. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Performing a search in DisProt; Support Protocol 1: Downloading options; Support Protocol 2: Programmatic access with DisProt REST API; Basic Protocol 2: Visualizing and interpreting DisProt entries: the p53 use case; Basic Protocol 3: Providing feedback and submitting new intrinsic disorder-related data.
Affiliation(s)
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | | |
|
44
|
Jarnot P, Ziemska-Legiecka J, Dobson L, Merski M, Mier P, Andrade-Navarro MA, Hancock JM, Dosztányi Z, Paladin L, Necci M, Piovesan D, Tosatto SCE, Promponas VJ, Grynberg M, Gruca A. PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins. Nucleic Acids Res 2020; 48:W77-W84. [PMID: 32421769 PMCID: PMC7319588 DOI: 10.1093/nar/gkaa339] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/07/2020] [Revised: 04/08/2020] [Accepted: 05/01/2020] [Indexed: 12/25/2022] Open
Abstract
Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool which allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.
Affiliation(s)
- Patryk Jarnot
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| | | | - Laszlo Dobson
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083 Budapest, Hungary; Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, 1117 Budapest, Hungary
| | - Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland
| | - Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
| | - John M Hancock
- ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Pázmány Péter stny 1/c 1117, Budapest, Hungary
| | - Lisanna Paladin
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Marco Necci
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, P.O. Box 20537, Nicosia, CY 1678, Cyprus
| | - Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5A, 02-106 Warsaw, Poland
| | - Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
| |
|
45
|
Paladin L, Necci M, Piovesan D, Mier P, Andrade-Navarro MA, Tosatto SCE. A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. J Struct Biol 2020; 212:107608. [PMID: 32896658 DOI: 10.1016/j.jsb.2020.107608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/21/2020] [Revised: 08/19/2020] [Accepted: 08/21/2020] [Indexed: 11/30/2022]
Abstract
Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the two definitions (structural units and exons) are visualized in a single matrix, the "repeat/exon plot". An analysis of different repeat protein families, including the solenoids Leucine-Rich, Ankyrin, Pumilio and HEAT repeats and the β-propellers Kelch-like, WD40 and RCC1, shows different behaviors, illustrated here through examples. For each example, the analysis of the exon mapping in homologous proteins supports the conservation of their exon patterns. We propose that when a clear-cut relationship between exon and structural boundaries can be identified, it is possible to infer a specific "evolutionary pattern" which may improve TRP detection and classification.
Affiliation(s)
- Marco Necci
- Dept. of Biomedical Sciences, University of Padova, Italy
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University of Mainz, Germany
46
Abstract
The von Hippel–Lindau protein (pVHL) is a tumour suppressor mainly known for its role as master regulator of hypoxia-inducible factor (HIF) activity. Functional inactivation of pVHL is causative of the von Hippel–Lindau disease, an inherited predisposition to develop different cancers. Due to its impact on human health, pVHL has been widely studied in the last few decades. However, investigations mostly focus on its role in degrading HIFs, whereas alternative pVHL protein–protein interactions and functions are insistently surfacing in the literature. In this review, we analyse these almost neglected functions by dissecting specific conditions in which pVHL is proposed to have differential roles in promoting cancer. We reviewed its role in regulating phosphorylation as a number of works suggest pVHL to act as an inhibitor by either degrading or promoting downregulation of specific kinases. Further, we summarize hypoxia-dependent and -independent pVHL interactions with multiple protein partners and discuss their implications in tumorigenesis.
Affiliation(s)
- Giovanni Minervini
- Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 35121 Padova, Italy
- Maria Pennuto
- Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 35121 Padova, Italy; Veneto Institute of Molecular Medicine, Via Orus 2, 35129 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 35121 Padova, Italy
47
Monzon AM, Necci M, Quaglia F, Walsh I, Zanotti G, Piovesan D, Tosatto SCE. Experimentally Determined Long Intrinsically Disordered Protein Regions Are Now Abundant in the Protein Data Bank. Int J Mol Sci 2020; 21:ijms21124496. [PMID: 32599863 PMCID: PMC7349999 DOI: 10.3390/ijms21124496] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/25/2020] [Revised: 06/18/2020] [Accepted: 06/19/2020] [Indexed: 01/12/2023] Open
Abstract
Intrinsically disordered protein regions are commonly defined from missing electron density in X-ray structures. Experimental evidence for long disorder regions (LDRs) of at least 30 residues was so far limited to manually curated proteins. Here, we describe a comprehensive and large-scale analysis of experimental LDRs for 3133 unique proteins, demonstrating an increasing coverage of intrinsic disorder in the Protein Data Bank (PDB) in the last decade. The results suggest that long missing residue regions are a good-quality source for annotating intrinsically disordered regions and performing functional analysis in large data sets. The consensus approach used to define LDRs makes it possible to evaluate context-dependent disorder and provides a common definition at the protein level.
Affiliation(s)
- Alexander Miguel Monzon
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Marco Necci
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Ian Walsh
- Bioprocessing Technology Institute, A*STAR, Singapore 138668, Singapore
- Giuseppe Zanotti
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Corresponding author
- Silvio C. E. Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Corresponding author
48
Saldaño TE, Freixas VM, Tosatto SCE, Parisi G, Fernandez-Alberti S. Exploring Conformational Space with Thermal Fluctuations Obtained by Normal-Mode Analysis. J Chem Inf Model 2020; 60:3068-3080. [PMID: 32216314 DOI: 10.1021/acs.jcim.9b01136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
Proteins in their native states can be represented as ensembles of conformers in dynamical equilibrium. Thermal fluctuations are responsible for transitions between these conformers. Normal-mode analysis (NMA) using elastic network models (ENMs) provides an efficient procedure to explore the global dynamics of proteins commonly associated with conformational transitions. In the present work, we present an iterative approach to explore protein conformational spaces by introducing structural distortions according to their equilibrium dynamics at room temperature. The approach can be used either to perform unbiased explorations of conformational space or to explore guided pathways connecting two different conformations, e.g., apo and holo forms. To assess its performance, four proteins with different magnitudes of structural distortion upon ligand binding have been tested. In all cases, the conformational selection model has been confirmed and the conformational space between apo and holo forms has been encompassed. Different strategies have been tested that affect the efficiency of either achieving a desired conformational change or achieving a balanced exploration of the protein conformational multiplicity.
Affiliation(s)
- Tadeo E Saldaño
- Universidad Nacional de Quilmes/CONICET, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
- Victor M Freixas
- Universidad Nacional de Quilmes/CONICET, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Viale G. Colombo 3, 5131 Padova, Italy
- Gustavo Parisi
- Universidad Nacional de Quilmes/CONICET, Roque Saenz Peña 352, B1876BXD Bernal, Argentina
49
Piovesan D, Hatos A, Minervini G, Quaglia F, Monzon AM, Tosatto SCE. Assessing predictors for new post translational modification sites: A case study on hydroxylation. PLoS Comput Biol 2020; 16:e1007967. [PMID: 32569263 PMCID: PMC7332089 DOI: 10.1371/journal.pcbi.1007967] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/25/2019] [Revised: 07/02/2020] [Accepted: 05/19/2020] [Indexed: 12/15/2022] Open
Abstract
Post-translational modification (PTM) sites have become popular for predictor development. However, with the exception of phosphorylation and a handful of other examples, PTMs suffer from a limited number of available training examples and sparsity in protein sequences. Here, proline hydroxylation is taken as an example to compare different methods and evaluate their performance on new experimentally determined sites. As a guide for effective experimental design, predictors require both high specificity and sensitivity. However, the self-reported performance may often not be indicative of prediction quality and detection of new sites is not guaranteed. We have benchmarked seven published hydroxylation site predictors on two newly constructed independent datasets. The self-reported performance is found to widely overestimate the real accuracy measured on independent datasets. No predictor performs better than random on new examples, indicating the refined models do not sufficiently generalize to detect new sites. The number of false positives is high and precision low, in particular for non-collagen proteins whose motifs are not conserved. As hydroxylation site predictors do not generalize for new data, caution is advised when using PTM predictors in the absence of independent evaluations, in particular for highly specific sites involved in signalling.
Machine learning methods are extensively used by biologists to design and interpret experiments. Predictors which take only the sequence as input are of particular interest due to the large amount of available sequence data and high self-reported performance. In this work, we evaluated post-translational modification (PTM) predictors for hydroxylation sites and found that they perform no better than random, in strong contrast to performances reported in their original publications. PTMs are chemical amino acid alterations providing the cell with conditional mechanisms to fine-tune protein function, regulating complex biological processes such as signalling and cell cycle. Hydroxylation sites are a good PTM test case due to the availability of a range of predictors and an abundance of newly experimentally detected modification sites. Poor performances in our results highlight the overlooked problem of predicting PTMs when best practices are not followed and training data are likely incomplete. Experimentalists should be careful when using PTM predictors blindly and more independent assessments are needed to establish their usefulness in practice.
Affiliation(s)
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Corresponding author
- Andras Hatos
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padua, Padua, Italy
50
Mészáros B, Erdős G, Szabó B, Schád É, Tantos Á, Abukhairan R, Horváth T, Murvai N, Kovács OP, Kovács M, Tosatto SCE, Tompa P, Dosztányi Z, Pancsa R. PhaSePro: the database of proteins driving liquid-liquid phase separation. Nucleic Acids Res 2020; 48:D360-D367. [PMID: 31612960 PMCID: PMC7145634 DOI: 10.1093/nar/gkz848] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 08/07/2019] [Revised: 09/11/2019] [Accepted: 10/07/2019] [Indexed: 11/13/2022] Open
Abstract
Membraneless organelles (MOs) are dynamic liquid condensates that host a variety of specific cellular processes, such as ribosome biogenesis or RNA degradation. MOs form through liquid-liquid phase separation (LLPS), a process that relies on multivalent weak interactions of the constituent proteins and other macromolecules. Since the first discoveries of certain proteins being able to drive LLPS, it emerged as a general mechanism for the effective organization of cellular space that is exploited in all kingdoms of life. While numerous experimental studies report novel cases, the computational identification of LLPS drivers is lagging behind, and many open questions remain about the sequence determinants, composition, regulation and biological relevance of the resulting condensates. Our limited ability to overcome these issues is largely due to the lack of a dedicated LLPS database. Therefore, here we introduce PhaSePro (https://phasepro.elte.hu), an openly accessible, comprehensive, manually curated database of experimentally validated LLPS driver proteins/protein regions. It not only provides a wealth of information on such systems, but improves the standardization of data by introducing novel LLPS-specific controlled vocabularies. PhaSePro can be accessed through an appealing, user-friendly interface and thus has definite potential to become the central resource in this dynamically developing field.
Affiliation(s)
- Bálint Mészáros
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Beáta Szabó
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Éva Schád
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Ágnes Tantos
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Rawan Abukhairan
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Tamás Horváth
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Nikoletta Murvai
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Orsolya P Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Márton Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova; CNR Institute of Neuroscience, Padova, Italy
- Péter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary; Structural Biology (CSB), Brussels, Belgium; Structural Biology Brussels (SBB), Vrije Universiteit Brussel (VUB), Brussels 1050, Belgium
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary