1
Hu S, Zhao B. Protein function prediction using GO similarity-based heterogeneous network propagation. Sci Rep 2025; 15:19131. [PMID: 40450118 DOI: 10.1038/s41598-025-04933-1] [Received: 01/22/2025] [Accepted: 05/29/2025] [Indexed: 06/03/2025]
Abstract
Protein function prediction is a cornerstone of bioinformatics, providing critical insights into biological processes and disease mechanisms. Despite significant advances, challenges persist due to data sparsity and functional ambiguity. We introduce GOHPro (GO Similarity-based Heterogeneous Network Propagation), a novel method that constructs a heterogeneous network by integrating protein functional similarity (derived from domain profiles and modular complexes) with GO semantic relationships. This method applies a network propagation algorithm to prioritize annotations based on multi-omics context. When evaluated on yeast and human datasets, GOHPro outperformed six state-of-the-art methods. Specifically, it achieved Fmax improvements ranging from 6.8% to 47.5% over methods like exp2GO across the Biological Process (BP), Molecular Function (MF), and Cellular Component (CC) ontologies in both yeast and human. Rigorous case studies on proteins with shared domains, such as AAA+ ATPases, demonstrated GOHPro's ability to resolve functional ambiguity by leveraging contextual interactions and modular complexes. Further validation on the CAFA3 benchmark confirmed its generalizability, with Fmax gains exceeding 62% compared to baseline approaches in human. Our analysis revealed that homology and network connectivity critically influence prediction robustness, with the modular similarity network compensating for evolutionary gaps in dark proteins. The framework's extensibility to de novo structural predictions highlights its potential to bridge the annotation gap in uncharacterized proteomes.
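The abstract does not say which propagation algorithm GOHPro uses, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a random walk with restart over a single adjacency matrix that combines protein-protein similarity, protein-GO annotation links, and GO-GO semantic similarity. The node names, edge weights, restart probability, and the helper names `column_normalize` and `propagate` are all hypothetical.

```python
def column_normalize(matrix):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def propagate(adjacency, seed, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart: p <- (1 - restart) * W p + restart * p0."""
    w = column_normalize(adjacency)
    n = len(seed)
    total = sum(seed)
    p0 = [s / total for s in seed]  # restart distribution
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy heterogeneous network; every weight here is made up for illustration.
# Nodes 0-2 are proteins, nodes 3-4 are GO terms.
nodes = ["P0", "P1", "P2", "G0", "G1"]
adjacency = [
    # P0   P1   P2   G0   G1
    [0.0, 0.8, 0.1, 0.0, 0.0],  # P0: unannotated query protein
    [0.8, 0.0, 0.2, 1.0, 0.0],  # P1: annotated with G0
    [0.1, 0.2, 0.0, 0.0, 1.0],  # P2: annotated with G1
    [0.0, 1.0, 0.0, 0.0, 0.3],  # G0: semantically related to G1
    [0.0, 0.0, 1.0, 0.3, 0.0],  # G1
]
seed = [1.0, 0.0, 0.0, 0.0, 0.0]  # start the walk at the query protein P0

scores = propagate(adjacency, seed)
# Rank candidate GO terms for P0 by their steady-state score.
ranked_terms = sorted(nodes[3:], key=lambda t: scores[nodes.index(t)], reverse=True)
print(ranked_terms)
```

In this toy network the query protein is most similar to P1, so the term annotated to P1 (G0) ranks above G1. A real heterogeneous network would typically weight the protein, GO, and cross-layer blocks separately rather than normalizing one merged matrix.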
Affiliation(s)
- Sai Hu
  - School of Mathematics, Changsha University, Changsha, 410022, Hunan, China
- Bihai Zhao
  - School of Computer Science and Engineering, Changsha University, Changsha, 410022, Hunan, China.
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, 410022, Hunan, China.

2
Abele M, Soleymaniniya A, Bayer FP, Lomp N, Doll E, Meng C, Neuhaus K, Scherer S, Wenning M, Wantia N, Kuster B, Wilhelm M, Ludwig C. Proteomic Diversity in Bacteria: Insights and Implications for Bacterial Identification. Mol Cell Proteomics 2025; 24:100917. [PMID: 39880082 PMCID: PMC11919601 DOI: 10.1016/j.mcpro.2025.100917] [Received: 08/01/2024] [Revised: 12/20/2024] [Accepted: 01/23/2025] [Indexed: 01/31/2025]
Abstract
Mass spectrometry-based proteomics has revolutionized bacterial identification and elucidated many molecular mechanisms underlying bacterial growth, community formation, and drug resistance. However, most research has been focused on a few model bacteria, overlooking bacterial diversity. In this study, we present the most extensive bacterial proteomic resource to date, covering 303 species, 119 genera, and five phyla with over 636,000 unique expressed proteins, confirming the existence of over 38,700 hypothetical proteins. Accessible via the public resource ProteomicsDB, this dataset enables quantitative exploration of proteins within and across species. Additionally, we developed MS2Bac, a bacterial identification algorithm that queries NCBI's bacterial proteome space in two iterations. MS2Bac achieved over 99% species-level and 89% strain-level accuracy, surpassing methods like MALDI-TOF and FTIR, as demonstrated with food-derived bacterial isolates. MS2Bac also effectively identified bacteria in clinical samples, highlighting the potential of MS-based proteomics as a routine diagnostic tool.
Affiliation(s)
- Miriam Abele
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Armin Soleymaniniya
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Florian P Bayer
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Nina Lomp
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Etienne Doll
  - Research Department Molecular Life Sciences, TUM School of Life Sciences, Freising, Germany
- Chen Meng
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL Institute for Food & Health, Technical University of Munich, Freising, Germany
- Siegfried Scherer
  - Research Department Molecular Life Sciences, TUM School of Life Sciences, Freising, Germany
- Mareike Wenning
  - Bavarian Health and Food Safety Authority, Unit for Food Microbiology and Hygiene, Oberschleißheim, Germany
- Nina Wantia
  - Institut für Medizinische Mikrobiologie, Immunologie und Hygiene, TUM School of Medicine and Health Department Preclinical Medicine, Technical University of Munich, Munich, Germany
- Bernhard Kuster
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), TUM School of Life Sciences, Technical University of Munich, Freising, Germany.

3
Orchard SE. What have Data Standards ever done for us? Mol Cell Proteomics 2025:100933. [PMID: 40024375 DOI: 10.1016/j.mcpro.2025.100933] [Received: 09/17/2024] [Revised: 02/21/2025] [Accepted: 02/24/2025] [Indexed: 03/04/2025]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies for both the field of molecular interaction and that of mass spectrometry for more than 20 years. This review explores some of the ways that the proteomics community has benefitted from the development of community standards and takes a look at some of the tools and resources that have been improved or developed as a result of the work of the HUPO-PSI.
Affiliation(s)
- S E Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK

4
Bondhus L, Nava AA, Liu IS, Arboleda VA. Epigene functional diversity: isoform usage, disordered domain content, and variable binding partners. Epigenetics Chromatin 2025; 18:8. [PMID: 39893491 PMCID: PMC11786378 DOI: 10.1186/s13072-025-00571-z] [Received: 09/03/2024] [Accepted: 01/21/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND: Epigenes are defined as proteins that perform post-translational modification of histones or DNA, read post-translational modifications, form complexes with epigenetic factors, or change the general structure of chromatin. This specialized group of proteins is responsible for controlling the organization of genomic DNA in a cell-type specific fashion, controlling normal development in a spatial and temporal fashion. Moreover, mutations in epigenes have been implicated as causal in germline pediatric disorders and as driver mutations in cancer. Despite their importance to human disease, to date, there has not been a systematic analysis of the sources of functional diversity for epigenes at large. Epigenes' unique functions that require the assembly of pools within the nucleus suggest that their structure and amino acid composition would have been enriched for features that enable efficient assembly of chromatin and DNA for transcription, splicing, and post-translational modifications. RESULTS: In this study, we assess the functional diversity stemming from gene structure, isoforms, protein domains, and multiprotein complex formation that drive the functions of established epigenes. We found that there are specific structural features that enable epigenes to perform their variable roles depending on the cellular and environmental context. First, epigenes are significantly larger and have more exons compared with non-epigenes, which contributes to increased isoform diversity. Second, epigenes participate in more multimeric complexes than non-epigenes. Third, given their proposed importance in membraneless organelles, we show epigenes are enriched for substantially larger intrinsically disordered regions (IDRs). Additionally, we assessed the specificity of their expression profiles and showed epigenes are more ubiquitously expressed, consistent with their enrichment in pediatric syndromes with intellectual disability, multiorgan dysfunction, and developmental delay. Finally, in the L1000 dataset, we identify drugs that can potentially be used to modulate expression of these genes. CONCLUSIONS: Here we identify significant differences in isoform usage, disordered domain content, and variable binding partners between human epigenes and non-epigenes using various functional genomics datasets from Ensembl, ENCODE, GTEx, HPO, LINCS L1000, and BrainSpan. Our results contribute new knowledge to the growing field focused on developing targeted therapies for diseases caused by epigene mutations, such as chromatinopathies and cancers.
Affiliation(s)
- Leroy Bondhus
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Aileen A Nava
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Isabelle S Liu
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA
- Valerie A Arboleda
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, 615 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA.
  - Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, 90095, USA.
  - Molecular Biology Institute, UCLA, Los Angeles, CA, 90095, USA.
  - Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, 90095, USA.

5
The UniProt Consortium, Bateman A, Martin MJ, Orchard S, Magrane M, Adesina A, Ahmad S, Bowler-Barnett EH, Bye-A-Jee H, Carpentier D, Denny P, Fan J, Garmiri P, Gonzales LJDC, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Joshi V, Jyothi D, Kandasaamy S, Lock A, Luciani A, Luo J, Lussi Y, Marin JSM, Raposo P, Rice DL, Santos R, Speretta E, Stephenson J, Totoo P, Tyagi N, Urakova N, Vasudev P, Warner K, Wijerathne S, Yu CWH, Zaru R, Bridge AJ, Aimo L, Argoud-Puy G, Auchincloss AH, Axelsen KB, Bansal P, Baratin D, Batista Neto TM, Blatter MC, Bolleman JT, Boutet E, Breuza L, Gil BC, Casals-Casas C, Echioukh KC, Coudert E, Cuche B, de Castro E, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gaudet P, Gehant S, Gerritsen V, Gos A, Gruaz N, Hulo C, Hyka-Nouspikel N, Jungo F, Kerhornou A, Mercier PL, Lieberherr D, Masson P, Morgat A, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Poux S, Pozzato M, Pruess M, Redaschi N, Rivoire C, Sigrist CJA, Sonesson K, Sundaram S, Sveshnikova A, Wu CH, Arighi CN, Chen C, Chen Y, Huang H, Laiho K, Lehvaslaiho M, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Y, Zhang J. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res 2025; 53:D609-D617. [PMID: 39552041 PMCID: PMC11701636 DOI: 10.1093/nar/gkae1010] [Received: 09/12/2024] [Revised: 10/14/2024] [Accepted: 10/16/2024] [Indexed: 11/19/2024]
Abstract
The aim of the UniProt Knowledgebase (UniProtKB; https://www.uniprot.org/) is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication, we describe ongoing changes to our production pipeline to limit the sequences available in UniProtKB to high-quality, non-redundant reference proteomes. We continue to manually curate the scientific literature to add the latest functional data and use machine learning techniques. We also encourage community curation to ensure key publications are not missed. We provide an update on the automatic annotation methods used by UniProtKB to predict information for unreviewed entries describing unstudied proteins. Finally, updates to the UniProt website are described, including a new tab linking protein to genomic information. In recognition of its value to the scientific community, the UniProt database has been awarded Global Core Biodata Resource status.

6
Buzzao D, Persson E, Guala D, Sonnhammer ELL. FunCoup 6: advancing functional association networks across species with directed links and improved user experience. Nucleic Acids Res 2025; 53:D658-D671. [PMID: 39530220 PMCID: PMC11701656 DOI: 10.1093/nar/gkae1021] [Received: 09/16/2024] [Revised: 10/11/2024] [Accepted: 10/17/2024] [Indexed: 11/16/2024]
Abstract
FunCoup 6 (https://funcoup.org) represents a significant advancement in global functional association networks, aiming to provide researchers with a comprehensive view of the functional coupling interactome. This update introduces novel methodologies and integrated tools for improved network inference and analysis. Major new developments in FunCoup 6 include vastly expanded coverage of gene regulatory links, a new framework for bin-free Bayesian training and a new website. FunCoup 6 integrates a new tool for disease and drug target module identification using the TOPAS algorithm. To expand the utility of the resource for biomedical research, it incorporates pathway enrichment analysis using the ANUBIX and EASE algorithms. The unique comparative interactomics analysis in FunCoup provides insights into network conservation, now allowing users to align orthologs only or query each species network independently. Bin-free training was applied to 23 primary species, and in addition, networks were generated for all remaining 618 species in InParanoiDB 9. Accompanying these advancements, FunCoup 6 features a redesigned website, together with updated API functionalities, and represents a pivotal step forward in functional genomics research, offering unique capabilities for exploring the complex landscape of protein interactions.
Affiliation(s)
- Davide Buzzao
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Emma Persson
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Dimitri Guala
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden

7
Szklarczyk D, Nastou K, Koutrouli M, Kirsch R, Mehryary F, Hachilif R, Hu D, Peluso ME, Huang Q, Fang T, Doncheva NT, Pyysalo S, Bork P, Jensen LJ, von Mering C. The STRING database in 2025: protein networks with directionality of regulation. Nucleic Acids Res 2025; 53:D730-D737. [PMID: 39558183 PMCID: PMC11701646 DOI: 10.1093/nar/gkae1113] [Received: 09/15/2024] [Revised: 10/18/2024] [Accepted: 10/29/2024] [Indexed: 11/20/2024]
Abstract
Proteins cooperate, regulate and bind each other to achieve their functions. Understanding the complex network of their interactions is essential for a systems-level description of cellular processes. The STRING database compiles, scores and integrates protein-protein association information drawn from experimental assays, computational predictions and prior knowledge. Its goal is to create comprehensive and objective global networks that encompass both physical and functional interactions. Additionally, STRING provides supplementary tools such as network clustering and pathway enrichment analysis. The latest version, STRING 12.5, introduces a new 'regulatory network', for which it gathers evidence on the type and directionality of interactions using curated pathway databases and a fine-tuned language model parsing the literature. This update enables users to visualize and access three distinct network types (functional, physical and regulatory) separately, each applicable to distinct research needs. In addition, the pathway enrichment detection functionality has been updated, with better false discovery rate corrections, redundancy filtering and improved visual displays. The resource now also offers improved annotations of clustered networks and provides users with downloadable network embeddings, which facilitate the use of STRING networks in machine learning and allow cross-species transfer of protein information. The STRING database is available online at https://string-db.org/.
Affiliation(s)
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Katerina Nastou
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Mikaela Koutrouli
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Rebecca Kirsch
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Farrokh Mehryary
  - TurkuNLP Lab, Department of Computing, University of Turku, Vesilinnantie 5, 20014 Turku, Finland
- Radja Hachilif
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Dewei Hu
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Matteo E Peluso
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Qingyao Huang
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Tao Fang
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland
- Nadezhda T Doncheva
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Sampo Pyysalo
  - TurkuNLP Lab, Department of Computing, University of Turku, Vesilinnantie 5, 20014 Turku, Finland
- Peer Bork
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
  - Max Delbrück Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
  - Department of Bioinformatics, Biozentrum, University of Würzburg, Am Hubland, 97074 Würzburg, Germany
- Lars J Jensen
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Amphipôle, Quartier UNIL-Sorge, 1015 Lausanne, Switzerland

8
Steinkamp R, Tsitsiridis G, Brauner B, Montrone C, Fobo G, Frishman G, Avram S, Oprea T, Ruepp A. CORUM in 2024: protein complexes as drug targets. Nucleic Acids Res 2025; 53:D651-D657. [PMID: 39526397 PMCID: PMC11701639 DOI: 10.1093/nar/gkae1033] [Received: 09/13/2024] [Revised: 10/10/2024] [Accepted: 10/22/2024] [Indexed: 11/16/2024]
Abstract
CORUM (https://mips.helmholtz-muenchen.de/corum/) is a public database that offers comprehensive information about mammalian protein complexes, including their subunits, functions and associations with human diseases. The newly released CORUM 5.0, encompassing 7193 protein complexes, is the largest dataset of manually curated mammalian protein complexes publicly available. This update represents the most significant upgrade to the database in >15 years. At present, the molecular processes in cells that are influenced by drugs are only incompletely understood. In this latest release, we have begun systematically investigating the impact of drugs on protein complexes. Our studies are based on a dataset from DrugCentral comprising 725 protein drug targets with approved drugs and known mechanisms of action. To date, we have identified 1975 instances from the literature where a drug affects the formation and/or function of a protein complex. Numerous examples highlight the crucial role of understanding drug-protein complex relationships in drug efficacy. The expanded dataset and the inclusion of drug effects on protein complexes are expected to significantly enhance the utility and application potential of CORUM 5.0 in fields such as network medicine and pharmacological research.
Affiliation(s)
- Ralph Steinkamp
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- George Tsitsiridis
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- Barbara Brauner
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- Corinna Montrone
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- Gisela Fobo
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- Goar Frishman
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany
- Sorin Avram
  - Department of Computational Chemistry, “Coriolan Dragulescu” Institute of Chemistry, 24 Mihai Viteazu Blvd, Timisoara, Timis 300223, Romania
- Tudor I Oprea
  - Expert Systems Inc., 12730 High Bluff Drive, San Diego, CA 92130, USA
- Andreas Ruepp
  - Institute of Experimental Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, Neuherberg D-85764, Germany

9
Balu S, Huget S, Medina Reyes JJ, Ragueneau E, Panneerselvam K, Fischer SN, Claussen ER, Kourtis S, Combe C, Meldal BHM, Perfetto L, Rappsilber J, Kustatscher G, Drew K, Orchard S, Hermjakob H. Complex portal 2025: predicted human complexes and enhanced visualisation tools for the comparison of orthologous and paralogous complexes. Nucleic Acids Res 2025; 53:D644-D650. [PMID: 39558156 PMCID: PMC11701666 DOI: 10.1093/nar/gkae1085] [Received: 09/15/2024] [Revised: 10/16/2024] [Accepted: 10/24/2024] [Indexed: 11/20/2024]
Abstract
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated reference database for molecular complexes. It is a unifying web resource linking aggregated data on composition, topology and the function of macromolecular complexes from 28 species. In addition to significantly extending the number of manually curated complexes, we have massively extended the coverage of the human complexome through the incorporation of high confidence assemblies predicted by machine-learning algorithms trained on large-scale experimental data. The current content of the portal comprising 2150 human complexes has been augmented by 14 964 machine-learning (ML) predicted complexes from hu.MAP3.0. We have refactored the website to enable easy search and filtering of these different classes of protein complexes and have implemented the Complex Navigator, a visualisation tool to facilitate comparison of related complexes in the context of orthology or paralogy. We have embedded the Rhea reaction visualisation tool into the website to enable users to view the catalytic activity of enzyme complexes.
Affiliation(s)
- Sucharitha Balu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Susie Huget
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Jose Medina Reyes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eliot Ragueneau
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kalpana Panneerselvam
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Samantha N Fischer
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Erin R Claussen
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Colin W Combe
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
- Livia Perfetto
  - University of Rome La Sapienza, Department of Biology and Biotechnologies “C. Darwin”, Rome, Italy
- Juri Rappsilber
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
  - Technische Universität Berlin, Chair of Bioanalytics, 10623 Berlin, Germany
- Georg Kustatscher
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
- Kevin Drew
  - Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA
- Sandra Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

10
Samarasinghe K, Kotlyar M, Vallet S, Hayes C, Naba A, Jurisica I, Lisacek F, Ricard-Blum S. MatrixDB 2024: an increased coverage of extracellular matrix interactions, a new Network Explorer and a new web interface. Nucleic Acids Res 2025; 53:D1677-D1682. [PMID: 39558161 PMCID: PMC11701626 DOI: 10.1093/nar/gkae1088] [Received: 09/15/2024] [Revised: 10/23/2024] [Accepted: 10/27/2024] [Indexed: 11/20/2024]
Abstract
MatrixDB, a member of the International Molecular Exchange consortium (IMEx), is a curated interaction database focused on interactions established by extracellular matrix (ECM) constituents including proteins, proteoglycans, glycosaminoglycans and ECM bioactive fragments. The architecture of MatrixDB was upgraded to ease interaction data export, allow versioning and programmatic access and ensure sustainability. The new version of the database includes more than twice the number of manually curated and experimentally-supported interactions. High-confidence predicted interactions were imported from the Integrated Interactions Database to increase the coverage of the ECM interactome. ECM and ECM-associated proteins of five species (human, murine, bovine, avian and zebrafish) were annotated with matrisome divisions and categories, which are used for computational analyses of ECM -omic datasets. Biological pathways from the Reactome Pathway Knowledgebase were also added to the biomolecule description. New transcriptomic and expanded proteomic datasets were imported in MatrixDB to generate cell- and tissue-specific ECM networks using the newly developed in-house Network Explorer integrated in the database. MatrixDB is freely available at https://matrixdb.univ-lyon1.fr.
Affiliation(s)
- Max Kotlyar
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Sylvain D Vallet
- Institut de Biologie Structurale, UMR 5075, CEA, CNRS, Université Grenoble Alpes, Grenoble 38000, France
- Alexandra Naba
- Department of Physiology and Biophysics, University of Illinois Chicago, Chicago, IL 60612, USA
- Igor Jurisica
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute and Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, University Health Network, Toronto, ON M5T 0S8, Canada
- Departments of Medical Biophysics and Computer Science, and the Faculty of Dentistry, University of Toronto, Toronto, Ontario, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
- Sylvie Ricard-Blum
- Institut de Chimie et Biochimie Moléculaires et Supramoléculaires (ICBMS), UMR 5246, CNRS, Université Lyon 1, Villeurbanne 69622, France
11
Daou L, Hanna EM. Predicting protein complexes in protein interaction networks using Mapper and graph convolution networks. Comput Struct Biotechnol J 2024; 23:3595-3609. [PMID: 39493503 PMCID: PMC11530816 DOI: 10.1016/j.csbj.2024.10.009] [Received: 06/10/2024] [Revised: 10/04/2024] [Accepted: 10/04/2024] [Indexed: 11/05/2024] [Open access]
Abstract
Protein complexes are groups of interacting proteins that are central to multiple biological processes. Studying protein complexes can enhance our understanding of cellular functions and malfunctions and thus support the development of effective disease treatments. High-throughput experimental techniques allow the generation of large-scale protein-protein interaction datasets. Accordingly, various computational approaches to predict protein complexes from protein-protein interactions were presented in the literature. They are typically based on networks in which nodes and edges represent proteins and their interactions, respectively. State-of-the-art approaches mainly rely on clustering static networks to identify complexes. However, since protein interactions are highly dynamic in nature, recent approaches seek to model such dynamics by typically integrating gene expression data and identifying protein complexes accordingly. We propose MComplex, a method that utilizes time-series gene expression with interaction data to generate a temporal network which is passed to a generative adversarial network whose generator is a graph convolutional network. This creates embeddings which are then analyzed using a modified graph-based version of the Mapper algorithm to predict corresponding protein complexes. We test our approach on multiple benchmark datasets and compare identified complexes against gold-standard protein complex datasets. Our results show that MComplex outperforms existing methods in several evaluation aspects, namely recall and maximum matching ratio as well as a composite score covering aggregated evaluation measures. The code and data are available for free download from https://github.com/LeonardoDaou/MComplex.
Affiliation(s)
- Leonardo Daou
- Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Eileen Marie Hanna
- Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
12
Foo B, Amedei H, Kaur S, Jaawan S, Boshnakovska A, Gall T, de Boer RA, Silljé HHW, Urlaub H, Rehling P, Lenz C, Lehnart SE. Unbiased complexome profiling and global proteomics analysis reveals mitochondrial impairment and potential changes at the intercalated disk in presymptomatic R14Δ/+ mice hearts. PLoS One 2024; 19:e0311203. [PMID: 39446877 PMCID: PMC11501035 DOI: 10.1371/journal.pone.0311203] [Received: 05/24/2024] [Accepted: 09/15/2024] [Indexed: 10/26/2024] [Open access]
Abstract
Phospholamban (PLN) is a sarco-endoplasmic reticulum (SER) membrane protein that regulates cardiac contraction/relaxation by reversibly inhibiting the SERCA2a Ca2+-reuptake pump. The R14Δ-PLN mutation causes severe cardiomyopathy that is resistant to conventional treatment. Protein complexes and higher-order supercomplexes such as intercalated disk components and Ca2+-cycling domains underlie many critical cardiac functions, a subset of which may be disrupted by R14Δ-PLN. Complexome profiling (CP) is a proteomics workflow for systematic analysis of high molecular weight (MW) protein complexes and supercomplexes. We hypothesize that R14Δ-PLN may alter a subset of these assemblies, and apply CP workflows to explore these changes in presymptomatic R14Δ/+ mice hearts. Ventricular tissues from presymptomatic 28wk-old WT and R14Δ/+ mice were homogenized under non-denaturing conditions, fractionated by size-exclusion chromatography (SEC) with a linear MW-range exceeding 5 MDa, and subjected to quantitative data-independent acquisition mass spectrometry (DIA-MS) analysis. Unfortunately, current workflows for the systematic analysis of CP data proved ill-suited for use in cardiac samples. Most rely upon curated protein complex databases to provide ground-truth for analysis; however, these are derived primarily from cancerous or immortalized cell lines and, consequently, cell-type specific complexes (including cardiac-specific machinery potentially affected in R14Δ-PLN hearts) are poorly covered. We thus developed PERCOM: a novel CP data-analysis strategy that does not rely upon these databases and can, furthermore, be implemented on widely available spreadsheet software. Applying PERCOM to our CP dataset resulted in the identification of 296 proteins with disrupted elution profiles. Hits were significantly enriched for mitochondrial and intercalated disk (ICD) supercomplex components. Changes to mitochondrial supercomplexes were associated with reduced expression of mitochondrial proteins and maximal oxygen consumption rate. The observed alterations to mitochondrial and ICD supercomplexes were replicated in a second cohort of "juvenile" 9wk-old mice. These early-stage changes to key cardiac machinery may contribute to R14Δ-PLN pathogenesis.
Affiliation(s)
- Brian Foo
- Department of Cardiology and Pneumology, Heart Research Center Göttingen, Cellular Biophysics and Translational Cardiology Section, University Medical Center Göttingen, Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
- Hugo Amedei
- Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Surmeet Kaur
- Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Samir Jaawan
- Department of Cardiology and Pneumology, Heart Research Center Göttingen, Cellular Biophysics and Translational Cardiology Section, University Medical Center Göttingen, Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
- Angela Boshnakovska
- Department of Cellular Biochemistry, University Medical Center Göttingen, Göttingen, Germany
- Tanja Gall
- Department of Cellular Biochemistry, University Medical Center Göttingen, Göttingen, Germany
- Rudolf A. de Boer
- Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Cardiology, Erasmus MC, Thorax Center, Cardiovascular Institute, Rotterdam, the Netherlands
- Herman H. W. Silljé
- Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Henning Urlaub
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
- Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Bioanalytical Mass Spectrometry Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Peter Rehling
- Department of Cellular Biochemistry, University Medical Center Göttingen, Göttingen, Germany
- Christof Lenz
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
- Department of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Bioanalytical Mass Spectrometry Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Stephan E. Lehnart
- Department of Cardiology and Pneumology, Heart Research Center Göttingen, Cellular Biophysics and Translational Cardiology Section, University Medical Center Göttingen, Göttingen, Germany
- Cluster of Excellence “Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells” (MBExC), University of Göttingen, Göttingen, Germany
13
Jakobson CM, Hartl J, Trébulle P, Mülleder M, Jarosz DF, Ralser M. A genome-to-proteome atlas charts natural variants controlling proteome diversity and forecasts their fitness effects. bioRxiv 2024:2024.10.18.619054. [PMID: 39484408 PMCID: PMC11526991 DOI: 10.1101/2024.10.18.619054] [Indexed: 11/03/2024]
Abstract
Despite abundant genomic and phenotypic data across individuals and environments, the functional impact of most mutations on phenotype remains unclear. Here, we bridge this gap by linking genome to proteome in 800 meiotic progeny from an intercross between two closely related Saccharomyces cerevisiae isolates adapted to distinct niches. Modest genetic distance between the parents generated remarkable proteomic diversity that was amplified in the progeny and captured by 6,476 genotype-protein associations, over 1,600 of which we resolved to single variants. Proteomic adaptation emerged through the combined action of numerous cis- and trans-regulatory mutations, a regulatory architecture that was conserved across the species. Notably, trans-regulatory variants often arose in proteins not traditionally associated with gene regulation, such as enzymes. Moreover, the proteomic consequences of mutations predicted fitness under various stresses. Our study demonstrates that the collective action of natural genetic variants drives dramatic proteome diversification, with molecular consequences that forecast phenotypic outcomes.
Affiliation(s)
- Christopher M. Jakobson
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
- Johannes Hartl
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Department of Biochemistry, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Pauline Trébulle
- Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Michael Mülleder
- Core Facility High-Throughput Mass Spectrometry, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Daniel F. Jarosz
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA, USA
- Markus Ralser
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Department of Biochemistry, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Max Planck Institute for Molecular Genetics, Berlin, Germany
14
Dörig C, Marulli C, Peskett T, Volkmar N, Pantolini L, Studer G, Paleari C, Frommelt F, Schwede T, de Souza N, Barral Y, Picotti P. Global profiling of protein complex dynamics with an experimental library of protein interaction markers. Nat Biotechnol 2024:10.1038/s41587-024-02432-8. [PMID: 39415059 DOI: 10.1038/s41587-024-02432-8] [Received: 07/28/2023] [Accepted: 09/16/2024] [Indexed: 10/18/2024]
Abstract
Methods to systematically monitor protein complex dynamics are needed. We introduce serial ultrafiltration combined with limited proteolysis-coupled mass spectrometry (FLiP-MS), a structural proteomics workflow that generates a library of peptide markers specific to changes in protein-protein interactions (PPIs) by probing differences in protease susceptibility between complex-bound and monomeric forms of proteins. The library includes markers mapping to protein-binding interfaces and markers reporting on structural changes that accompany PPI changes. Integrating the marker library with LiP-MS data allows for global profiling of PPIs from unfractionated lysates. We apply FLiP-MS to Saccharomyces cerevisiae and probe changes in protein complex dynamics after DNA replication stress, identifying links between Spt-Ada-Gcn5 acetyltransferase activity and the assembly state of several complexes. FLiP-MS enables protein complex dynamics to be probed on any perturbation, proteome-wide, at high throughput, with peptide-level structural resolution and informing on occupancy of binding interfaces, thus providing both global and molecular views of a system under study.
Affiliation(s)
- Christian Dörig
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Cathy Marulli
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Thomas Peskett
- Institute of Biochemistry, Department of Biology, ETH Zurich, Zurich, Switzerland
- Norbert Volkmar
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Lorenzo Pantolini
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Camilla Paleari
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Fabian Frommelt
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Computational Structural Biology, Basel, Switzerland
- Natalie de Souza
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Yves Barral
- Institute of Biochemistry, Department of Biology, ETH Zurich, Zurich, Switzerland
- Paola Picotti
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland.
15
Abrusán G, Zelezniak A. Cellular location shapes quaternary structure of enzymes. Nat Commun 2024; 15:8505. [PMID: 39353940 PMCID: PMC11445431 DOI: 10.1038/s41467-024-52662-2] [Received: 03/23/2024] [Accepted: 09/18/2024] [Indexed: 10/03/2024] [Open access]
Abstract
The main forces driving protein complex evolution are currently not well understood, especially in homomers, where quaternary structure might frequently evolve neutrally. Here we examine the factors determining oligomerisation by analysing the evolution of enzymes in circumstances where homomers rarely evolve. We show that 1) in extracellular environments, most enzymes with known structure are monomers, while in the cytoplasm most are homomers, indicating that the evolution of oligomers is cellular environment dependent; 2) the evolution of quaternary structure within protein orthogroups is more consistent with the predictions of constructive neutral evolution than with an adaptive process: quaternary structure is gained more easily than it is lost, and most extracellular monomers evolved from proteins that were monomers also in their ancestral state, without the loss of interfaces. Our results indicate that oligomerisation is context-dependent, and even when adaptive, in many cases it is probably not driven by the intrinsic properties of enzymes, like their biochemical function, but rather by the properties of the environment where the enzyme is active. These factors might be macromolecular crowding and excluded volume effects facilitating the evolution of interfaces, and the maintenance of cellular homeostasis through shaping cytoplasm fluidity, protein degradation, or diffusion rates.
Affiliation(s)
- György Abrusán
- Randall Centre for Cell and Molecular Biophysics, School of Basic and Medical Biosciences, King's College London, New Hunt's House, London, UK.
- Aleksej Zelezniak
- Randall Centre for Cell and Molecular Biophysics, School of Basic and Medical Biosciences, King's College London, New Hunt's House, London, UK
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Institute of Biotechnology, Life Sciences Centre, Vilnius University, Vilnius, Lithuania
16
Zhang C, Sánchez BJ, Li F, Eiden CWQ, Scott WT, Liebal UW, Blank LM, Mengers HG, Anton M, Rangel AT, Mendoza SN, Zhang L, Nielsen J, Lu H, Kerkhoven EJ. Yeast9: a consensus genome-scale metabolic model for S. cerevisiae curated by the community. Mol Syst Biol 2024; 20:1134-1150. [PMID: 39134886 PMCID: PMC11450192 DOI: 10.1038/s44320-024-00060-7] [Received: 01/05/2024] [Revised: 07/17/2024] [Accepted: 07/31/2024] [Indexed: 10/05/2024] [Open access]
Abstract
Genome-scale metabolic models (GEMs) can facilitate metabolism-focused multi-omics integrative analysis. Since the publication of Yeast8 in 2019, the yeast-GEM of Saccharomyces cerevisiae has been continuously updated by the community. This has increased the quality and scope of the model, culminating now in Yeast9. To evaluate its predictive performance, we generated 163 condition-specific GEMs constrained by single-cell transcriptomics from osmotic pressure or reference conditions. Comparative flux analysis showed that yeast adapting to high osmotic pressure benefits from upregulating fluxes through central carbon metabolism. Furthermore, combining Yeast9 with proteomics revealed metabolic rewiring underlying its preference for nitrogen sources. Lastly, we created strain-specific GEMs (ssGEMs) constrained by transcriptomics for 1229 mutant strains. Well able to predict the strains' growth rates, fluxomics from those large-scale ssGEMs outperformed transcriptomics in predicting functional categories for all studied genes in machine learning models. Based on those findings we anticipate that Yeast9 will continue to empower systems biology studies of yeast metabolism.
Affiliation(s)
- Chengyu Zhang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- State Key Laboratory of Bioreactor Engineering, and School of Biotechnology, East China University of Science and Technology (ECUST), 200237, Shanghai, China
- Benjamín J Sánchez
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
- Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
- Feiran Li
- Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
- Cheng Wei Quan Eiden
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, 62 Nanyang Drive, Singapore, 637459, Singapore
- William T Scott
- UNLOCK, Wageningen University & Research, Wageningen, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
- Ulf W Liebal
- Institute of Applied Microbiology - iAMB, Aachen Biology and Biotechnology - ABBt, RWTH Aachen University, 52074, Aachen, Germany
- Lars M Blank
- Institute of Applied Microbiology - iAMB, Aachen Biology and Biotechnology - ABBt, RWTH Aachen University, 52074, Aachen, Germany
- Hendrik G Mengers
- Institute of Applied Microbiology - iAMB, Aachen Biology and Biotechnology - ABBt, RWTH Aachen University, 52074, Aachen, Germany
- Mihail Anton
- Department of Life Sciences, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Chalmers University of Technology, Gothenburg, SE412 58, Sweden
- Albert Tafur Rangel
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, SE412 96, Sweden
- Sebastián N Mendoza
- Center for Mathematical Modeling, University of Chile, Santiago, Chile
- Systems Biology Lab, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Lixin Zhang
- State Key Laboratory of Bioreactor Engineering, and School of Biotechnology, East China University of Science and Technology (ECUST), 200237, Shanghai, China
- Jens Nielsen
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, SE412 96, Sweden
- BioInnovation Institute, Ole Maaløes Vej 3, DK2200, Copenhagen N, Denmark
- Hongzhong Lu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China.
- Eduard J Kerkhoven
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark.
- Department of Life Sciences, SciLifeLab, Chalmers University of Technology, Gothenburg, SE412 96, Sweden.
17
Csikász-Nagy A, Fichó E, Noto S, Reguly I. Computational tools to predict context-specific protein complexes. Curr Opin Struct Biol 2024; 88:102883. [PMID: 38986166 DOI: 10.1016/j.sbi.2024.102883] [Received: 04/07/2024] [Revised: 05/21/2024] [Accepted: 06/19/2024] [Indexed: 07/12/2024]
Abstract
Interactions between thousands of proteins define cells' protein-protein interaction (PPI) network. Some of these interactions lead to the formation of protein complexes. It is challenging to identify a protein complex in a haystack of protein-protein interactions, and it is even more difficult to predict all protein complexes of the complexome. Simulations and machine learning approaches try to crack these problems by looking at the PPI network or predicted protein structures. Clustering of PPI networks led to the first protein complex predictions, while most recently, atomistic models of protein complexes and deep-learning-based structure prediction methods have also emerged. The simulation of PPI level interactions even enables the quantitative prediction of protein complexes. These methods, the required data sources, and their potential future developments are discussed in this review.
Affiliation(s)
- Attila Csikász-Nagy
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.
- Santiago Noto
- Cytocast Hungary Kft, Budapest, Hungary; Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- István Reguly
- Cytocast Hungary Kft, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
18
Vello F, Filippini F, Righetto I. Bioinformatics Goes Viral: I. Databases, Phylogenetics and Phylodynamics Tools for Boosting Virus Research. Viruses 2024; 16:1425. [PMID: 39339901 PMCID: PMC11437414 DOI: 10.3390/v16091425] [Received: 06/04/2024] [Revised: 08/21/2024] [Accepted: 09/03/2024] [Indexed: 09/30/2024] [Open access]
Abstract
Computer-aided analysis of proteins or nucleic acids seems like a matter of course nowadays; however, the history of Bioinformatics and Computational Biology is quite recent. The advent of high-throughput sequencing has led to the production of "big data", which has also affected the field of virology. The collaboration between the communities of bioinformaticians and virologists started a few decades ago and was strongly enhanced by the recent SARS-CoV-2 pandemic. In this article, which is the first in a series on how bioinformatics can enhance virus research, we show that highly useful information is retrievable from selected general and dedicated databases. Indeed, an enormous amount of information (both in terms of nucleotide/protein sequences and their annotation) is deposited in the general databases of international organisations participating in the International Nucleotide Sequence Database Collaboration (INSDC). However, more and more virus-specific databases have been established and are progressively enriched with the contents and features reported in this article. Since viruses are intracellular obligate parasites, a special focus is given to host-pathogen protein-protein interaction databases. Finally, we illustrate several phylogenetic and phylodynamic tools, combining information on algorithms and features with practical information on how to use them and case studies that validate their usefulness. Databases and tools for functional inference will be covered in the next article of this series: Bioinformatics goes viral: II. Sequence-based and structure-based functional analyses for boosting virus research.
Affiliation(s)
- Francesco Filippini
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, 35131 Padua, Italy
19
Nastou K, Koutrouli M, Pyysalo S, Jensen LJ. CoNECo: a Corpus for Named Entity recognition and normalization of protein Complexes. Bioinformatics Advances 2024; 4:vbae116. [PMID: 39411448 PMCID: PMC11474106 DOI: 10.1093/bioadv/vbae116] [Received: 05/29/2024] [Revised: 07/10/2024] [Accepted: 08/04/2024] [Indexed: 10/19/2024]
Abstract
Motivation: Despite significant progress in biomedical information extraction, there is a lack of resources for Named Entity Recognition (NER) and Named Entity Normalization (NEN) of protein-containing complexes. Current resources inadequately address the recognition of protein-containing complex names across different organisms, underscoring the crucial need for a dedicated corpus. Results: We introduce the Complex Named Entity Corpus (CoNECo), an annotated corpus for NER and NEN of complexes. CoNECo comprises 1621 documents with 2052 entities, 1976 of which are normalized to Gene Ontology. We divided the corpus into training, development, and test sets and trained both a transformer-based and dictionary-based tagger on them. Evaluation on the test set demonstrated robust performance, with F-scores of 73.7% and 61.2%, respectively. Subsequently, we applied the best taggers for comprehensive tagging of the entire openly accessible biomedical literature. Availability and implementation: All resources, including the annotated corpus, training data, and code, are available to the community through Zenodo https://zenodo.org/records/11263147 and GitHub https://zenodo.org/records/10693653.
Affiliation(s)
- Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
- Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
- Sampo Pyysalo
- TurkuNLP Group, Department of Computing, University of Turku, Turku, Finland
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen 2200, Denmark
20
Saranya KR, Vimina ER, Pinto FR. TransNeT-CGP: A cluster-based comorbid gene prioritization by integrating transcriptomics and network-topological features. Comput Biol Chem 2024; 110:108038. [PMID: 38461796 DOI: 10.1016/j.compbiolchem.2024.108038] [Received: 10/06/2023] [Revised: 01/11/2024] [Accepted: 02/25/2024] [Indexed: 03/12/2024]
Abstract
The local disruptions caused by the genes of one disease can influence the pathways associated with other diseases, resulting in comorbidity. For gene therapies, it is necessary to prioritize the key genes that regulate common biological mechanisms to tackle the issues caused by overlapping diseases. This work proposes a clustering-based computational approach for prioritising the comorbid genes within the overlapping disease modules by analyzing Protein-Protein Interaction networks. For this, a sub-network with gene interactions of the disease pair was extracted from the interactome. The edge weights are assigned by combining the pairwise gene expression correlation and betweenness centrality scores. Further, a weighted graph clustering algorithm is applied and dominant nodes of high-density clusters are ranked based on clustering coefficients and neighborhood connectivity. Case studies based on neurodegenerative diseases such as the Amyotrophic Lateral Sclerosis-Spinal Muscular Atrophy (ALS-SMA) pair and cancers such as the Ovarian Carcinoma-Invasive Ductal Breast Carcinoma (OC-IDBC) pair were conducted to examine the efficacy of the proposed method. To identify the mechanistic role of top-ranked genes, we used Functional and Pathway enrichment analysis, connectivity analysis with the leave-one-out (LOO) method, analysis of associated disease-related protein complexes, and prioritization tools such as TOPPGENE and Heml2.0. From pathway analysis, it was observed that the top 10 genes obtained using the proposed method were associated with 10 pathways in ALS-SMA comorbidity and 15 in the case of OC-IDBC, whereas the similar methods SAPDSB and S2B yielded 4 and 6 pathways, respectively, for ALS-SMA and 9 and 10, respectively, for OC-IDBC. In both case studies, 70% of the disease-specific benchmark protein complexes were linked to top-ranked genes of the proposed method, compared with 55% and 60% for SAPDSB and S2B, respectively. Additionally, it was found that the removal of the top 10 genes disconnects the network into 14 distinct components in the case of ALS-SMA and 9 in the case of OC-IDBC. The experimental results show that the proposed method can be effectively used for identifying key genes in comorbidity and can offer insights into the intricate molecular relationships driving comorbid diseases.
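The connectivity analysis reported in this abstract (removing top-ranked genes and counting how many components the network splits into) can be illustrated with a minimal sketch. The toy edge list, the gene labels, and the function name `count_components_after_removal` are hypothetical and not taken from the paper; the authors' actual implementation may differ.

```python
from collections import deque


def count_components_after_removal(edges, removed):
    """Count connected components left after deleting the `removed` nodes."""
    removed = set(removed)
    adjacency = {}
    for u, v in edges:
        # Register every surviving node, even if it ends up isolated
        for node in (u, v):
            if node not in removed:
                adjacency.setdefault(node, set())
        # Keep an edge only when both endpoints survive
        if u not in removed and v not in removed:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search marks one whole component
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components


# Hypothetical toy network: hub gene "G1" links three otherwise separate pairs
toy_edges = [("G1", "A"), ("A", "B"), ("G1", "C"), ("C", "D"), ("G1", "E"), ("E", "F")]
print(count_components_after_removal(toy_edges, removed=[]))      # 1
print(count_components_after_removal(toy_edges, removed=["G1"]))  # 3
```

In this toy case, deleting the single hub splits one component into three, which is the kind of fragmentation the abstract reports for the top 10 genes.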
Affiliation(s)
- K R Saranya
- Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
| | - E R Vimina
- Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
| | - F R Pinto
- Chemistry and Biochemistry Department, Faculty of Sciences, University of Lisbon, Portugal.
| |
|
21
|
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024]
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
Affiliation(s)
- Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| |
|
22
|
Yang S, Zong W, Shi L, Li R, Ma Z, Ma S, Si J, Wu Z, Zhai J, Ma Y, Fan Z, Chen S, Huang H, Zhang D, Bao Y, Li R, Xie J. PPGR: a comprehensive perennial plant genomes and regulation database. Nucleic Acids Res 2024; 52:D1588-D1596. [PMID: 37933857 PMCID: PMC10767873 DOI: 10.1093/nar/gkad963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/21/2023] [Accepted: 10/13/2023] [Indexed: 11/08/2023]
Abstract
Perennial woody plants hold vital ecological significance, distinguished by their unique traits. While significant progress has been made in their genomic and functional studies, a major challenge persists: the absence of a comprehensive reference platform for collection, integration and in-depth analysis of the vast amount of data. Here, we present PPGR (Resource for Perennial Plant Genomes and Regulation; https://ngdc.cncb.ac.cn/ppgr/) to address this critical gap, by collecting, integrating, analyzing and visualizing genomic, gene regulation and functional data of perennial plants. PPGR currently includes 60 species, 847 million protein-protein/TF (transcription factor)-target interactions, 9016 transcriptome samples under various environmental conditions and genetic backgrounds. Noteworthy is the focus on genes that regulate wood production, seasonal dormancy, terpene biosynthesis and leaf senescence representing a wealth of information derived from experimental data, literature mining, public databases and genomic predictions. Furthermore, PPGR incorporates a range of multi-omics search and analysis tools to facilitate browsing and application of these extensive datasets. PPGR represents a comprehensive and high-quality resource for perennial plants, substantiated by an illustrative case study that demonstrates its capacity in unraveling gene functions and shedding light on potential regulatory processes.
Affiliation(s)
- Sen Yang
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Wenting Zong
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lingling Shi
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Ruisi Li
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Zhenshu Ma
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Shubao Ma
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Jingna Si
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Zhijing Wu
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Jinglan Zhai
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Yingke Ma
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Zhuojing Fan
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Sisi Chen
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Huahong Huang
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin’an, Hangzhou 311300, China
| | - Deqiang Zhang
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| | - Yiming Bao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Rujiao Li
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianbo Xie
- State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
| |
|
23
|
Mohr SE, Kim AR, Hu Y, Perrimon N. Finding information about uncharacterized Drosophila melanogaster genes. Genetics 2023; 225:iyad187. [PMID: 37933691 PMCID: PMC10697813 DOI: 10.1093/genetics/iyad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023]
Abstract
Genes that have been identified in the genome but remain uncharacterized with regard to function offer an opportunity to uncover novel biological information. Novelty is exciting but can also be a barrier. If nothing is known, how does one start planning and executing experiments? Here, we provide a recommended information-mining workflow and a corresponding guide to accessing information about uncharacterized Drosophila melanogaster genes, such as those assigned only a systematic coding gene identifier. The available information can provide insights into where and when the gene is expressed, what the function of the gene might be, whether there are similar genes in other species, whether there are known relationships to other genes, and whether any other features have already been determined. In addition, available information about relevant reagents can inspire and facilitate experimental studies. Altogether, mining available information can help prioritize genes for further study, as well as provide starting points for experimental assays and other analyses.
Affiliation(s)
- Stephanie E Mohr
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Ah-Ram Kim
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
| |
|
24
|
Appasamy SD, Berrisford J, Gaborova R, Nair S, Anyango S, Grudinin S, Deshpande M, Armstrong D, Pidruchna I, Ellaway JIJ, Leines GD, Gupta D, Harrus D, Varadi M, Velankar S. Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data. Sci Data 2023; 10:853. [PMID: 38040737 PMCID: PMC10692154 DOI: 10.1038/s41597-023-02778-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023]
Abstract
Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions for assemblies in the PDB, complicating the identification of instances representing the same assembly. In this study, we introduce a method leveraging resources external to PDB, such as the Complex Portal, UniProt and Gene Ontology, to describe assemblies and contextualise them within their biological settings accurately. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB's FAIR attributes, fostering more effective basic and translational research and scientific education.
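The problem this abstract describes, recognising entries that represent the same assembly, can be sketched with an order-independent composition key. This is only an illustrative assumption about how such grouping might work: the entry identifiers, the placeholder accessions `ACC_A` and `ACC_B`, and both function names are hypothetical, and the paper's actual method additionally draws on the Complex Portal, UniProt and Gene Ontology.

```python
from collections import defaultdict


def composition_key(components):
    """Build an order-independent key from {accession: copy_number} pairs."""
    return tuple(sorted(components.items()))


def group_assemblies(assemblies):
    """Group assembly identifiers that share the same component composition."""
    groups = defaultdict(list)
    for assembly_id, components in assemblies.items():
        groups[composition_key(components)].append(assembly_id)
    return dict(groups)


# Hypothetical entries; accessions and identifiers are placeholders.
assemblies = {
    "entry_a-1": {"ACC_A": 2, "ACC_B": 2},
    "entry_b-1": {"ACC_B": 2, "ACC_A": 2},  # same composition, listed in another order
    "entry_c-1": {"ACC_A": 1},
}

groups = group_assemblies(assemblies)
for key, members in groups.items():
    print(key, members)
```

Once entries share a key, a single standard name and persistent identifier could be attached to the whole group rather than to each entry separately.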
Affiliation(s)
- Sri Devan Appasamy
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - John Berrisford
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Romana Gaborova
- CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Sreenath Nair
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stephen Anyango
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000, Grenoble, France
| | - Mandar Deshpande
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - David Armstrong
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ivanna Pidruchna
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Joseph I J Ellaway
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Grisell Díaz Leines
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Deepti Gupta
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Deborah Harrus
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
25
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.
| |
|
26
|
Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Chen W, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: a new gene set enrichment tool for Drosophila and common research organisms. Nucleic Acids Res 2023; 51:W419-W426. [PMID: 37125646 PMCID: PMC10320058 DOI: 10.1093/nar/gkad331] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 02/21/2023] [Revised: 03/28/2023] [Accepted: 04/29/2023] [Indexed: 05/02/2023]
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, it offers gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance). In addition, visualizations of results are enhanced by providing an option to view a network of gene set to gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
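As a rough illustration of what "over-represented in a gene list" means, the sketch below computes a one-sided hypergeometric p-value, a common statistic for over-representation analysis. The abstract does not state which test PANGEA uses, so this is an assumption for illustration; the function name `enrichment_p_value` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background
    annotated:  background genes belonging to the gene set
    sample:     size of the input gene list
    overlap:    input genes that belong to the gene set
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the gene set,
# and an input list of 4 genes of which 3 are in the set.
p = enrichment_p_value(population=20, annotated=5, sample=4, overlap=3)
print(round(p, 4))
```

A small p-value indicates that the overlap is larger than expected by chance; in practice such values are corrected for multiple testing across all gene sets examined.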
Affiliation(s)
- Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Aram Comjean
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Jim Thurmond
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Weihang Chen
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Fangge Li
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Tiffany Chao
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Stephanie E Mohr
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Nicholas H Brown
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02138, USA
| |
|
27
|
Mazein A, Acencio ML, Balaur I, Rougny A, Welter D, Niarakis A, Ramirez Ardila D, Dogrusoz U, Gawron P, Satagopam V, Gu W, Kremer A, Schneider R, Ostaszewski M. A guide for developing comprehensive systems biology maps of disease mechanisms: planning, construction and maintenance. Frontiers in Bioinformatics 2023; 3:1197310. [PMID: 37426048 PMCID: PMC10325725 DOI: 10.3389/fbinf.2023.1197310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 06/09/2023] [Indexed: 07/11/2023]
Abstract
As a conceptual model of disease mechanisms, a disease map integrates available knowledge and is applied for data interpretation, predictions and hypothesis generation. It is possible to model disease mechanisms on different levels of granularity and adjust the approach to the goals of a particular project. This rich environment, together with requirements for high-quality network reconstruction, makes it challenging for new curators and groups to be quickly introduced to the development methods. In this review, we offer a step-by-step guide for developing a disease map within its mainstream pipeline that involves using the CellDesigner tool for creating and editing diagrams and the MINERVA Platform for online visualisation and exploration. We also describe how the Neo4j graph database environment can be used for efficiently managing and querying such a resource. To assess interoperability and reproducibility, we apply FAIR principles.
Affiliation(s)
- Alexander Mazein
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Marcio Luis Acencio
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Irina Balaur
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | | | - Danielle Welter
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Anna Niarakis
- Université Paris-Saclay, Laboratoire Européen de Recherche Pour la Polyarthrite Rhumatoïde–Genhotel, University Evry, Evry, France
- Lifeware Group, Inria Saclay-Ile de France, Palaiseau, France
| | - Diana Ramirez Ardila
- ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
| | - Ugur Dogrusoz
- Computer Engineering Department, Bilkent University, Ankara, Türkiye
| | - Piotr Gawron
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- ELIXIR Luxembourg, Belvaux, Luxembourg
| | - Wei Gu
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- ELIXIR Luxembourg, Belvaux, Luxembourg
| | - Andreas Kremer
- ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- ELIXIR Luxembourg, Belvaux, Luxembourg
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- ELIXIR Luxembourg, Belvaux, Luxembourg
| |
|
28
|
Wong ED, Miyasato SR, Aleksander S, Karra K, Nash RS, Skrzypek MS, Weng S, Engel SR, Cherry JM. Saccharomyces genome database update: server architecture, pan-genome nomenclature, and external resources. Genetics 2023; 224:iyac191. [PMID: 36607068 PMCID: PMC10158836 DOI: 10.1093/genetics/iyac191] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 11/16/2022] [Revised: 11/16/2022] [Accepted: 12/21/2022] [Indexed: 01/07/2023]
Abstract
As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.
Affiliation(s)
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Suzi Aleksander
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Marek S Skrzypek
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| |
|
29
|
Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: A New Gene Set Enrichment Tool for Drosophila and Common Research Organisms. bioRxiv: the preprint server for biology 2023:2023.02.20.529262. [PMID: 36865134 PMCID: PMC9980003 DOI: 10.1101/2023.02.20.529262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance), are also available. In addition, visualisations of results are enhanced by an option to view a network of gene set to gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
Affiliation(s)
- Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Aram Comjean
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Jim Thurmond
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Fangge Li
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Tiffany Chao
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Stephanie E. Mohr
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
| | - Nicholas H. Brown
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02138, USA
| |
|
30
|
Lemire BD, Uppuluri P. Coding Sequence Insertions in Fungal Genomes are Intrinsically Disordered and can Impart Functionally-Important Properties on the Host Protein. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.06.535715. [PMID: 37066283 PMCID: PMC10104129 DOI: 10.1101/2023.04.06.535715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Insertion and deletion mutations (indels) are important mechanisms of generating protein diversity. Indels in coding sequences are under considerable selective pressure to maintain reading frames and to preserve protein function, but once generated, indels provide raw material for the acquisition of new protein properties and functions. We reported recently that coding sequence insertions in the Candida albicans NDU1 protein, a mitochondrial protein involved in the assembly of the NADH:ubiquinone oxidoreductase, are imperative for respiration, biofilm formation and pathogenesis. NDU1 inserts are specific to CTG-clade fungi, absent in the human ortholog and successfully harnessed as drug targets. Here, we present the first comprehensive report investigating indels and clade-defining insertions (CDIs) in fungal proteomes. We investigated 80 ascomycete proteomes encompassing CTG clade species, the Saccharomycetaceae family, the Aspergillaceae family and the Herpotrichiellaceae (black yeasts) family. We identified over 30,000 insertions, 4,000 CDIs and 2,500 clade-defining deletions (CDDs). Insert sizes range from 1 to over 1,000 residues in length, while maximum deletion length is 19 residues. Inserts are strikingly over-represented in protein kinases, and excluded from structural domains and transmembrane segments. Inserts are predicted to be highly disordered. The amino acid compositions of the inserts are highly depleted in hydrophobic residues and enriched in polar residues. An indel in the Saccharomyces cerevisiae Sth1 protein, the catalytic subunit of the RSC (Remodel the Structure of Chromatin) complex, is predicted to be disordered until it forms a β-strand upon interaction. This interaction performs a vital role in RSC-mediated transcriptional regulation, thereby expanding protein function.
Affiliation(s)
- Bernard D. Lemire
- Department of Biochemistry, University of Alberta, Edmonton, Canada (retired)
| | - Priya Uppuluri
- Institute for Infection and Immunity, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, USA
- David Geffen School of Medicine at UCLA, Los Angeles, California, USA
| |
|
31
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | - Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
| | | |
|
32
|
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 2988] [Impact Index Per Article: 1494.0] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022]
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Farrokh Mehryary
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, Seoul 03722, South Korea
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
- Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Christian von Mering
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
|
33
|
Ricard-Blum S. Building, Visualizing, and Analyzing Glycosaminoglycan-Protein Interaction Networks. Methods Mol Biol 2023; 2619:211-224. [PMID: 36662472 DOI: 10.1007/978-1-0716-2946-8_15] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
This chapter describes how to generate, visualize, and analyze interaction networks of glycosaminoglycans (GAGs), which are linear polyanionic polysaccharides mostly located at the cell surface and in the extracellular matrix. The protocol is divided into three major steps: (1) the collection of GAG-mediated interaction data, (2) the visualization of GAG interaction networks, and (3) the computational enrichment analyses of these networks to identify their overrepresented features (e.g., protein domains, location, molecular functions, and biological pathways) compared to a reference proteome. These analyses are critical to interpret GAG interactomic datasets, decipher their specificities and functions, and ultimately identify GAG-protein interactions to target for therapeutic purpose.
Affiliation(s)
- Sylvie Ricard-Blum
- ICBMS, UMR 5246 University Lyon 1, CNRS, Institute of Molecular and Supramolecular Chemistry and Biochemistry, Villeurbanne Cedex, France.
| |
|
34
|
Tsitsiridis G, Steinkamp R, Giurgiu M, Brauner B, Fobo G, Frishman G, Montrone C, Ruepp A. CORUM: the comprehensive resource of mammalian protein complexes-2022. Nucleic Acids Res 2022; 51:D539-D545. [PMID: 36382402 PMCID: PMC9825459 DOI: 10.1093/nar/gkac1015] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 09/12/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 11/17/2022]
Abstract
The CORUM database has been providing comprehensive reference information about experimentally characterized, mammalian protein complexes and their associated biological and biomedical properties since 2007. Given that most catalytic and regulatory functions of the cell are carried out by protein complexes, their composition and characterization is of greatest importance in basic and disease biology. The new CORUM 4.0 release encompasses 5204 protein complexes offering the largest and most comprehensive publicly available dataset of manually curated mammalian protein complexes. The CORUM dataset is built from 5299 different genes, representing 26% of the protein coding genes in humans. Complex information from 3354 scientific articles is mainly obtained from human (70%), mouse (16%) and rat (9%) cells and tissues. Recent curation work includes sets of protein complexes, Functional Complex Groups, that offer comprehensive collections of published data in specific biological processes and molecular functions. In addition, a new graphical analysis tool was implemented that displays co-expression data from the subunits of protein complexes. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.
Affiliation(s)
- George Tsitsiridis
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Ralph Steinkamp
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Madalina Giurgiu
- Experimental and Clinical Research Center, Max Delbrück Center for Molecular Medicine and Charité Universitätsmedizin Berlin, Berlin 13125, Germany
| | - Barbara Brauner
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Gisela Fobo
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Goar Frishman
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Corinna Montrone
- Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
| | - Andreas Ruepp
- To whom correspondence should be addressed. Tel: +49 89 3187 3189; Fax: +49 89 3187 3500;
| |
|
35
|
Yadav Y, Subbaroyan A, Martin OC, Samal A. Relative importance of composition structures and biologically meaningful logics in bipartite Boolean models of gene regulation. Sci Rep 2022; 12:18156. [PMID: 36307465 PMCID: PMC9616893 DOI: 10.1038/s41598-022-22654-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2022] [Accepted: 10/18/2022] [Indexed: 12/31/2022]
Abstract
Boolean networks have been widely used to model gene networks. However, such models are coarse-grained to an extent that they abstract away molecular specificities of gene regulation. Alternatively, bipartite Boolean network models of gene regulation explicitly distinguish genes from transcription factors (TFs). In such bipartite models, multiple TFs may simultaneously contribute to gene regulation by forming heteromeric complexes, thus giving rise to composition structures. Since bipartite Boolean models are relatively recent, an empirical investigation of their biological plausibility is lacking. Here, we estimate the prevalence of composition structures arising through heteromeric complexes. Moreover, we present an additional mechanism where composition structures may arise as a result of multiple TFs binding to cis-regulatory regions and provide empirical support for this mechanism. Next, we compare the restriction in BFs imposed by composition structures and by biologically meaningful properties. We find that though composition structures can severely restrict the number of Boolean functions (BFs) driving a gene, the two types of minimally complex BFs, namely nested canalyzing functions (NCFs) and read-once functions (RoFs), are comparatively more restrictive. Finally, we find that composition structures are highly enriched in real networks, but this enrichment most likely comes from NCFs and RoFs.
Affiliation(s)
- Yasharth Yadav
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
| | - Ajay Subbaroyan
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
- Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
| | - Olivier C Martin
- Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91190, Gif sur Yvette, France.
- Université Paris Cité, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), 91190, Gif sur Yvette, France.
| | - Areejit Samal
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India.
- Homi Bhabha National Institute (HBNI), Mumbai, 400094, India.
| |
|
36
|
Wilken SE, Besançon M, Kratochvíl M, Foko Kuate CA, Trefois C, Gu W, Ebenhöh O. Interrogating the effect of enzyme kinetics on metabolism using differentiable constraint-based models. Metab Eng 2022; 74:72-82. [PMID: 36152931 DOI: 10.1016/j.ymben.2022.09.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/11/2022] [Revised: 09/08/2022] [Accepted: 09/10/2022] [Indexed: 10/31/2022]
Abstract
Metabolic models are typically characterized by a large number of parameters. Traditionally, metabolic control analysis is applied to differential equation-based models to investigate the sensitivity of predictions to parameters. A corresponding theory for constraint-based models is lacking, due to their formulation as optimization problems. Here, we show that optimal solutions of optimization problems can be efficiently differentiated using constrained optimization duality and implicit differentiation. We use this to calculate the sensitivities of predicted reaction fluxes and enzyme concentrations to turnover numbers in an enzyme-constrained metabolic model of Escherichia coli. The sensitivities quantitatively identify rate limiting enzymes and are mathematically precise, unlike current finite difference based approaches used for sensitivity analysis. Further, efficient differentiation of constraint-based models unlocks the ability to use gradient information for parameter estimation. We demonstrate this by improving, genome-wide, the state-of-the-art turnover number estimates for E. coli. Finally, we show that this technique can be generalized to arbitrarily complex models. By differentiating the optimal solution of a model incorporating both thermodynamic and kinetic rate equations, the effect of metabolite concentrations on biomass growth can be elucidated. We benchmark these metabolite sensitivities against a large experimental gene knockdown study, and find good alignment between the predicted sensitivities and in vivo metabolome changes. In sum, we demonstrate several applications of differentiating optimal solutions of constraint-based metabolic models, and show how it connects to classic metabolic control analysis.
Affiliation(s)
- St Elmo Wilken
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany.
| | - Mathieu Besançon
- Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
| | - Miroslav Kratochvíl
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Chilperic Armel Foko Kuate
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
| | - Christophe Trefois
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Wei Gu
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
| | - Oliver Ebenhöh
- Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
| |
|
37
|
Cabrera-Orefice A, Potter A, Evers F, Hevler JF, Guerrero-Castillo S. Complexome Profiling-Exploring Mitochondrial Protein Complexes in Health and Disease. Front Cell Dev Biol 2022; 9:796128. [PMID: 35096826 PMCID: PMC8790184 DOI: 10.3389/fcell.2021.796128] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/15/2021] [Accepted: 12/08/2021] [Indexed: 12/14/2022]
Abstract
Complexome profiling (CP) is a state-of-the-art approach that combines separation of native proteins by electrophoresis, size exclusion chromatography or density gradient centrifugation with tandem mass spectrometry identification and quantification. Resulting data are computationally clustered to visualize the inventory, abundance and arrangement of multiprotein complexes in a biological sample. Since its formal introduction a decade ago, this method has been mostly applied to explore not only the composition and abundance of mitochondrial oxidative phosphorylation (OXPHOS) complexes in several species but also to identify novel protein interactors involved in their assembly, maintenance and functions. Besides, complexome profiling has been utilized to study the dynamics of OXPHOS complexes, as well as the impact of an increasing number of mutations leading to mitochondrial disorders or rearrangements of the whole mitochondrial complexome. Here, we summarize the major findings obtained by this approach; emphasize its advantages and current limitations; discuss multiple examples on how this tool could be applied to further investigate pathophysiological mechanisms and comment on the latest advances and opportunity areas to keep developing this methodology.
Affiliation(s)
- Alfredo Cabrera-Orefice
- Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Alisa Potter
- Department of Pediatrics, Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen, Netherlands
| | - Felix Evers
- Department of Medical Microbiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Johannes F Hevler
- Biomolecular Mass Spectrometry and Proteomics, University of Utrecht, Utrecht, Netherlands; Bijvoet Center for Biomolecular Research, University of Utrecht, Utrecht, Netherlands; Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands; Netherlands Proteomics Center, Utrecht, Netherlands
| | - Sergio Guerrero-Castillo
- University Children's Research@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| |
|
38
|
Rigden DJ, Fernández XM. The 2022 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res 2022; 50:D1-D10. [PMID: 34986604 PMCID: PMC8728296 DOI: 10.1093/nar/gkab1195] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/28/2022]
Abstract
The 2022 Nucleic Acids Research Database Issue contains 185 papers, including 87 papers reporting on new databases and 85 updates from resources previously published in the Issue. Thirteen additional manuscripts provide updates on databases most recently published elsewhere. Seven new databases focus specifically on COVID-19 and SARS-CoV-2, including SCoV2-MD, the first of the Issue's Breakthrough Articles. Major nucleic acid databases reporting updates include MODOMICS, JASPAR and miRTarBase. The AlphaFold Protein Structure Database, described in the second Breakthrough Article, is the stand-out in the protein section, where the Human Proteoform Atlas and GproteinDb are other notable new arrivals. Updates from DisProt, FuzDB and ELM comprehensively cover disordered proteins. Under the metabolism and signalling section Reactome, ConsensusPathDB, HMDB and CAZy are major returning resources. In microbial and viral genomes taxonomy and systematics are well covered by LPSN, TYGS and GTDB. Genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Major returning pharmacology resource names include the IUPHAR/BPS guide and the Therapeutic Target Database. New plant databases include PlantGSAD for gene lists and qPTMplants for post-translational modifications. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Our latest update to the NAR online Molecular Biology Database Collection brings the total number of entries to 1645. Following last year's major cleanup, we have updated 317 entries, listing 89 new resources and trimming 80 discontinued URLs. The current release is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
| | | |
|
39
|
Cantelli G, Bateman A, Brooksbank C, Petrov AI, Malik-Sheriff R, Ide-Smith M, Hermjakob H, Flicek P, Apweiler R, Birney E, McEntyre J. The European Bioinformatics Institute (EMBL-EBI) in 2021. Nucleic Acids Res 2022; 50:D11-D19. [PMID: 34850134 PMCID: PMC8690175 DOI: 10.1093/nar/gkab1127] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/15/2021] [Revised: 10/14/2021] [Accepted: 11/23/2021] [Indexed: 11/28/2022]
Abstract
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been made to EMBL-EBI's online training offering.
Affiliation(s)
- Gaia Cantelli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cath Brooksbank
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rahuman S Malik-Sheriff
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Michele Ide-Smith
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|