1
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. We demonstrate a genome-wide all-against-all interaction scan in a bacterial genome, using the popular DCA algorithm to systematically search pairs of such alignments. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
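The integrate-then-pre-screen idea described in this abstract can be illustrated with a minimal sketch. It assumes that each protein pair already has one coevolution score per clade and combines them with a simple logistic model; the clade names, weights, bias, threshold and gene names below are all hypothetical placeholders, and the abstract does not state which machine learning model the authors actually use.

```python
import math

def integrate_clade_scores(clade_scores, weights, bias):
    """Combine per-clade coevolution scores into one probability (logistic model)."""
    z = bias + sum(weights[clade] * score for clade, score in clade_scores.items())
    return 1.0 / (1.0 + math.exp(-z))

def prescreen(pair_scores, weights, bias, threshold=0.5):
    """Keep only pairs whose integrated probability reaches the threshold."""
    kept = []
    for pair, clade_scores in pair_scores.items():
        p = integrate_clade_scores(clade_scores, weights, bias)
        if p >= threshold:
            kept.append((pair, p))
    # Highest-confidence candidates first, e.g. for a costly follow-up step.
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical weights and scores, for illustration only.
weights = {"Gammaproteobacteria": 2.0, "Proteobacteria": 1.0, "Bacteria": 0.5}
bias = -2.0
pair_scores = {
    ("geneA", "geneB"): {"Gammaproteobacteria": 0.9, "Proteobacteria": 0.8, "Bacteria": 0.6},
    ("geneA", "geneC"): {"Gammaproteobacteria": 0.1, "Proteobacteria": 0.2, "Bacteria": 0.1},
}

candidates = prescreen(pair_scores, weights, bias)
for pair, p in candidates:
    print(pair, round(p, 3))
```

Only the pairs returned by `prescreen` would then be passed on to a slower structure-based method, which is the discover-and-refine pattern the abstract proposes.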
Affiliation(s)
- Tao Fang
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Radja Hachilif
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
2
Saranya KR, Vimina ER, Pinto FR. TransNeT-CGP: A cluster-based comorbid gene prioritization by integrating transcriptomics and network-topological features. Comput Biol Chem 2024; 110:108038. [PMID: 38461796 DOI: 10.1016/j.compbiolchem.2024.108038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/11/2024] [Accepted: 02/25/2024] [Indexed: 03/12/2024]
Abstract
The local disruptions caused by the genes of one disease can influence the pathways associated with other diseases, resulting in comorbidity. For gene therapies, it is necessary to prioritize the key genes that regulate common biological mechanisms to tackle the issues caused by overlapping diseases. This work proposes a clustering-based computational approach for prioritizing the comorbid genes within overlapping disease modules by analyzing protein-protein interaction networks. For this, a sub-network with the gene interactions of the disease pair was extracted from the interactome. The edge weights are assigned by combining the pairwise gene expression correlation and betweenness centrality scores. Further, a weighted graph clustering algorithm is applied, and the dominant nodes of high-density clusters are ranked based on clustering coefficients and neighborhood connectivity. Case studies based on neurodegenerative diseases, such as the Amyotrophic Lateral Sclerosis-Spinal Muscular Atrophy (ALS-SMA) pair, and cancers, such as the Ovarian Carcinoma-Invasive Ductal Breast Carcinoma (OC-IDBC) pair, were conducted to examine the efficacy of the proposed method. To identify the mechanistic role of the top-ranked genes, we used functional and pathway enrichment analysis, connectivity analysis with the leave-one-out (LOO) method, analysis of associated disease-related protein complexes, and prioritization tools such as TOPPGENE and Heml2.0. From the pathway analysis, it was observed that the top 10 genes obtained using the proposed method were associated with 10 pathways in ALS-SMA comorbidity and 15 in the case of OC-IDBC, while the corresponding counts for the similar methods SAPDSB and S2B were 4 and 6, respectively, for ALS-SMA and 9 and 10, respectively, for OC-IDBC. In both case studies, 70% of the disease-specific benchmark protein complexes were linked to the top-ranked genes of the proposed method, while the figures for SAPDSB and S2B were 55% and 60%, respectively. Additionally, it was found that the removal of the top 10 genes disconnects the network into 14 distinct components in the case of ALS-SMA and 9 in the case of OC-IDBC. The experimental results show that the proposed method can be effectively used for identifying key genes in comorbidity and can offer insights into the intricate molecular relationships driving comorbid diseases.
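The connectivity check mentioned in this abstract (how many components the network splits into once the top-ranked genes are removed) can be sketched as follows. This is only an illustration of the general idea on a toy, unweighted graph; the node names are hypothetical and the code is not the authors' implementation.

```python
from collections import deque

def count_components(edges, removed=frozenset()):
    """Count connected components of an undirected graph after deleting some nodes."""
    nodes = {n for edge in edges for n in edge} - set(removed)
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        # Keep an edge only if both of its endpoints survive the removal.
        if u in adjacency and v in adjacency:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        # Breadth-first search marks everything reachable from this start node.
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components

# Toy network with a hub gene "H" (all names are placeholders).
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B"), ("C", "D")]
print(count_components(edges))                 # intact network
print(count_components(edges, removed={"H"}))  # network without the hub
```

A larger increase in the component count after removal suggests that the removed genes hold otherwise separate parts of the network together.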
Affiliation(s)
- K R Saranya
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
- E R Vimina
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India
- F R Pinto
  - Chemistry and Biochemistry Department, Faculty of Sciences, University of Lisbon, Portugal
3
Yang S, Zong W, Shi L, Li R, Ma Z, Ma S, Si J, Wu Z, Zhai J, Ma Y, Fan Z, Chen S, Huang H, Zhang D, Bao Y, Li R, Xie J. PPGR: a comprehensive perennial plant genomes and regulation database. Nucleic Acids Res 2024; 52:D1588-D1596. [PMID: 37933857 PMCID: PMC10767873 DOI: 10.1093/nar/gkad963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/21/2023] [Accepted: 10/13/2023] [Indexed: 11/08/2023] Open
Abstract
Perennial woody plants hold vital ecological significance, distinguished by their unique traits. While significant progress has been made in their genomic and functional studies, a major challenge persists: the absence of a comprehensive reference platform for collection, integration and in-depth analysis of the vast amount of data. Here, we present PPGR (Resource for Perennial Plant Genomes and Regulation; https://ngdc.cncb.ac.cn/ppgr/) to address this critical gap, by collecting, integrating, analyzing and visualizing genomic, gene regulation and functional data of perennial plants. PPGR currently includes 60 species, 847 million protein-protein/TF (transcription factor)-target interactions, and 9016 transcriptome samples under various environmental conditions and genetic backgrounds. Noteworthy is the focus on genes that regulate wood production, seasonal dormancy, terpene biosynthesis and leaf senescence, representing a wealth of information derived from experimental data, literature mining, public databases and genomic predictions. Furthermore, PPGR incorporates a range of multi-omics search and analysis tools to facilitate browsing and application of these extensive datasets. PPGR represents a comprehensive and high-quality resource for perennial plants, substantiated by an illustrative case study that demonstrates its capacity for unraveling gene functions and shedding light on potential regulatory processes.
Affiliation(s)
- Sen Yang
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Wenting Zong
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lingling Shi
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Ruisi Li
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Zhenshu Ma
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Shubao Ma
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Jingna Si
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Zhijing Wu
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Jinglan Zhai
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Yingke Ma
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Zhuojing Fan
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Sisi Chen
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Huahong Huang
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin’an, Hangzhou 311300, China
- Deqiang Zhang
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
- Yiming Bao
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Rujiao Li
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jianbo Xie
  - State Key Laboratory of Tree Genetics and Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
  - The Tree and Ornamental Plant Breeding and Biotechnology Laboratory of National Forestry and Grassland Administration, Beijing Forestry University, Beijing 100083, China
4
Mohr SE, Kim AR, Hu Y, Perrimon N. Finding information about uncharacterized Drosophila melanogaster genes. Genetics 2023; 225:iyad187. [PMID: 37933691 PMCID: PMC10697813 DOI: 10.1093/genetics/iyad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023] Open
Abstract
Genes that have been identified in the genome but remain uncharacterized with regard to function offer an opportunity to uncover novel biological information. Novelty is exciting but can also be a barrier. If nothing is known, how does one start planning and executing experiments? Here, we provide a recommended information-mining workflow and a corresponding guide to accessing information about uncharacterized Drosophila melanogaster genes, such as those assigned only a systematic coding gene identifier. The available information can provide insights into where and when the gene is expressed, what the function of the gene might be, whether there are similar genes in other species, whether there are known relationships to other genes, and whether any other features have already been determined. In addition, available information about relevant reagents can inspire and facilitate experimental studies. Altogether, mining available information can help prioritize genes for further study, as well as provide starting points for experimental assays and other analyses.
Affiliation(s)
- Stephanie E Mohr
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Ah-Ram Kim
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Norbert Perrimon
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston, MA 02115, USA
5
Appasamy SD, Berrisford J, Gaborova R, Nair S, Anyango S, Grudinin S, Deshpande M, Armstrong D, Pidruchna I, Ellaway JIJ, Leines GD, Gupta D, Harrus D, Varadi M, Velankar S. Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data. Sci Data 2023; 10:853. [PMID: 38040737 PMCID: PMC10692154 DOI: 10.1038/s41597-023-02778-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023] Open
Abstract
Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions for assemblies in the PDB, complicating the identification of instances representing the same assembly. In this study, we introduce a method leveraging resources external to PDB, such as the Complex Portal, UniProt and Gene Ontology, to describe assemblies and contextualise them within their biological settings accurately. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB's FAIR attributes, fostering more effective basic and translational research and scientific education.
Affiliation(s)
- Sri Devan Appasamy
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- John Berrisford
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Romana Gaborova
  - CEITEC - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Sreenath Nair
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Stephen Anyango
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sergei Grudinin
  - Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000, Grenoble, France
- Mandar Deshpande
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- David Armstrong
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Ivanna Pidruchna
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Joseph I J Ellaway
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Grisell Díaz Leines
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deepti Gupta
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deborah Harrus
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
6
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information, and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- J Fan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- J Luo
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- M Magrane
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- M J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- S Orchard
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
7
Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Chen W, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: a new gene set enrichment tool for Drosophila and common research organisms. Nucleic Acids Res 2023; 51:W419-W426. [PMID: 37125646 PMCID: PMC10320058 DOI: 10.1093/nar/gkad331] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 02/21/2023] [Revised: 03/28/2023] [Accepted: 04/29/2023] [Indexed: 05/02/2023] Open
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, PANGEA offers gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance). In addition, visualizations of results are enhanced by an option to view a network of gene-set-to-gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
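As background for the over-representation analysis described in this abstract, one common way to test whether a gene set is over-represented in an input list is a one-sided hypergeometric test. The sketch below shows that generic test only; it is not taken from PANGEA, the abstract does not specify which statistic the tool uses, and the numbers are toy values.

```python
from math import comb

def hypergeom_enrichment_p(overlap, background_size, set_size, list_size):
    """P(X >= overlap) when list_size genes are drawn without replacement from
    background_size genes, of which set_size belong to the gene set."""
    max_overlap = min(set_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, i) * comb(background_size - set_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total

# Toy example: 10 background genes, 4 in the gene set,
# an input list of 3 genes of which 2 fall in the set.
p_value = hypergeom_enrichment_p(overlap=2, background_size=10, set_size=4, list_size=3)
print(p_value)
```

In practice such p-values are computed for many gene sets at once, so a multiple-testing correction is normally applied before interpreting them.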
Affiliation(s)
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Aram Comjean
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Helen Attrill
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Giulia Antonazzo
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Jim Thurmond
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Weihang Chen
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Fangge Li
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Tiffany Chao
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Stephanie E Mohr
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Nicholas H Brown
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Norbert Perrimon
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston, MA 02138, USA
8
Mazein A, Acencio ML, Balaur I, Rougny A, Welter D, Niarakis A, Ramirez Ardila D, Dogrusoz U, Gawron P, Satagopam V, Gu W, Kremer A, Schneider R, Ostaszewski M. A guide for developing comprehensive systems biology maps of disease mechanisms: planning, construction and maintenance. Front Bioinform 2023; 3:1197310. [PMID: 37426048 PMCID: PMC10325725 DOI: 10.3389/fbinf.2023.1197310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 06/09/2023] [Indexed: 07/11/2023] Open
Abstract
As a conceptual model of disease mechanisms, a disease map integrates available knowledge and is applied for data interpretation, predictions and hypothesis generation. It is possible to model disease mechanisms on different levels of granularity and adjust the approach to the goals of a particular project. This rich environment, together with requirements for high-quality network reconstruction, makes it challenging for new curators and groups to be quickly introduced to the development methods. In this review, we offer a step-by-step guide for developing a disease map within its mainstream pipeline, which involves using the CellDesigner tool for creating and editing diagrams and the MINERVA Platform for online visualisation and exploration. We also describe how the Neo4j graph database environment can be used for efficiently managing and querying such a resource. For assessing interoperability and reproducibility, we apply the FAIR principles.
Affiliation(s)
- Alexander Mazein
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Marcio Luis Acencio
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Irina Balaur
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Danielle Welter
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Anna Niarakis
  - Université Paris-Saclay, Laboratoire Européen de Recherche Pour la Polyarthrite Rhumatoïde–Genhotel, University Evry, Evry, France
  - Lifeware Group, Inria Saclay-Ile de France, Palaiseau, France
- Diana Ramirez Ardila
  - ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
- Ugur Dogrusoz
  - Computer Engineering Department, Bilkent University, Ankara, Türkiye
- Piotr Gawron
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Andreas Kremer
  - ITTM Information Technology for Translational Medicine, Esch-sur-Alzette, Luxemburg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
9. Wong ED, Miyasato SR, Aleksander S, Karra K, Nash RS, Skrzypek MS, Weng S, Engel SR, Cherry JM. Saccharomyces genome database update: server architecture, pan-genome nomenclature, and external resources. Genetics 2023; 224:iyac191. [PMID: 36607068 PMCID: PMC10158836 DOI: 10.1093/genetics/iyac191] [Received: 11/16/2022] [Revised: 11/16/2022] [Accepted: 12/21/2022] [Indexed: 01/07/2023] Open
Abstract
As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.
Affiliation(s)
- Edith D Wong
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Stuart R Miyasato
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Suzi Aleksander
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kalpana Karra
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Robert S Nash
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Marek S Skrzypek
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Shuai Weng
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Stacia R Engel
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- J Michael Cherry
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
10. Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: A New Gene Set Enrichment Tool for Drosophila and Common Research Organisms. bioRxiv 2023:2023.02.20.529262. [PMID: 36865134 PMCID: PMC9980003 DOI: 10.1101/2023.02.20.529262] [Indexed: 02/24/2023]
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance), are also available. In addition, visualisations of results are enhanced by providing an option to view the network of gene-set-to-gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
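To make the notion of a gene set being "over-represented in a gene list" concrete, the following is a minimal sketch of a one-sided hypergeometric over-representation test. This is a common textbook approach and an assumption here; the abstract does not state which statistic PANGEA uses. The function name `hypergeom_enrichment_p` and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe_size: number of annotated genes in the background
    set_size:      number of background genes belonging to the gene set
    list_size:     number of genes in the input gene list
    overlap:       number of input genes that fall in the gene set
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20 background genes, 5 in the gene set,
# 4 genes in the input list, 2 of which belong to the gene set.
p_value = hypergeom_enrichment_p(universe_size=20, set_size=5, list_size=4, overlap=2)
print(f"one-sided enrichment p-value: {p_value:.4f}")
```

In practice such p-values are computed for many gene sets at once and then corrected for multiple testing; that step is omitted from this sketch.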
Affiliation(s)
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Aram Comjean
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Helen Attrill
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Giulia Antonazzo
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Jim Thurmond
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Fangge Li
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Tiffany Chao
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Stephanie E. Mohr
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Nicholas H. Brown
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Norbert Perrimon
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston, MA 02138, USA
11. Lemire BD, Uppuluri P. Coding Sequence Insertions in Fungal Genomes are Intrinsically Disordered and can Impart Functionally-Important Properties on the Host Protein. bioRxiv 2023:2023.04.06.535715. [PMID: 37066283 PMCID: PMC10104129 DOI: 10.1101/2023.04.06.535715] [Indexed: 04/18/2023]
Abstract
Insertion and deletion mutations (indels) are important mechanisms of generating protein diversity. Indels in coding sequences are under considerable selective pressure to maintain reading frames and to preserve protein function, but once generated, indels provide raw material for the acquisition of new protein properties and functions. We reported recently that coding sequence insertions in the Candida albicans NDU1 protein, a mitochondrial protein involved in the assembly of the NADH:ubiquinone oxidoreductase, are imperative for respiration, biofilm formation and pathogenesis. NDU1 inserts are specific to CTG-clade fungi, absent in the human ortholog and have been successfully harnessed as drug targets. Here, we present the first comprehensive report investigating indels and clade-defining insertions (CDIs) in fungal proteomes. We investigated 80 ascomycete proteomes encompassing CTG clade species, the Saccharomycetaceae family, the Aspergillaceae family and the Herpotrichiellaceae (black yeasts) family. We identified over 30,000 insertions, 4,000 CDIs and 2,500 clade-defining deletions (CDDs). Insert sizes range from 1 to over 1,000 residues in length, while maximum deletion length is 19 residues. Inserts are strikingly over-represented in protein kinases, and excluded from structural domains and transmembrane segments. Inserts are predicted to be highly disordered. The amino acid compositions of the inserts are highly depleted in hydrophobic residues and enriched in polar residues. An indel in the Saccharomyces cerevisiae Sth1 protein, the catalytic subunit of the RSC (Remodel the Structure of Chromatin) complex, is predicted to be disordered until it forms a β-strand upon interaction. This interaction performs a vital role in RSC-mediated transcriptional regulation, thereby expanding protein function.
Affiliation(s)
- Bernard D. Lemire
  - Department of Biochemistry, University of Alberta, Edmonton, Canada (retired)
- Priya Uppuluri
  - Institute for Infection and Immunity, Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, USA
  - David Geffen School of Medicine at UCLA, Los Angeles, California, USA
12. Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
13. Ricard-Blum S. Building, Visualizing, and Analyzing Glycosaminoglycan-Protein Interaction Networks. Methods Mol Biol 2023; 2619:211-224. [PMID: 36662472 DOI: 10.1007/978-1-0716-2946-8_15] [Indexed: 01/21/2023]
Abstract
This chapter describes how to generate, visualize, and analyze interaction networks of glycosaminoglycans (GAGs), which are linear polyanionic polysaccharides mostly located at the cell surface and in the extracellular matrix. The protocol is divided into three major steps: (1) the collection of GAG-mediated interaction data, (2) the visualization of GAG interaction networks, and (3) the computational enrichment analyses of these networks to identify their overrepresented features (e.g., protein domains, location, molecular functions, and biological pathways) compared to a reference proteome. These analyses are critical to interpret GAG interactomic datasets, decipher their specificities and functions, and ultimately identify GAG-protein interactions to target for therapeutic purpose.
Affiliation(s)
- Sylvie Ricard-Blum
  - ICBMS, UMR 5246 University Lyon 1, CNRS, Institute of Molecular and Supramolecular Chemistry and Biochemistry, Villeurbanne Cedex, France
14. Tsitsiridis G, Steinkamp R, Giurgiu M, Brauner B, Fobo G, Frishman G, Montrone C, Ruepp A. CORUM: the comprehensive resource of mammalian protein complexes-2022. Nucleic Acids Res 2022; 51:D539-D545. [PMID: 36382402 PMCID: PMC9825459 DOI: 10.1093/nar/gkac1015] [Received: 09/12/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 11/17/2022] Open
Abstract
The CORUM database has been providing comprehensive reference information about experimentally characterized, mammalian protein complexes and their associated biological and biomedical properties since 2007. Given that most catalytic and regulatory functions of the cell are carried out by protein complexes, their composition and characterization is of greatest importance in basic and disease biology. The new CORUM 4.0 release encompasses 5204 protein complexes offering the largest and most comprehensive publicly available dataset of manually curated mammalian protein complexes. The CORUM dataset is built from 5299 different genes, representing 26% of the protein coding genes in humans. Complex information from 3354 scientific articles is mainly obtained from human (70%), mouse (16%) and rat (9%) cells and tissues. Recent curation work includes sets of protein complexes, Functional Complex Groups, that offer comprehensive collections of published data in specific biological processes and molecular functions. In addition, a new graphical analysis tool was implemented that displays co-expression data from the subunits of protein complexes. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.
Affiliation(s)
- George Tsitsiridis
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Ralph Steinkamp
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Madalina Giurgiu
  - Experimental and Clinical Research Center, Max Delbrück Center for Molecular Medicine and Charité Universitätsmedizin Berlin, Berlin 13125, Germany
- Barbara Brauner
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Gisela Fobo
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Goar Frishman
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Corinna Montrone
  - Institute of Experimental Genetics, Helmholtz Center Munich (GmbH), German Research Center for Environmental Health, Neuherberg D-85764, Germany
- Andreas Ruepp
  - To whom correspondence should be addressed. Tel: +49 89 3187 3189; Fax: +49 89 3187 3500;
15. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2022; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022] Open
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Rebecca Kirsch
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Mikaela Koutrouli
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Farrokh Mehryary
  - TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
- Radja Hachilif
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Annika L Gable
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Tao Fang
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Nadezhda T Doncheva
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
- Sampo Pyysalo
  - TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
- Peer Bork
  - Correspondence may also be addressed to Peer Bork. Tel: +49 6221 387 8526; Fax: +49 6221 387 517;
- Lars J Jensen
  - Correspondence may also be addressed to Lars J. Jensen. Tel: +45 3 532 5025;
- Christian von Mering
  - To whom correspondence should be addressed. Tel: +41 44 6353147; Fax: +41 44 6356864;
16. Yadav Y, Subbaroyan A, Martin OC, Samal A. Relative importance of composition structures and biologically meaningful logics in bipartite Boolean models of gene regulation. Sci Rep 2022; 12:18156. [PMID: 36307465 PMCID: PMC9616893 DOI: 10.1038/s41598-022-22654-7] [Received: 06/14/2022] [Accepted: 10/18/2022] [Indexed: 12/31/2022] Open
Abstract
Boolean networks have been widely used to model gene networks. However, such models are coarse-grained to an extent that they abstract away molecular specificities of gene regulation. Alternatively, bipartite Boolean network models of gene regulation explicitly distinguish genes from transcription factors (TFs). In such bipartite models, multiple TFs may simultaneously contribute to gene regulation by forming heteromeric complexes, thus giving rise to composition structures. Since bipartite Boolean models are relatively recent, an empirical investigation of their biological plausibility is lacking. Here, we estimate the prevalence of composition structures arising through heteromeric complexes. Moreover, we present an additional mechanism where composition structures may arise as a result of multiple TFs binding to cis-regulatory regions and provide empirical support for this mechanism. Next, we compare the restriction in Boolean functions (BFs) imposed by composition structures and by biologically meaningful properties. We find that though composition structures can severely restrict the number of BFs driving a gene, the two types of minimally complex BFs, namely nested canalyzing functions (NCFs) and read-once functions (RoFs), are comparatively more restrictive. Finally, we find that composition structures are highly enriched in real networks, but this enrichment most likely comes from NCFs and RoFs.
Affiliation(s)
- Yasharth Yadav
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
- Ajay Subbaroyan
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
- Olivier C Martin
  - Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91190, Gif sur Yvette, France
  - Université Paris Cité, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), 91190, Gif sur Yvette, France
- Areejit Samal
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
17. Wilken SE, Besançon M, Kratochvíl M, Foko Kuate CA, Trefois C, Gu W, Ebenhöh O. Interrogating the effect of enzyme kinetics on metabolism using differentiable constraint-based models. Metab Eng 2022; 74:72-82. [PMID: 36152931 DOI: 10.1016/j.ymben.2022.09.002] [Received: 07/11/2022] [Revised: 09/08/2022] [Accepted: 09/10/2022] [Indexed: 10/31/2022]
Abstract
Metabolic models are typically characterized by a large number of parameters. Traditionally, metabolic control analysis is applied to differential equation-based models to investigate the sensitivity of predictions to parameters. A corresponding theory for constraint-based models is lacking, due to their formulation as optimization problems. Here, we show that optimal solutions of optimization problems can be efficiently differentiated using constrained optimization duality and implicit differentiation. We use this to calculate the sensitivities of predicted reaction fluxes and enzyme concentrations to turnover numbers in an enzyme-constrained metabolic model of Escherichia coli. The sensitivities quantitatively identify rate limiting enzymes and are mathematically precise, unlike current finite difference based approaches used for sensitivity analysis. Further, efficient differentiation of constraint-based models unlocks the ability to use gradient information for parameter estimation. We demonstrate this by improving, genome-wide, the state-of-the-art turnover number estimates for E. coli. Finally, we show that this technique can be generalized to arbitrarily complex models. By differentiating the optimal solution of a model incorporating both thermodynamic and kinetic rate equations, the effect of metabolite concentrations on biomass growth can be elucidated. We benchmark these metabolite sensitivities against a large experimental gene knockdown study, and find good alignment between the predicted sensitivities and in vivo metabolome changes. In sum, we demonstrate several applications of differentiating optimal solutions of constraint-based metabolic models, and show how it connects to classic metabolic control analysis.
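The idea of differentiating an optimal solution by implicit differentiation can be illustrated with a generic textbook sketch; it is not the authors' exact formulation. Assume the optimality (KKT) conditions of the model are collected into a system F(z*, θ) = 0, where z* stacks the optimal primal and dual variables and θ denotes parameters such as turnover numbers. The symbols F, z and θ are notation introduced here for illustration. If ∂F/∂z is invertible at the solution, the implicit function theorem gives the sensitivities directly:

```latex
F\bigl(z^{*}(\theta), \theta\bigr) = 0
\quad\Longrightarrow\quad
\frac{\partial F}{\partial z}\,\frac{\mathrm{d} z^{*}}{\mathrm{d}\theta}
  + \frac{\partial F}{\partial \theta} = 0
\quad\Longrightarrow\quad
\frac{\mathrm{d} z^{*}}{\mathrm{d}\theta}
  = -\left(\frac{\partial F}{\partial z}\right)^{-1}
     \frac{\partial F}{\partial \theta}
```

Unlike a finite-difference estimate, this expression needs only one linear solve at the optimum and involves no step-size choice.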
Affiliation(s)
- St Elmo Wilken
  - Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Mathieu Besançon
  - Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
- Miroslav Kratochvíl
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
- Chilperic Armel Foko Kuate
  - Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Christophe Trefois
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, L-4367, Belvaux, Luxembourg
- Oliver Ebenhöh
  - Institute of Quantitative and Theoretical Biology, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225, Düsseldorf, Germany
18. Cabrera-Orefice A, Potter A, Evers F, Hevler JF, Guerrero-Castillo S. Complexome Profiling-Exploring Mitochondrial Protein Complexes in Health and Disease. Front Cell Dev Biol 2022; 9:796128. [PMID: 35096826 PMCID: PMC8790184 DOI: 10.3389/fcell.2021.796128] [Received: 10/15/2021] [Accepted: 12/08/2021] [Indexed: 12/14/2022] Open
Abstract
Complexome profiling (CP) is a state-of-the-art approach that combines separation of native proteins by electrophoresis, size exclusion chromatography or density gradient centrifugation with tandem mass spectrometry identification and quantification. Resulting data are computationally clustered to visualize the inventory, abundance and arrangement of multiprotein complexes in a biological sample. Since its formal introduction a decade ago, this method has been mostly applied to explore not only the composition and abundance of mitochondrial oxidative phosphorylation (OXPHOS) complexes in several species but also to identify novel protein interactors involved in their assembly, maintenance and functions. Besides, complexome profiling has been utilized to study the dynamics of OXPHOS complexes, as well as the impact of an increasing number of mutations leading to mitochondrial disorders or rearrangements of the whole mitochondrial complexome. Here, we summarize the major findings obtained by this approach; emphasize its advantages and current limitations; discuss multiple examples on how this tool could be applied to further investigate pathophysiological mechanisms and comment on the latest advances and opportunity areas to keep developing this methodology.
Affiliation(s)
- Alfredo Cabrera-Orefice
  - Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Alisa Potter
  - Department of Pediatrics, Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen, Netherlands
- Felix Evers
  - Department of Medical Microbiology, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Johannes F Hevler
  - Biomolecular Mass Spectrometry and Proteomics, University of Utrecht, Utrecht, Netherlands
  - Bijvoet Center for Biomolecular Research, University of Utrecht, Utrecht, Netherlands
  - Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands
  - Netherlands Proteomics Center, Utrecht, Netherlands
- Sergio Guerrero-Castillo
  - University Children's Research@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
19.
Abstract
The 2022 Nucleic Acids Research Database Issue contains 185 papers, including 87 papers reporting on new databases and 85 updates from resources previously published in the Issue. Thirteen additional manuscripts provide updates on databases most recently published elsewhere. Seven new databases focus specifically on COVID-19 and SARS-CoV-2, including SCoV2-MD, the first of the Issue's Breakthrough Articles. Major nucleic acid databases reporting updates include MODOMICS, JASPAR and miRTarBase. The AlphaFold Protein Structure Database, described in the second Breakthrough Article, is the stand-out in the protein section, where the Human Proteoform Atlas and GproteinDb are other notable new arrivals. Updates from DisProt, FuzDB and ELM comprehensively cover disordered proteins. Under the metabolism and signalling section Reactome, ConsensusPathDB, HMDB and CAZy are major returning resources. In microbial and viral genomes taxonomy and systematics are well covered by LPSN, TYGS and GTDB. Genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Major returning pharmacology resource names include the IUPHAR/BPS guide and the Therapeutic Target Database. New plant databases include PlantGSAD for gene lists and qPTMplants for post-translational modifications. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). Our latest update to the NAR online Molecular Biology Database Collection brings the total number of entries to 1645. Following last year's major cleanup, we have updated 317 entries, listing 89 new resources and trimming 80 discontinued URLs. The current release is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
|
20
Cantelli G, Bateman A, Brooksbank C, Petrov AI, Malik-Sheriff R, Ide-Smith M, Hermjakob H, Flicek P, Apweiler R, Birney E, McEntyre J. The European Bioinformatics Institute (EMBL-EBI) in 2021. Nucleic Acids Res 2022; 50:D11-D19. [PMID: 34850134 PMCID: PMC8690175 DOI: 10.1093/nar/gkab1127] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 09/15/2021] [Revised: 10/14/2021] [Accepted: 11/23/2021] [Indexed: 11/28/2022] Open
Abstract
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.
Affiliation(s)
- Gaia Cantelli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cath Brooksbank
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rahuman S Malik-Sheriff
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Michele Ide-Smith
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK