1
A systems-biology approach connects aging mechanisms with Alzheimer's disease pathogenesis. bioRxiv 2024:2024.03.17.585262. [PMID: 38559190 PMCID: PMC10980014 DOI: 10.1101/2024.03.17.585262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Age is the strongest risk factor for developing Alzheimer's disease, the most common neurodegenerative disorder. However, the mechanisms connecting advancing age to neurodegeneration in Alzheimer's disease are incompletely understood. We conducted an unbiased, genome-scale, forward genetic screen for age-associated neurodegeneration in Drosophila to identify the underlying biological processes required for maintenance of aging neurons. To connect genetic screen hits to Alzheimer's disease pathways, we measured proteomics, phosphoproteomics, and metabolomics in Drosophila models of Alzheimer's disease. We further identified Alzheimer's disease human genetic variants that modify expression in disease-vulnerable neurons. Through multi-omic, multi-species network integration of these data, we identified relationships between screen hits and tau-mediated neurotoxicity. Furthermore, we computationally and experimentally identified relationships between screen hits and DNA damage in Drosophila and human iPSC-derived neural progenitor cells. Our work identifies candidate pathways that could be targeted to attenuate the effects of age on neurodegeneration and Alzheimer's disease.
2
A paradigm for post-embryonic Oct4 re-expression: E7-induced hydroxymethylation regulates Oct4 expression in cervical cancer. J Med Virol 2023; 95:e29264. [PMID: 38054553 DOI: 10.1002/jmv.29264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 11/08/2023] [Accepted: 11/11/2023] [Indexed: 12/07/2023]
Abstract
The Octamer-binding transcription factor-4 (Oct4) is upregulated in different malignancies, yet the mechanisms of Oct4 post-embryonic re-expression are inadequately understood. In cervical cancer, Oct4 expression is higher in human papillomavirus (HPV)-related than in HPV-unrelated cervical cancers, and this upregulation correlates with the expression of the E7 oncogene. We have reported that E7 affects the Oct4-transcriptional output and Oct4-related phenotypes in cervical cancer; however, the underlying mechanism remains elusive. Here, we characterize the Oct4-protein interactions in cervical cancer cells via computational analyses and mass spectrometry and reveal that methyl-binding proteins (MBD2 and MBD3) are determinants of Oct4-driven transcription. E7 triggers MBD2 downregulation and TET1 upregulation, thereby disrupting the methylation status of the Oct4 gene. This coincides with an increase in total DNA hydroxymethylation, leading to the re-expression of Oct4 in cervical cancer and likely affecting broader transcriptional patterns. Our findings reveal a previously unreported mechanism by which the E7 oncogene can regulate Oct4 re-expression and global transcriptional patterns by increasing DNA hydroxymethylation and lowering the barrier to cellular plasticity during carcinogenesis.
3
The landscape of microRNA interaction annotation: analysis of three rare disorders as a case study. Database (Oxford) 2023; 2023:baad066. [PMID: 37819683 PMCID: PMC10566539 DOI: 10.1093/database/baad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 10/13/2023]
Abstract
In recent years, a huge amount of data on ncRNA interactions has been described in scientific papers and databases. Although considerable effort has been made to annotate the available knowledge in public repositories, there are still significant discrepancies in how different resources capture and interpret data on ncRNA functional and physical associations. Here, we present a collection of microRNA-mRNA interactions annotated from the scientific literature following recognized standard criteria and, as a case study, focused on microRNAs that regulate genes associated with rare diseases. The list of protein-coding genes with a known role in specific rare diseases was retrieved from the Genomics England PanelApp, and associated microRNA-mRNA interactions were annotated in the IntAct database and compared with other datasets. RNAcentral identifiers were used for unambiguous, stable identification of ncRNAs. The information about each interaction was enhanced by a detailed description of the cell types and experimental conditions, providing a computer-interpretable summary of the published data, integrated with the huge amount of protein interactions already gathered in the database. Furthermore, for each interaction, the binding sites of the microRNA are precisely mapped on a well-defined mRNA transcript of the target gene. This information is crucial for conceiving and designing optimal microRNA mimics or inhibitors to interfere in vivo with a deregulated process. As these approaches become more feasible, high-quality, reliable networks of microRNA interactions are needed to help, for instance, in selecting the best target to inhibit and in predicting potential secondary off-target effects. Database URL: https://www.ebi.ac.uk/intact.
4
A network paradigm predicts drug synergistic effects using downstream protein-protein interactions. CPT Pharmacometrics Syst Pharmacol 2022; 11:1527-1538. [PMID: 36204824 PMCID: PMC9662203 DOI: 10.1002/psp4.12861] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/06/2022] [Revised: 08/05/2022] [Accepted: 08/11/2022] [Indexed: 11/16/2022]
Abstract
In some cases, drug combinations affect adverse outcome phenotypes by binding the same protein; however, drug-binding proteins are associated through protein-protein interaction (PPI) networks within the cell, suggesting that drug phenotypes may result from long-range network effects. We first used PPI network analysis to classify drugs based on proteins downstream of their targets and next predicted drug combination effects where drugs shared network proteins but had distinct binding proteins (e.g., targets, enzymes, or transporters). By classifying drugs using their downstream proteins, we had an 80.7% sensitivity for predicting rare drug combination effects documented in gold-standard datasets. We further measured the effect of predicted drug combinations on adverse outcome phenotypes using novel observational studies in the electronic health record. We tested predictions for 60 network-drug classes on seven adverse outcomes and measured changes in clinical outcomes for predicted combinations. These results demonstrate a novel paradigm for anticipating drug synergistic effects using proteins downstream of drug targets.
5
Overview of methods for characterization and visualization of a protein–protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022]
Abstract
Protein–protein interactions (PPIs) sit at the heart of the cellular machinery and play a significant role in regulating cellular functions. PPIs can be analyzed with network approaches, since together they form a network, but constructing a PPI network requires prediction of the interactions. Biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies help address these challenges and have shown encouraging results for understanding molecular mechanisms and drug action mechanisms and for identifying target genes. To weight an interaction, it is evaluated with different confidence scores. These scores allow the network to be filtered, which facilitates its representation; both are essential steps toward identifying and understanding molecular mechanisms. In this review, we discuss the main computational methods for predicting PPIs, including those that confirm an interaction, the integration of PPIs into a network, and the visualization of these complex data.
6
Multilayered Networks of SalmoNet2 Enable Strain Comparisons of the Salmonella Genus on a Molecular Level. mSystems 2022; 7:e0149321. [PMID: 35913188 PMCID: PMC9426430 DOI: 10.1128/msystems.01493-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Serovars of the genus Salmonella primarily evolved as gastrointestinal pathogens in a wide range of hosts. Some serotypes later evolved further, adopting a more invasive lifestyle in a narrower host range associated with systemic infections. A system-level knowledge of these pathogens could identify the complex adaptations associated with the evolution of serovars with distinct pathogenicity, host range, and risk to human health. This promises to aid the design of interventions and serve as a knowledge base in the Salmonella research community. Here, we present SalmoNet2, a major update to SalmoNet1, the first multilayered interaction resource for Salmonella strains, containing protein-protein, transcriptional regulatory, and enzyme-enzyme interactions. The new version extends the number of Salmonella networks from 11 to 20. We now include a strain from the second species in the Salmonella genus, a strain from the Salmonella enterica subspecies arizonae and additional strains of importance from the subspecies enterica, including S. Typhimurium strain D23580, an epidemic multidrug-resistant strain associated with invasive nontyphoidal salmonellosis (iNTS). The database now uses strain-specific metabolic models instead of a generalized model to highlight differences between strains. The update has increased the coverage of high-quality protein-protein interactions, and enhanced interoperability with other computational resources by adopting standardized formats. The resource website has been updated with tutorials to help researchers analyze their Salmonella data using molecular interaction networks from SalmoNet2. SalmoNet2 is accessible at http://salmonet.org/. IMPORTANCE Multilayered network databases collate interaction information from multiple sources, and are powerful both as a knowledge base and subject of analysis. Here, we present SalmoNet2, an integrated network resource containing protein-protein, transcriptional regulatory, and metabolic interactions for 20 Salmonella strains. Key improvements in the update include an expanded number of strains, strain-specific metabolic networks, an increase in high-quality protein-protein interactions, community standard computational formats to help interoperability, and online tutorials to help users analyze their data using SalmoNet2.
7
The Intricacy of the Viral-Human Protein Interaction Networks: Resources, Data, and Analyses. Front Microbiol 2022; 13:849781. [PMID: 35531299 PMCID: PMC9069133 DOI: 10.3389/fmicb.2022.849781] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/06/2022] [Accepted: 03/11/2022] [Indexed: 11/18/2022]
Abstract
Viral infections are a major cause of human disease; they cause millions of deaths every year and seriously threaten global health, as we have experienced with the COVID-19 pandemic. Numerous approaches have been adopted to understand viral diseases and develop pharmacological treatments. Among them, the study of virus-host protein-protein interactions is a powerful strategy to comprehend the molecular mechanisms employed by the virus to infect the host cells and to interact with their components. Experimental protein-protein interactions described in the scientific literature have been systematically captured into several molecular interaction databases. These data are organized in structured formats and can be easily downloaded by users to perform further bioinformatic and network studies. Network analysis of available virus-host interactomes allows us to understand how the host interactome is perturbed upon viral infection, which key host proteins are targeted by the virus, and which main cellular pathways are subverted. In this review, we give an overview of publicly available viral-human protein-protein interaction resources and the community standards, curation rules, and adopted ontologies. A description of the main virus-human interactomes available is provided, together with the main network analyses that have been performed. We finally discuss the main limitations and future challenges in assessing the quality and reliability of protein-protein interaction datasets and resources.
8
Deep Learning-Powered Prediction of Human-Virus Protein-Protein Interactions. Front Microbiol 2022; 13:842976. [PMID: 35495666 PMCID: PMC9051481 DOI: 10.3389/fmicb.2022.842976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 03/25/2022] [Indexed: 11/13/2022]
Abstract
Identifying human-virus protein-protein interactions (PPIs) is an essential step for understanding viral infection mechanisms and the antiviral response of the human host. Recent advances in high-throughput experimental techniques have enabled a significant accumulation of human-virus PPI data, which has further fueled the development of machine learning-based human-virus PPI prediction methods. Emerging as a very promising method to predict human-virus PPIs, deep learning shows a powerful ability to integrate large-scale datasets, learn complex sequence-structure relationships of proteins, and convert the learned patterns into final prediction models with high accuracy. Focusing on recent progress in deep learning-powered human-virus PPI prediction, we review technical details of these newly developed methods, including dataset preparation, deep learning architectures, feature engineering, and performance assessment. Moreover, we discuss the current challenges and potential solutions and provide future perspectives on human-virus PPI prediction in the coming post-AlphaFold2 era.
9
From random to predictive: a context-specific interaction framework improves selection of drug protein-protein interactions for unknown drug pathways. Integr Biol (Camb) 2022; 14:13-24. [PMID: 35293584 DOI: 10.1093/intbio/zyac002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Revised: 02/01/2022] [Accepted: 02/03/2022] [Indexed: 12/20/2022]
Abstract
With high drug attrition, protein-protein interaction (PPI) network models are attractive as efficient methods for predicting drug outcomes by analyzing proteins downstream of drug targets. Unfortunately, these methods tend to overpredict associations, and they have low precision and prediction performance; performance is often no better than random (AUROC ~0.5). Typically, PPI models identify ranked phenotypes associated with downstream proteins, yet methods differ in how they prioritize downstream proteins. Most methods apply global approaches for assessing all phenotypes. We hypothesized that a per-phenotype analysis could improve prediction performance. We compared two global approaches (statistical and distance-based) and our novel per-phenotype approach, 'context-specific interaction' (CSI) analysis, on severe side effect prediction. We used a novel dataset of adverse events (or designated medical events, DMEs) and discovered that CSI had a 50% improvement over global approaches (AUROC 0.77 compared to 0.51), and a 76-95% improvement in average precision (0.499 compared to 0.284, 0.256). Our results provide a quantitative rationale for considering downstream proteins on a per-phenotype basis when using PPI network methods to predict drug phenotypes.
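The relative improvements quoted in this abstract can be checked against the values it reports. The sketch below recomputes them, assuming that "improvement" means relative change over the baseline; that definition and the helper name `relative_improvement` are assumptions for illustration, not taken from the paper.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, as a percentage (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Values as reported in the abstract.
auroc_gain = relative_improvement(0.77, 0.51)
ap_gain_vs_first = relative_improvement(0.499, 0.284)
ap_gain_vs_second = relative_improvement(0.499, 0.256)

print(f"AUROC gain: {auroc_gain:.0f}%")
print(f"Average precision gains: {ap_gain_vs_first:.0f}% and {ap_gain_vs_second:.0f}%")
```

Under this definition the AUROC gain comes to about 51%, consistent with the rounded 50% in the abstract, and the average-precision gains come to about 76% and 95%.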
10
Towards a reproducible interactome: semantic-based detection of redundancies to unify protein-protein interaction databases. Bioinformatics 2022; 38:1685-1691. [PMID: 35015827 DOI: 10.1093/bioinformatics/btac013] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Revised: 11/29/2021] [Accepted: 01/06/2022] [Indexed: 02/04/2023]
Abstract
MOTIVATION Information on protein-protein interactions is collected in numerous primary databases with their own curation process. Several meta-databases aggregate primary databases to provide more exhaustive datasets. In addition to exhaustivity, aggregation contributes to reliability by providing an overview of the various studies and detection methods supporting an interaction. However, interactions listed in different primary databases are partly redundant because some publications reporting protein-protein interactions have been curated by multiple primary databases. Mere aggregation can thus introduce a bias if these redundancies are not identified and eliminated. To overcome this bias, meta-databases rely on the Molecular Interaction ontology that describes interaction detection methods, but they do not fully take advantage of the ontology's rich semantics, which leads to systematically overestimating interaction reproducibility. RESULTS We propose a precise definition of explicit and implicit redundancy and show that both can be easily detected using Semantic Web technologies. We apply this process to a dataset from the Agile Protein Interactomes DataServer (APID) meta-database and show that while explicit redundancies were detected by the APID aggregation process, about 15% of APID entries are implicitly redundant and should not be taken into account when presenting confidence-related metrics. More than 90% of implicit redundancies result from the aggregation of distinct primary databases, whereas the remainder occur between entries of a single database. Finally, we build a 'reproducible interactome' with interactions that have been reproduced by multiple methods or publications. The size of the reproducible interactome is drastically reduced by removing redundancies for both yeast (-59%) and human (-56%), and we show that this is largely due to implicit redundancies.
AVAILABILITY AND IMPLEMENTATION Software, data and results are available at https://gitlab.com/nnet56/reproducible-interactome, https://reproducible-interactome.genouest.org/, Zenodo (https://doi.org/10.5281/zenodo.5595037) and NDEx (https://doi.org/10.18119/N94302 and https://doi.org/10.18119/N97S4D). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
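As a rough illustration of the 'reproducible interactome' idea described in this abstract, the sketch below collapses entries that repeat the same method and publication for a protein pair, then keeps only pairs supported by more than one method or more than one publication. It is a simplified toy under stated assumptions: the tuple layout, the example data, and the function name `reproducible_pairs` are hypothetical, and it does not implement the ontology-based detection of implicit redundancy that the paper performs with Semantic Web technologies.

```python
from collections import defaultdict

# Toy entries (hypothetical): (protein_a, protein_b, detection_method, publication, source_db).
entries = [
    ("P1", "P2", "two hybrid", "pub1", "dbA"),
    ("P2", "P1", "two hybrid", "pub1", "dbB"),  # same evidence curated by a second database
    ("P1", "P3", "two hybrid", "pub1", "dbA"),
    ("P1", "P3", "pull down", "pub2", "dbB"),
]


def reproducible_pairs(entries):
    """Return the pairs supported by more than one method or more than one publication."""
    evidence = defaultdict(set)
    for a, b, method, publication, _source_db in entries:
        # An unordered pair, so (P1, P2) and (P2, P1) count as the same interaction.
        pair = frozenset((a, b))
        # A set drops entries that repeat the same (method, publication) evidence.
        evidence[pair].add((method, publication))

    result = set()
    for pair, support in evidence.items():
        methods = {method for method, _ in support}
        publications = {publication for _, publication in support}
        if len(methods) > 1 or len(publications) > 1:
            result.add(pair)
    return result


print(reproducible_pairs(entries))
```

In this toy data the P1-P2 pair appears twice but rests on a single method and publication, so it is not counted as reproduced, whereas P1-P3 is kept.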
11
Abstract
Molecular interaction databases aim to systematically capture and organize the experimental interaction information described in the scientific literature. These data can then be used to perform network analysis, to assign putative roles to uncharacterized proteins and to investigate their involvement in cellular pathways. This chapter gives a brief overview of publicly available molecular interaction databases and focuses on the members of the IMEx Consortium, on their curation policies and standard data formats. The goals achieved by IMEx databases over the last 15 years, the data types provided and the many different ways in which such data can be utilized by the research community are described in detail. The IMEx databases curate molecular interaction data to the highest caliber, following a detailed curation model and supplying rich metadata by employing common curation rules and harmonized standards. The IMEx Consortium provides comprehensively annotated molecular interaction data integrated into a single, non-redundant, open access dataset.
12
The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res 2021; 50:D648-D653. [PMID: 34761267 PMCID: PMC8728211 DOI: 10.1093/nar/gkab1006] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 09/22/2021] [Revised: 10/06/2021] [Accepted: 10/21/2021] [Indexed: 01/18/2023]
Abstract
The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
13
In silico predictions of protein interactions between Zika virus and human host. PeerJ 2021; 9:e11770. [PMID: 34513323 PMCID: PMC8395582 DOI: 10.7717/peerj.11770] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2020] [Accepted: 06/23/2021] [Indexed: 11/20/2022]
Abstract
Background The Zika virus (ZIKV) belongs to the Flaviviridae family, was first isolated in the 1940s, and remained underreported until it emerged as a global threat in 2016, when drastic consequences such as Guillain-Barré syndrome and microcephaly in newborns were reported. Understanding molecular interactions of ZIKV proteins during the host infection is important to develop treatments and prophylactic measures; however, large-scale experimental approaches normally used to detect protein-protein interaction (PPI) are onerous and labor-intensive. On the other hand, computational methods may overcome these challenges and guide traditional approaches on one or few protein molecules. The prediction of PPIs can be used to study host-parasite interactions at the protein level and reveal key pathways that allow viral infection. Results Applying Random Forest and Support Vector Machine (SVM) algorithms, we performed predictions of PPI between two ZIKV strains and human proteomes. The consensus number of predictions of both algorithms was 17,223 pairs of proteins. Functional enrichment analyses were executed with the predicted networks to assess the biological meanings of the protein interactions. Some pathways related to viral infection and neurological development were found for both ZIKV strains in the enrichment analysis, but the JAK-STAT pathway was observed only for strain PE243 when compared with the FSS13025 strain. Conclusions The consensus network of PPI predictions made by Random Forest and SVM algorithms allowed an enrichment analysis that corroborates many aspects of ZIKV infection. The enrichment results are mainly related to viral infection, neuronal development, and immune response, and presented differences between the two compared ZIKV strains. Strain PE243 presented more predicted interactions between proteins from the JAK-STAT signaling pathway, which could lead to a more inflammatory immune response when compared with the FSS13025 strain. These results show that the methodology employed in this study can potentially reveal new interactions between the ZIKV and human cells.
14
Integration of transcription coregulator complexes with sequence-specific DNA-binding factor interactomes. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194749. [PMID: 34425241 PMCID: PMC10359485 DOI: 10.1016/j.bbagrm.2021.194749] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/10/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 12/22/2022]
Abstract
The domain of transcription regulation has been notoriously difficult to annotate in the Gene Ontology, partly because of the intricacies of gene regulation, which involve molecular interactions with DNA as well as amongst protein complexes. The molecular function 'transcription coregulator activity' is a part of the biological process 'regulation of transcription, DNA-templated' that occurs in the cellular component 'chromatin'. It can mechanistically link sequence-specific DNA-binding transcription factor (dbTF) regulatory DNA target sites to coactivator and corepressor target sites through the molecular function 'cis-regulatory region sequence-specific DNA binding'. Many questions arise about transcription coregulators (coTF). Here, we asked how many unannotated, putative coregulators can be identified in protein complexes. To answer this, we mined the CORUM and hu.MAP protein complex databases with known and strongly presumed human transcription coregulators. In addition, we trawled the BioGRID and IntAct molecular interaction databases for interactors of the known 1457 human dbTFs annotated by the GREEKC and GO consortia. This yielded 1093 putative transcription factor coregulator complex subunits, of which 954 interact directly with a dbTF. This substantially expands the set of coTFs that could be annotated to 'transcription coregulator activity' and sets the stage for renewed annotation and wet-lab research efforts. To this end, we devised a prioritisation score based on existing GO annotations of already curated transcription coregulators as well as interactome representation. Since all the proteins that we mined are parts of protein complexes, we propose to concomitantly engage in annotation of the putative transcription coregulator-containing complexes in the Complex Portal database.
15
BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions. Bioinformatics 2021; 37:4793-4800. [PMID: 34329382 DOI: 10.1093/bioinformatics/btab565] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/20/2021] [Revised: 07/18/2021] [Accepted: 07/29/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Predicting entity relationships can greatly benefit important biomedical problems. Recently, a large number of biomedical heterogeneous networks (BioHNs) have been generated, offering opportunities to develop network-based learning approaches for predicting relationships among entities. However, current research has only slightly explored BioHN-based self-supervised representation learning methods, and existing methods struggle to capture local- and global-level association information among entities simultaneously. RESULTS In this study, we propose a biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions, termed BioERP. A self-supervised meta path detection mechanism is proposed to train a deep Transformer encoder model that can capture the global structure and semantic features in BioHNs. Meanwhile, a biomedical entity mask learning strategy is designed to reflect local associations of vertices. Finally, the representations from different task models are concatenated to generate two-level representation vectors for predicting relationships among entities. The results on eight datasets show that BioERP outperforms 30 state-of-the-art methods. In particular, BioERP achieves AUC and AUPR values close to 1 on drug-target interaction prediction. In summary, BioERP is a promising bio-entity relationship prediction approach. AVAILABILITY Source code and data can be downloaded from https://github.com/pengsl-lab/BioERP.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
16
IntAct App: a Cytoscape application for molecular interaction network visualisation and analysis. Bioinformatics 2021; 37:3684-3685. [PMID: 33961020 PMCID: PMC8545338 DOI: 10.1093/bioinformatics/btab319] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/28/2021] [Revised: 04/08/2021] [Accepted: 04/27/2021] [Indexed: 01/24/2023]
Abstract
Summary IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct’s molecular interaction data. It builds networks where nodes are interacting molecules (mainly proteins, but also genes, RNA, chemicals…) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields, and optionally include all their interacting partners in the resulting network. The app offers three visualizations: one displaying only interactions, another representing every piece of evidence, and the last one emphasizing evidence where mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. Availability and implementation Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp.
17
Current status and future perspectives of computational studies on human-virus protein-protein interactions. Brief Bioinform 2021; 22:6161422. [PMID: 33693490 DOI: 10.1093/bib/bbab029] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/10/2020] [Revised: 01/14/2021] [Accepted: 01/20/2021] [Indexed: 12/19/2022]
Abstract
The protein-protein interactions (PPIs) between humans and viruses mediate viral infection and host immunity processes. Therefore, the study of human-virus PPIs can help us understand the principles of human-virus relationships and can thus guide the development of highly effective drugs to break the transmission of viral infectious diseases. Recent years have witnessed the rapid accumulation of experimentally identified human-virus PPI data, which provides an unprecedented opportunity for bioinformatics studies revolving around human-virus PPIs. In this article, we provide a comprehensive overview of computational studies on human-virus PPIs, focusing especially on method development for human-virus PPI prediction. We briefly introduce the experimental detection methods and existing database resources of human-virus PPIs, and then discuss the research progress in the development of computational prediction methods. In particular, we elaborate on the machine learning-based prediction methods and highlight the need to embrace state-of-the-art deep-learning algorithms and new feature engineering techniques (e.g. the protein embedding technique derived from natural language processing). To further advance the understanding of this research topic, we also outline the practical applications of the human-virus interactome in fundamental biological discovery and new antiviral therapy development.
|
18
|
DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021; 37:2722-2729. [PMID: 33682875 PMCID: PMC8428617 DOI: 10.1093/bioinformatics/btab147] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/13/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. RESULTS We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. AVAILABILITY Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.
|
19
|
Towards a unified open access dataset of molecular interactions. Nat Commun 2020; 11:6144. [PMID: 33262342 PMCID: PMC7708836 DOI: 10.1038/s41467-020-19942-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/17/2020] [Accepted: 11/09/2020] [Indexed: 12/16/2022] Open
Abstract
The International Molecular Exchange (IMEx) Consortium provides scientists with a single body of experimentally verified protein interactions curated in rich contextual detail to an internationally agreed standard. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Additionally, we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.
|
20
|
Multilevel regulation of muscle-specific transcription factor hlh-1 during Caenorhabditis elegans embryogenesis. Dev Genes Evol 2020; 230:265-278. [PMID: 32556563 PMCID: PMC7371654 DOI: 10.1007/s00427-020-00662-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/14/2020] [Accepted: 05/31/2020] [Indexed: 11/29/2022]
Abstract
hlh-1 is a myogenic transcription factor required for body-wall muscle specification during embryogenesis in Caenorhabditis elegans. Despite its well-known role in muscle specification, comprehensive regulatory control upstream of hlh-1 remains poorly defined. Here, we first established a statistical reference for the spatiotemporal expression of hlh-1 at single-cell resolution up to the second last round of divisions for most of the cell lineages (from 4- to 350-cell stage) using 13 wild-type embryos. We next generated lineal expression of hlh-1 after RNA interference (RNAi) perturbation of 65 genes, which were selected based on their degree of conservation, mutant phenotypes, and known roles in development. We then compared the expression profiles between wild-type and RNAi embryos by clustering according to their lineal expression patterns using mean-shift and density-based clustering algorithms, which not only confirmed the roles of existing genes but also uncovered the potential functions of novel genes in muscle specification at multiple levels, including cellular, lineal, and embryonic levels. By combining the public data on protein-protein interactions, protein-DNA interactions, and genetic interactions with our RNAi data, we inferred regulatory pathways upstream of hlh-1 that function globally or locally. This work not only revealed diverse and multilevel regulatory mechanisms coordinating muscle differentiation during C. elegans embryogenesis but also laid a foundation for further characterizing the regulatory pathways controlling muscle specification at the cellular, lineal (local), or embryonic (global) level.
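The abstract above names mean-shift clustering but does not give its settings. As a rough illustration of the general technique only, here is a minimal one-dimensional mean-shift sketch with a flat kernel. The function name, the bandwidth, and the toy values are all assumptions made for this example; they are not the authors' pipeline or data.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on a list of floats.

    Each point is moved repeatedly to the mean of all input points lying
    within `bandwidth` of its current position, until it stops moving.
    Converged positions closer than `bandwidth` are merged into one cluster.
    Returns (centers, labels), where labels[i] indexes into centers.
    """
    modes = []
    for x in points:
        for _ in range(max_iter):
            window = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth:
                labels.append(i)
                break
        else:
            # No existing center is close enough: start a new cluster.
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical expression-like values forming two well-separated groups.
TOY_VALUES = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
TOY_CENTERS, TOY_LABELS = mean_shift_1d(TOY_VALUES, bandwidth=1.0)
```

Unlike k-means, mean shift does not need the number of clusters in advance; the bandwidth controls how coarse the grouping is, which is one plausible reason to prefer it when the number of expression patterns is unknown.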
|
21
|
Guilt-by-Association - Functional Insights Gained From Studying the LRRK2 Interactome. Front Neurosci 2020; 14:485. [PMID: 32508578 PMCID: PMC7251075 DOI: 10.3389/fnins.2020.00485] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/24/2020] [Accepted: 04/20/2020] [Indexed: 12/11/2022] Open
Abstract
The Parkinson's disease-associated Leucine-rich repeat kinase 2 (LRRK2) is a complex multi-domain protein belonging to the Roco protein family, a unique group of G-proteins. Variants of this gene are associated with an increased risk of Parkinson's disease. Besides its well-characterized enzymatic activities, conferred by its GTPase and kinase domains, and a central dimerization domain, it contains four predicted repeat domains, which are, based on their structure, commonly involved in protein-protein interactions (PPIs). In the past decades, tremendous progress has been made in determining comprehensive interactome maps for the human proteome. Knowledge of PPIs has been instrumental in assigning functions to proteins involved in human disease and helped to understand the connectivity between different disease pathways and also significantly contributed to the functional understanding of LRRK2. In addition to an increased kinase activity observed for proteins containing PD-associated variants, various studies helped to establish LRRK2 as a large scaffold protein in the interface between cytoskeletal dynamics and the vesicular transport. This review first discusses a number of specific LRRK2-associated PPIs for which a functional consequence can at least be speculated upon, and then considers the representation of LRRK2 protein interactions in public repositories, providing an outlook on open research questions and challenges in this field.
|
22
|
A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis 2020; 77:257-273. [PMID: 32716361 PMCID: PMC7592670 DOI: 10.3233/jad-200206] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 06/05/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality. OBJECTIVE To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research. METHODS We have systematically identified target proteins critical to disease process, in part by accessing informed input from the clinical research community. RESULTS Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset. CONCLUSION This represents a significant enhancement in the expert curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.
|
23
|
Conserved Central Intraviral Protein Interactome of the Herpesviridae Family. mSystems 2019; 4:e00295-19. [PMID: 31575665 PMCID: PMC6774017 DOI: 10.1128/msystems.00295-19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/10/2019] [Accepted: 09/09/2019] [Indexed: 01/08/2023] Open
Abstract
Protein interactions are major driving forces behind the functional phenotypes of biological processes. As such, evolutionary footprints are reflected in system-level collections of protein-protein interactions (PPIs), i.e., protein interactomes. We conducted a comparative analysis of intraviral protein interactomes for representative species of each of the three subfamilies of herpesviruses (herpes simplex virus 1, human cytomegalovirus, and Epstein-Barr virus), which are highly prevalent etiologic agents of important human diseases. The intraviral interactomes were reconstructed by combining experimentally supported and computationally predicted protein-protein interactions. Using cross-species network comparison, we then identified family-wise conserved interactions and protein complexes, which we defined as a herpesviral "central" intraviral protein interactome. A large number of widely accepted conserved herpesviral protein complexes are present in this central intraviral interactome, encouragingly supporting the biological coherence of our results. Importantly, these protein complexes represent most, if not all, of the essential steps required during a productive life cycle. Hence the central intraviral protein interactome could plausibly represent a minimal infectious interactome of the herpesvirus family across a variety of hosts. Our data, which have been integrated into our herpesvirus interactomics database, HVint2.0, could assist in creating comprehensive system-level computational models of this viral lineage.
IMPORTANCE Herpesviruses are an important socioeconomic burden for both humans and livestock. Throughout their long evolutionary history, individual herpesvirus species have developed remarkable host specificity, while collectively the Herpesviridae family has evolved to infect a large variety of eukaryotic hosts. The development of approaches to fight herpesvirus infections has been hampered by the complexity of herpesviruses' genomes, proteomes, and structural features. The data and insights generated by our study add to the understanding of the functional organization of herpesvirus-encoded proteins, specifically of family-wise conserved features defining essential components required for a productive infectious cycle across different hosts, which can contribute toward the conceptualization of antiherpetic infection strategies with an effect on a broader range of target species. All of the generated data have been made freely available through our HVint2.0 database, a dedicated resource of curated herpesvirus interactomics purposely created to promote and assist future studies in the field.
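The abstract does not spell out how the cross-species network comparison was performed. One simple reading of "family-wise conserved interactions" can be sketched as follows: map each protein to an ortholog group, then keep only the group pairs that interact in every species. The function name, the placeholder protein identifiers, and the ortholog groups below are hypothetical and carry no biological meaning.

```python
def conserved_edges(networks, ortholog_groups):
    """Return ortholog-group pairs that interact in every network.

    networks: {species: iterable of (protein_a, protein_b)}
    ortholog_groups: {species: {protein: group_id}}
    Interactions involving a protein without a group are ignored.
    """
    mapped = []
    for species, edges in networks.items():
        groups = ortholog_groups[species]
        species_edges = set()
        for a, b in edges:
            if a in groups and b in groups:
                # frozenset makes the edge undirected: (a, b) == (b, a).
                species_edges.add(frozenset((groups[a], groups[b])))
        mapped.append(species_edges)
    return set.intersection(*mapped) if mapped else set()


# Hypothetical toy input: three species, placeholder protein names.
TOY_NETWORKS = {
    "virus_A": [("a1", "a2"), ("a2", "a3")],
    "virus_B": [("b1", "b2"), ("b3", "b4")],
    "virus_C": [("c2", "c1"), ("c2", "c3")],
}
TOY_GROUPS = {
    "virus_A": {"a1": "G1", "a2": "G2", "a3": "G3"},
    "virus_B": {"b1": "G1", "b2": "G2", "b3": "G3", "b4": "G4"},
    "virus_C": {"c1": "G1", "c2": "G2", "c3": "G3"},
}
TOY_CORE = conserved_edges(TOY_NETWORKS, TOY_GROUPS)
```

In the toy input only the G1-G2 pair interacts in all three networks, so it is the only edge in the resulting core. A real analysis would also have to handle one-to-many orthology and predicted versus experimentally supported edges, which this sketch deliberately leaves out.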
|
24
|
Abstract
Alzheimer's disease and other types of dementia are the top cause for disabilities in later life and various types of experiments have been performed to understand the underlying mechanisms of the disease with the aim of coming up with potential drug targets. These experiments have been carried out by scientists working in different domains such as proteomics, molecular biology, clinical diagnostics and genomics. The results of such experiments are stored in the databases designed for collecting data of similar types. However, in order to get a systematic view of the disease from these independent but complementary data sets, it is necessary to combine them. In this study we describe a heterogeneous network-based data set for Alzheimer's disease (HENA). Additionally, we demonstrate the application of state-of-the-art graph convolutional networks, i.e. deep learning methods for the analysis of such large heterogeneous biological data sets. We expect HENA to allow scientists to explore and analyze their own results in the broader context of Alzheimer's disease research.
|
25
|
Protein interactions and consensus clustering analysis uncover insights into herpesvirus virion structure and function relationships. PLoS Biol 2019; 17:e3000316. [PMID: 31199794 PMCID: PMC6594648 DOI: 10.1371/journal.pbio.3000316] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/09/2018] [Revised: 06/26/2019] [Accepted: 05/23/2019] [Indexed: 01/08/2023] Open
Abstract
Infections with human herpesviruses are ubiquitous and a public health concern worldwide. Current treatments reduce the severity of some symptoms associated to herpetic infections but neither remove the viral reservoir from the infected host nor protect from the recurrent symptom outbreaks that characterise herpetic infections. The difficulty in therapeutically tackling these viral systems stems in part from their remarkably large proteomes and the complex networks of physical and functional associations that they tailor. This study presents our efforts to unravel the complexity of the interactome of herpes simplex virus type 1 (HSV1), the prototypical herpesvirus species. Inspired by our previous work, we present an improved and more integrative computational pipeline for the protein–protein interaction (PPI) network reconstruction in HSV1, together with a newly developed consensus clustering framework, which allowed us to extend the analysis beyond binary physical interactions and revealed a system-level layout of higher-order functional associations in the virion proteome. Additionally, the analysis provided new functional annotation for the currently undercharacterised protein pUS10. In-depth bioinformatics sequence analysis unravelled structural features in pUS10 reminiscent of those observed in some capsid-associated proteins in tailed bacteriophages, with which herpesviruses are believed to share a common ancestry. Using immunoaffinity purification (IP)–mass spectrometry (MS), we obtained additional support for our bioinformatically predicted interaction between pUS10 and the inner tegument protein pUL37, which binds cytosolic capsids, contributing to initial tegumentation and eventually virion maturation. In summary, this study unveils new, to our knowledge, insights at both the system and molecular levels that can help us better understand the complexity behind herpesvirus infections. 
Consensus clustering of protein-protein interaction networks provides insights into the assembly mechanism of herpes simplex virus type 1 (HSV1) virions and structure-function relationships underlying herpesvirus infection.
|
26
|
Phospho-peptide binding domains in S. cerevisiae model organism. Biochimie 2019; 163:117-127. [PMID: 31194995 DOI: 10.1016/j.biochi.2019.06.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/13/2019] [Accepted: 06/06/2019] [Indexed: 02/07/2023]
Abstract
Protein phosphorylation is one of the main mechanisms by which signals are transmitted in eukaryotic cells, and it plays a crucial regulatory role in almost all cellular processes. In yeast, more than half of the proteins are phosphorylated in at least one site, and over 20,000 phosphopeptides have been experimentally verified. However, the functional consequences of these phosphorylation events for most of the identified phosphosites are unknown. A family of protein interaction domains selectively recognises phosphorylated motifs to recruit regulatory proteins and activate signalling pathways. Nine classes of dedicated modules are coded by the yeast genome: 14-3-3, FHA, WD40, BRCT, WW, PBD, and SH2. The recognition specificity relies on a few residues on the target protein and has coevolved with kinase specificity. In the present study, we review the current knowledge concerning yeast phospho-binding domains and their networks. We emphasise the relevance of both positive and negative amino acid selection to orchestrate the highly regulated outcomes of inter- and intra-molecular interactions. Finally, we hypothesise that only a small fraction of yeast phosphorylation events leads to the creation of a docking site on the target molecule, while many have a direct effect on the protein or, as has been proposed, have no function at all.
|
27
|
Leveraging Experimental Details for an Improved Understanding of Host-Pathogen Interactome. Curr Protoc Bioinformatics 2019; 61:8.26.1-8.26.12. [PMID: 30040202 DOI: 10.1002/cpbi.44] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]
Abstract
An increasing proportion of curated host-pathogen interaction (HPI) information is becoming available in interaction databases. These data represent detailed, experimentally-verified, molecular interaction data, which may be used to better understand infectious diseases. By their very nature, HPIs are context dependent: whether two proteins are reported as interacting depends on the precise biological conditions studied and the approaches used to identify these interactions. The associated biology and the technical details of the experiments identifying interacting protein molecules are increasingly being curated using defined curation standards but are overlooked in current HPI network modeling. Given the increase in data size and complexity, awareness of the process and variables included in HPI identification and curation, and their effect on data analysis and interpretation, is crucial in understanding pathogenesis. We describe the use of HPI data for network modeling, aspects of curation that can help researchers to more accurately model specific infection conditions, and provide examples to illustrate these principles. © 2018 by John Wiley & Sons, Inc.
|
28
|
PathFX provides mechanistic insights into drug efficacy and safety for regulatory review and therapeutic development. PLoS Comput Biol 2018; 14:e1006614. [PMID: 30532240 PMCID: PMC6285459 DOI: 10.1371/journal.pcbi.1006614] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/06/2018] [Accepted: 10/31/2018] [Indexed: 12/14/2022] Open
Abstract
Failure to demonstrate efficacy and safety issues are important reasons that drugs do not reach the market. An incomplete understanding of how drugs exert their effects hinders regulatory and pharmaceutical industry projections of a drug's benefits and risks. Signaling pathways mediate drug response and while many signaling molecules have been characterized for their contribution to disease or their role in drug side effects, our knowledge of these pathways is incomplete. To better understand all signaling molecules involved in drug response and the phenotype associations of these molecules, we created a novel method, PathFX, a non-commercial entity, to identify these pathways and drug-related phenotypes. We benchmarked PathFX by identifying drugs' marketed disease indications and reported a sensitivity of 41%, a 2.7-fold improvement over similar approaches. We then used PathFX to strengthen signals for drug-adverse event pairs occurring in the FDA Adverse Event Reporting System (FAERS) and also identified opportunities for drug repurposing for new diseases based on interaction paths that associated a marketed drug to that disease. By discovering molecular interaction pathways, PathFX improved our understanding of drug associations to safety and efficacy phenotypes. The algorithm may provide a new means to improve regulatory and therapeutic development decisions.
|
29
|
Hepatic Dysfunction Caused by Consumption of a High-Fat Diet. Cell Rep 2018; 21:3317-3328. [PMID: 29241556 DOI: 10.1016/j.celrep.2017.11.059] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 06/20/2017] [Revised: 11/11/2017] [Accepted: 11/16/2017] [Indexed: 12/16/2022] Open
Abstract
Obesity is a major human health crisis that promotes insulin resistance and, ultimately, type 2 diabetes. The molecular mechanisms that mediate this response occur across many highly complex biological regulatory levels that are incompletely understood. Here, we present a comprehensive molecular systems biology study of hepatic responses to high-fat feeding in mice. We interrogated diet-induced epigenomic, transcriptomic, proteomic, and metabolomic alterations using high-throughput omic methods and used a network modeling approach to integrate these diverse molecular signals. Our model indicated that disruption of hepatic architecture and enhanced hepatocyte apoptosis are among the numerous biological processes that contribute to early liver dysfunction and low-grade inflammation during the development of diet-induced metabolic syndrome. We validated these model findings with additional experiments on mouse liver sections. In total, we present an integrative systems biology study of diet-induced hepatic insulin resistance that uncovered molecular features promoting the development and maintenance of metabolic disease.
|
30
|
JAMI: a Java library for molecular interactions and data interoperability. BMC Bioinformatics 2018; 19:133. [PMID: 29642846 PMCID: PMC5896107 DOI: 10.1186/s12859-018-2119-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/10/2017] [Accepted: 03/20/2018] [Indexed: 11/22/2022] Open
Abstract
Background A number of different molecular interaction data download formats now exist, designed to allow access to these valuable data by diverse user groups. These formats include the PSI-XML and MITAB standard interchange formats developed by the Molecular Interaction workgroup of the HUPO-PSI, in addition to other, use-specific downloads produced by other resources. The onus is currently on the user to ensure that a piece of software is capable of reading and writing all necessary versions of each format. This problem may increase, as data providers strive to meet ever more sophisticated user demands and data types. Results A collaboration between EMBL-EBI and the University of Cambridge has produced JAMI, a single library to unify standard molecular interaction data formats such as PSI-MI XML and PSI-MITAB. The JAMI free, open-source library enables the development of molecular interaction computational tools and pipelines without the need to produce different versions of software to read different versions of the data formats. Conclusion Software and tools developed on top of the JAMI framework are able to integrate and support both PSI-MI XML and PSI-MITAB. The use of JAMI avoids the requirement to chain conversions between formats in order to reach a desired output format and prevents code and unit test duplication as the code becomes more modular. JAMI’s model interfaces are abstracted from the underlying format, hiding the complexity and requirements of each data format from developers using JAMI as a library.
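To give a feel for the kind of format handling that a library such as JAMI hides from developers, the sketch below reads one row of the 15-column, tab-separated PSI-MITAB 2.5 layout by hand. It is written in Python for brevity and does not use or imitate JAMI's Java API. The column keys, the function name, and the example row (including its accessions and identifiers) are invented for illustration.

```python
# Keys for the 15 tab-separated columns of PSI-MITAB 2.5, in order.
MITAB25_COLUMNS = (
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publications",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidences",
)


def parse_mitab25_line(line):
    """Split one MITAB 2.5 row into {column: list of values}.

    A lone '-' marks an empty column; multiple values are separated by '|'.
    This naive split does not handle '|' inside quoted text, which a
    production parser would need to do.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    return {
        name: ([] if raw == "-" else raw.split("|"))
        for name, raw in zip(MITAB25_COLUMNS, fields)
    }


# Invented example row; the accessions and identifiers are placeholders.
EXAMPLE_ROW = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2020)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "intact-miscore:0.50",
])
EXAMPLE_RECORD = parse_mitab25_line(EXAMPLE_ROW)
```

Later MITAB versions append further columns, which is why the length check accepts rows with more than 15 fields and simply ignores the extras. Handling each version and the XML variants separately is the duplication the abstract says a shared model is meant to avoid.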
|
31
|
Discovering Altered Regulation and Signaling Through Network-based Integration of Transcriptomic, Epigenomic, and Proteomic Tumor Data. Methods Mol Biol 2018; 1711:13-26. [PMID: 29344883 PMCID: PMC6309679 DOI: 10.1007/978-1-4939-7493-1_2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Abstract
With the extraordinary rise in available biological data, biologists and clinicians need unbiased tools for data integration in order to reach accurate, succinct conclusions. Network biology provides one such method for high-throughput data integration, but comes with its own set of algorithmic problems and needed expertise. We provide a step-by-step guide for using Omics Integrator, a software package designed for the integration of transcriptomic, epigenomic, and proteomic data. Omics Integrator can be found at http://fraenkel.mit.edu/omicsintegrator.
|
32
|
Functional Genomics Approach Identifies Novel Signaling Regulators of TGFα Ectodomain Shedding. Mol Cancer Res 2018; 16:147-161. [PMID: 29018056 PMCID: PMC5859574 DOI: 10.1158/1541-7786.mcr-17-0140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/19/2017] [Revised: 08/16/2017] [Accepted: 10/04/2017] [Indexed: 11/16/2022]
Abstract
Ectodomain shedding of cell-surface precursor proteins by metalloproteases generates important cellular signaling molecules. Of importance for disease is the release of ligands that activate the EGFR, such as TGFα, which is mostly carried out by ADAM17 [a member of the A-disintegrin and metalloprotease (ADAM) domain family]. EGFR ligand shedding has been linked to many diseases, in particular cancer development, growth and metastasis, as well as resistance to cancer therapeutics. Excessive EGFR ligand release can outcompete therapeutic EGFR inhibition or the inhibition of other growth factor pathways by providing bypass signaling via EGFR activation. Drugging metalloproteases directly has failed clinically because it indiscriminately affected shedding of numerous substrates. It is therefore essential to identify regulators for EGFR ligand cleavage. Here, integration of a functional shRNA genomic screen, computational network analysis, and dedicated validation tests succeeded in identifying several key signaling pathways as novel regulators of TGFα shedding in cancer cells. Most notably, a cluster of genes with NFκB pathway regulatory functions was found to strongly influence TGFα release, albeit independent of their NFκB regulatory functions. Inflammatory regulators thus also govern cancer cell growth-promoting ectodomain cleavage, lending mechanistic understanding to the well-known connection between inflammation and cancer. Implications: Using genomic screens and network analysis, this study defines targets that regulate ectodomain shedding and suggests new treatment opportunities for EGFR-driven cancers. Mol Cancer Res; 16(1); 147-61. ©2017 AACR.
|
33
|
Abstract
This article describes the creation of the first expert manually curated noncoding RNA interaction networks for S. cerevisiae. The RNA-RNA and RNA-protein interaction networks have been carefully extracted from the experimental literature and made available through the IntAct database (www.ebi.ac.uk/intact). We provide an initial network analysis and compare their properties to the much larger protein-protein interaction network. We find that the proteins that bind to ncRNAs in the network contain only a small proportion of classical RNA binding domains. We also see an enrichment of WD40 domains, suggesting their direct involvement in ncRNA interactions. We discuss the challenges in collecting noncoding RNA interaction data and the opportunities for worldwide collaboration to fill the unmet need for this data.
|
34
|
Development of an in silico method for the identification of subcomplexes involved in the biogenesis of multiprotein complexes in Saccharomyces cerevisiae. BMC SYSTEMS BIOLOGY 2017; 11:67. [PMID: 28693620 PMCID: PMC5504824 DOI: 10.1186/s12918-017-0442-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2016] [Accepted: 06/28/2017] [Indexed: 11/23/2022]
Abstract
Background Large sets of protein-protein interaction data coming either from biological experiments or predictive methods are available and can be combined to construct networks from which information about various cell processes can be extracted. We have developed an in silico approach based on this information to model the biogenesis of multiprotein complexes in the yeast Saccharomyces cerevisiae. Results Firstly, we have built three protein interaction networks by collecting the protein-protein interactions, which involved the subunits of three complexes, from different databases. The protein-protein interactions come from different kinds of biological experiments or are predicted. We have chosen the elongator and the mediator head complexes that are soluble and exhibit an architecture with subcomplexes that could be functional modules, and the mitochondrial bc1 complex, which is an integral membrane complex and for which a late assembly subcomplex has been described. Secondly, by applying a clustering strategy to these networks, we were able to identify subcomplexes involved in the biogenesis of the complexes as well as the proteins interacting with each subcomplex. Thirdly, in order to validate our in silico results for the cytochrome bc1 complex we have analysed the physical interactions existing between three subunits by performing immunoprecipitation experiments in several genetic contexts. Conclusions For the two soluble complexes (the elongator and mediator head), our model shows a strong clustering of subunits that belong to a known subcomplex or module. For the membrane bc1 complex, our approach has suggested new interactions between subunits in the early steps of the assembly pathway that were experimentally confirmed. Scripts can be downloaded from the site: http://bim.igmors.u-psud.fr/isips.
|
35
|
Pathway-based network modeling finds hidden genes in shRNA screen for regulators of acute lymphoblastic leukemia. Integr Biol (Camb) 2016; 8:761-74. [PMID: 27315426 PMCID: PMC5224708 DOI: 10.1039/c6ib00040a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/18/2016] [Accepted: 05/31/2016] [Indexed: 12/30/2022]
Abstract
Data integration stands to improve interpretation of RNAi screens which, as a result of off-target effects, typically yield numerous gene hits of which only a few validate. These off-target effects can result from seed matches to unintended gene targets (reagent-based) or cellular pathways, which can compensate for gene perturbations (biology-based). We focus on the biology-based effects and use network modeling tools to discover pathways de novo around RNAi hits. By looking at hits in a functional context, we can uncover novel biology not identified from any individual 'omics measurement. We leverage multiple 'omic measurements using the Simultaneous Analysis of Multiple Networks (SAMNet) computational framework to model a genome scale shRNA screen investigating Acute Lymphoblastic Leukemia (ALL) progression in vivo. Our network model is enriched for cellular processes associated with hematopoietic differentiation and homeostasis even though none of the individual 'omic sets showed this enrichment. The model identifies genes associated with the TGF-beta pathway and predicts a role in ALL progression for many genes without this functional annotation. We further experimentally validate the hidden genes - Wwp1, a ubiquitin ligase, and Hgs, a multi-vesicular body associated protein - for their role in ALL progression. Our ALL pathway model includes genes with roles in multiple types of leukemia and roles in hematological development. We identify a tumor suppressor role for Wwp1 in ALL progression. This work demonstrates that network integration approaches can compensate for off-target effects, and that these methods can uncover novel biology retroactively on existing screening data. We anticipate that this framework will be valuable to multiple functional genomic technologies - siRNA, shRNA, and CRISPR - generally, and will improve the utility of functional genomic studies.
|
36
|
HVint: A Strategy for Identifying Novel Protein-Protein Interactions in Herpes Simplex Virus Type 1. Mol Cell Proteomics 2016; 15:2939-53. [PMID: 27384951 PMCID: PMC5013309 DOI: 10.1074/mcp.m116.058552] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/15/2016] [Indexed: 11/12/2022]
Abstract
Human herpesviruses are widespread human pathogens with a remarkable impact on worldwide public health. Despite decades of intense research, the molecular details of many aspects of their function remain to be fully characterized. To unravel the details of how these viruses operate, a thorough understanding of the relationships between the involved components is key. Here, we present HVint, a novel protein-protein intraviral interaction resource for herpes simplex virus type 1 (HSV-1) integrating data from five external sources. To assess each interaction, we used a scoring scheme that takes into consideration aspects such as the type of detection method and the number of lines of evidence. The coverage of the initial interactome was further increased using evolutionary information, by importing interactions reported for other human herpesviruses. These latter interactions therefore constitute computational predictions for potential novel interactions in HSV-1. An independent experimental analysis was performed to confirm a subset of our predicted interactions. This subset covers proteins that contribute to nuclear egress and primary envelopment events, including VP26, pUL31, pUL40, and the recently characterized pUL32 and pUL21. Our findings support a coordinated crosstalk between VP26 and proteins such as pUL31, pUS9, and the CSVC complex, contributing to the development of a model describing the nuclear egress and primary envelopment pathways of newly synthesized HSV-1 capsids. The results are also consistent with recent findings on the involvement of pUL32 in capsid maturation and early tegumentation events. Further, they open the door to new hypotheses on virus-specific regulators of pUS9-dependent transport. To make this repository of interactions readily accessible to the scientific community, we also developed a user-friendly and interactive web interface.
Our approach demonstrates the power of computational predictions to assist in the design of targeted experiments for the discovery of novel protein-protein interactions.
|
37
|
Gene regulation knowledge commons: community action takes care of DNA binding transcription factors. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw088. [PMID: 27270715 PMCID: PMC4911790 DOI: 10.1093/database/baw088] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/08/2015] [Accepted: 05/05/2016] [Indexed: 12/23/2022]
Abstract
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue that cooperation between the scientific community and professional curators can increase the capacity to capture precise knowledge from the literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large numbers of DNA-binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org
|
38
|
Using biological networks to integrate, visualize and analyze genomics data. Genet Sel Evol 2016; 48:27. [PMID: 27036106 PMCID: PMC4818439 DOI: 10.1186/s12711-016-0205-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 11/25/2015] [Accepted: 03/16/2016] [Indexed: 12/22/2022]
Abstract
Network biology is a rapidly developing area of biomedical research and reflects the current view that complex phenotypes, such as disease susceptibility, are not the result of single gene mutations that act in isolation but are rather due to the perturbation of a gene’s network context. Understanding the topology of these molecular interaction networks and identifying the molecules that play central roles in their structure and regulation is a key to understanding complex systems. The falling cost of next-generation sequencing is now enabling researchers to routinely catalogue the molecular components of these networks at a genome-wide scale and over a large number of different conditions. In this review, we describe how to use publicly available bioinformatics tools to integrate genome-wide ‘omics’ data into a network of experimentally-supported molecular interactions. In addition, we describe how to visualize and analyze these networks to identify topological features of likely functional relevance, including network hubs, bottlenecks and modules. We show that network biology provides a powerful conceptual approach to integrate and find patterns in genome-wide genomic data but we also discuss the limitations and caveats of these methods, of which researchers adopting these methods must remain aware.
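As a rough illustration of the topological features named in this abstract, the sketch below computes node degree (to spot hubs) and shortest-path betweenness (to spot bottlenecks) on a small undirected network. It is not taken from the review: the toy edge list, the function names and the use of Brandes' algorithm for betweenness are all assumptions made for the example.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbours map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of neighbours per node; high-degree nodes are candidate hubs."""
    return {node: len(neigh) for node, neigh in adj.items()}


def betweenness(adj):
    """Unnormalised shortest-path betweenness (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: two triangles joined through node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
bottleneck = max(btw, key=btw.get)
```

In this toy network D has only two neighbours yet carries every shortest path between the two triangles, which is the sense in which a bottleneck need not be a hub.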
|
39
|
HitPredict version 4: comprehensive reliability scoring of physical protein-protein interactions from more than 100 species. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav117. [PMID: 26708988 PMCID: PMC4691340 DOI: 10.1093/database/bav117] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 09/22/2015] [Accepted: 11/19/2015] [Indexed: 01/08/2023]
Abstract
HitPredict is a consolidated resource of experimentally identified, physical protein–protein interactions with confidence scores to indicate their reliability. The study of genes and their inter-relationships using methods such as network and pathway analysis requires high quality protein–protein interaction information. Extracting reliable interactions from most of the existing databases is challenging because they either contain only a subset of the available interactions, or a mixture of physical, genetic and predicted interactions. Automated integration of interactions is further complicated by varying levels of accuracy of database content and lack of adherence to standard formats. To address these issues, the latest version of HitPredict provides a manually curated dataset of 398 696 physical associations between 70 808 proteins from 105 species. Manual confirmation was used to resolve all issues encountered during data integration. For improved reliability assessment, this version combines a new score derived from the experimental information of the interactions with the original score based on the features of the interacting proteins. The combined interaction score performs better than either of the individual scores in HitPredict as well as the reliability score of another similar database. HitPredict provides a web interface to search proteins and visualize their interactions, and the data can be downloaded for offline analysis. Data usability has been enhanced by mapping protein identifiers across multiple reference databases. Thus, the latest version of HitPredict provides a significantly larger, more reliable and usable dataset of protein–protein interactions from several species for the study of gene groups. Database URL: http://hintdb.hgc.jp/htp
|
40
|
Exploring novel mechanistic insights in Alzheimer's disease by assessing reliability of protein interactions. Sci Rep 2015; 5:13634. [PMID: 26346705 PMCID: PMC4562155 DOI: 10.1038/srep13634] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/18/2014] [Accepted: 08/03/2015] [Indexed: 01/08/2023]
Abstract
Protein interaction networks are widely used in computational biology as a graphical means of representing higher-level systemic functions in a computable form. Although many algorithms exist that seamlessly collect and measure protein interaction information in network models, they often do not provide novel mechanistic insights using quantitative criteria. Measuring information content and knowledge representation in network models of disease mechanisms becomes crucial, particularly when exploring new target candidates in a well-defined functional context of a potential disease mechanism. To this end, we have developed a knowledge-based scoring approach that uses literature-derived protein interaction features to quantify protein interaction confidence. We thereby introduce the novel concept of knowledge cliffs: regions of the interaction network where a significant gap between high-scoring and low-scoring interactions is observed, representing a divide between established and emerging knowledge on disease mechanism. To show the application of this approach, we constructed and assessed the reliability of a protein-protein interaction model specific to Alzheimer’s disease, which led to the screening and prioritization of four novel protein candidates. Evaluation of the identified candidates showed that two of them are already being followed in clinical trials testing potential AD drugs.
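The idea of a gap between high-scoring and low-scoring interactions can be illustrated with a very small sketch. The abstract does not say how a knowledge cliff is detected or how its significance is assessed, so what follows is only an assumed simplification: sort the confidence scores and report the widest gap between neighbouring values. The protein labels, the scores and the function name are hypothetical.

```python
def largest_score_gap(scores):
    """Return (gap, lower, upper) for the widest gap between consecutive sorted scores."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores")
    return max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))


# Hypothetical confidence scores for interactions between placeholder proteins.
interaction_scores = {
    ("P1", "P2"): 0.92,
    ("P1", "P3"): 0.88,
    ("P2", "P4"): 0.85,
    ("P3", "P5"): 0.31,
    ("P4", "P6"): 0.27,
}

gap, lower, upper = largest_score_gap(interaction_scores.values())

# Interactions on the low side of the gap would be the "emerging knowledge" side.
emerging = sorted(pair for pair, s in interaction_scores.items() if s <= lower)
established = sorted(pair for pair, s in interaction_scores.items() if s >= upper)
```

A real analysis would also need a criterion for when a gap counts as significant; this sketch makes no attempt at that.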
|
41
|
Detecting modules in biological networks by edge weight clustering and entropy significance. Front Genet 2015; 6:265. [PMID: 26379697 PMCID: PMC4551098 DOI: 10.3389/fgene.2015.00265] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/25/2015] [Accepted: 07/30/2015] [Indexed: 12/04/2022]
Abstract
Detection of the modular structure of biological networks is of interest to researchers adopting a systems perspective for the analysis of omics data. Computational systems biology has provided a rich array of methods for network clustering. To date, the majority of approaches address this task through a network node classification based on topological or external quantifiable properties of network nodes. Conversely, numerical properties of network edges are underused, even though the information content that can be associated with network edges has grown owing to steady advances in molecular biology technology over the last decade. Properly accounting for network edges in the development of clustering approaches can become crucial to improving the quantitative interpretation of omics data, ultimately resulting in more biologically plausible models. In this study, we present a novel technique for network module detection, named WG-Cluster (Weighted Graph CLUSTERing). WG-Cluster's notable features, compared to current approaches, lie in: (1) the simultaneous exploitation of network node and edge weights to improve the biological interpretability of the connected components detected, (2) the assessment of their statistical significance, and (3) the identification of emerging topological properties in the detected connected components. WG-Cluster comprises three major steps: (i) an unsupervised version of an edge-based k-means algorithm detects sub-graphs with similar edge weights, (ii) a fast-greedy algorithm detects connected components, which are then scored and selected according to the statistical significance of their scores, and (iii) an analysis of the convolution between sub-graph mean edge weight and connected component score provides a summarizing view of the connected components. WG-Cluster can be applied to directed and undirected networks of different types of interacting entities and scales up to large omics data sets.
Here, we show that WG-Cluster can be successfully used in the differential analysis of physical protein–protein interaction (PPI) networks. Specifically, applying WG-Cluster to a PPI network weighted by measurements of differential gene expression makes it possible to explore the changes in network topology under two distinct (normal vs. tumor) conditions. WG-Cluster code is available at https://sites.google.com/site/paolaleccapersonalpage/.
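The first two steps described for WG-Cluster (grouping edges by weight, then finding connected components within each group) can be sketched in a few lines. This is not the WG-Cluster code and makes several assumptions: the number of weight clusters `k` is supplied by hand rather than chosen in an unsupervised way, a plain depth-first traversal stands in for the fast-greedy step, and the scoring, significance testing and convolution analysis are omitted. All names and the toy edge list are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalars (k >= 2); centres start evenly spread over the range."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Recompute each centre as the mean of its members (keep it if empty).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels


def connected_components(edges):
    """Connected components of an undirected edge list, as a list of node sets."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


def weight_modules(weighted_edges, k):
    """Cluster edges by weight, then return the components inside each weight cluster."""
    labels = kmeans_1d([w for _, _, w in weighted_edges], k)
    by_label = {}
    for (a, b, _), label in zip(weighted_edges, labels):
        by_label.setdefault(label, []).append((a, b))
    return {label: connected_components(es) for label, es in by_label.items()}


# Hypothetical weighted network: (node, node, edge weight).
toy_edges = [("A", "B", 0.90), ("B", "C", 0.95), ("C", "D", 0.10),
             ("D", "E", 0.15), ("F", "G", 0.92)]
modules = weight_modules(toy_edges, k=2)
```

With `k=2`, label 0 collects the low-weight edges and label 1 the high-weight ones, so the high-weight group splits into two separate components while the low-weight group forms one.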
|
42
|
A visual review of the interactome of LRRK2: Using deep-curated molecular interaction data to represent biology. Proteomics 2015; 15:1390-404. [PMID: 25648416 PMCID: PMC4415485 DOI: 10.1002/pmic.201400390] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 08/14/2014] [Revised: 01/15/2015] [Accepted: 01/29/2015] [Indexed: 02/04/2023]
Abstract
Molecular interaction databases are essential resources that enable access to a wealth of information on associations between proteins and other biomolecules. Network graphs generated from these data provide an understanding of the relationships between different proteins in the cell, and network analysis has become a widespread tool supporting –omics analysis. Meaningfully representing this information remains far from trivial and different databases strive to provide users with detailed records capturing the experimental details behind each piece of interaction evidence. A targeted curation approach is necessary to transfer published data generated by primarily low-throughput techniques into interaction databases. In this review we present an example highlighting the value of both targeted curation and the subsequent effective visualization of detailed features of manually curated interaction information. We have curated interactions involving LRRK2, a protein of largely unknown function linked to familial forms of Parkinson's disease, and hosted the data in the IntAct database. This LRRK2-specific dataset was then used to produce different visualization examples highlighting different aspects of the data: the level of confidence in the interaction based on orthogonal evidence, those interactions found under close-to-native conditions, and the enzyme–substrate relationships in different in vitro enzymatic assays. Finally, pathway annotation taken from the Reactome database was overlaid on top of interaction networks to bring biological functional context to interaction maps.
|