1
Pu J, Yu Y, Liu Y, Wang D, Gui S, Zhong X, Chen W, Chen X, Chen Y, Chen X, Qiao R, Jiang Y, Zhang H, Fan L, Ren Y, Chen X, Wang H, Xie P. ProMENDA: an updated resource for proteomic and metabolomic characterization in depression. Transl Psychiatry 2024; 14:229. [PMID: 38816410 PMCID: PMC11139925 DOI: 10.1038/s41398-024-02948-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 05/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024]
Abstract
Depression is a prevalent mental disorder with a complex biological mechanism. Following the rapid development of systems biology technology, a growing number of studies have applied proteomics and metabolomics to explore the molecular profiles of depression. However, a standardized resource facilitating the identification and annotation of the available knowledge from these scattered studies associated with depression is currently lacking. This study presents ProMENDA, an upgraded resource that provides a platform for manual annotation of candidate proteins and metabolites linked to depression. Following the establishment of the protein dataset and the update of the metabolite dataset, the ProMENDA database was developed as a major extension of its initial release. A multi-faceted annotation scheme was employed to provide comprehensive knowledge of the molecules and studies. A new web interface was also developed to improve the user experience. The ProMENDA database now contains 43,366 molecular entries, comprising 20,847 protein entries and 22,519 metabolite entries, which were manually curated from 1,370 human, rat, mouse, and non-human primate studies. This represents a significant increase (more than 7-fold) in molecular entries compared to the initial release. To demonstrate the usage of ProMENDA, a case study identifying consistently reported proteins and metabolites in the brains of animal models of depression was presented. Overall, ProMENDA is a comprehensive resource that offers a panoramic view of proteomic and metabolomic knowledge in depression. ProMENDA is freely available at https://menda.cqmu.edu.cn.
Affiliation(s)
- Juncai Pu
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yue Yu
- Department of Health Sciences Research, Mayo Clinic, MN, 55901, USA
- Yiyun Liu
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Dongfang Wang
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Siwen Gui
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiaogang Zhong
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Weiyi Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiaopeng Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yue Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiang Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Renjie Qiao
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yanyi Jiang
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Hanping Zhang
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Li Fan
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Yi Ren
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Xiangyu Chen
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Haiyang Wang
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- Peng Xie
- Department of Neurology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
- The Jinfeng Laboratory, Chongqing, 401336, China
- Chongqing Institute for Brain and Intelligence, Chongqing, 400072, China
2
Balabin H, Hoyt CT, Birkenbihl C, Gyori BM, Bachman J, Kodamullil AT, Plöger PG, Hofmann-Apitius M, Domingo-Fernández D. STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs. Bioinformatics 2022; 38:1648-1656. [PMID: 34986221 PMCID: PMC8896635 DOI: 10.1093/bioinformatics/btac001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/18/2021] [Revised: 12/09/2021] [Accepted: 01/03/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited. RESULTS To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. AVAILABILITY AND IMPLEMENTATION We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are respectively available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles Tapley Hoyt
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Colin Birkenbihl
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- John Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany
- Paul G Plöger
- Department of Bonn-Rhein-Sieg, University of Applied Sciences, 53757 Sankt Augustin, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany
3
Venkatraman DL, Pulimamidi D, Shukla HG, Hegde SR. Tumor relevant protein functional interactions identified using bipartite graph analyses. Sci Rep 2021; 11:21530. [PMID: 34728699 PMCID: PMC8563864 DOI: 10.1038/s41598-021-00879-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 09/30/2021] [Indexed: 12/02/2022]
Abstract
The surge of -omics data for diseases such as cancer allows insights to be derived into the associated protein interactions. We used bipartite network principles to build protein functional associations of the differentially regulated genes in 18 cancer types. This approach allowed us to combine expression data with functional associations in many cancers simultaneously. Further, graph centrality measures suggested the importance of upregulated genes such as BIRC5, UBE2C, BUB1B, KIF20A and PTH1R in cancer. Pathway analysis of the high-centrality network nodes suggested the importance of the upregulation of cell cycle and replication associated proteins in cancer. Some of the downregulated high-centrality proteins include actins, myosins and ATPase subunits. Among the transcription factors, mini-chromosome maintenance proteins (MCMs) and E2F family proteins appeared prominently in regulating many differentially regulated genes. The projected unipartite networks of the up- and downregulated genes comprised 37,411 and 41,756 interactions, respectively. The conclusions obtained by collating these interactions revealed pan-cancer as well as subtype-specific protein complexes and clusters. Therefore, we demonstrate that incorporating expression data from multiple cancers into bipartite graphs validates existing cancer-associated mechanisms and also points to novel interactions and pathways.
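The projection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that one node set holds genes and the other holds cancer types; the abstract does not specify the second node set, and the function names and toy labels below are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def project_bipartite(edges):
    """Project (gene, group) edges onto genes.

    Two genes are linked if they share at least one group; the weight
    counts how many groups they share. The group side is assumed here
    to be cancer types, which the abstract does not confirm.
    """
    genes_by_group = defaultdict(set)
    for gene, group in edges:
        genes_by_group[group].add(gene)

    weights = defaultdict(int)
    for genes in genes_by_group.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(genes), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other nodes each node touches in the projection."""
    degree = defaultdict(int)
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return {gene: 0.0 for gene in degree}
    return {gene: count / (n - 1) for gene, count in degree.items()}


# Invented toy data: placeholder genes and placeholder cancer types.
toy_edges = [
    ("gene_a", "cancer_1"), ("gene_c", "cancer_1"),
    ("gene_a", "cancer_2"), ("gene_b", "cancer_2"), ("gene_c", "cancer_2"),
]
weights = project_bipartite(toy_edges)
centrality = degree_centrality(weights)
```

Degree centrality is used here only as the simplest centrality measure; the abstract does not say which measures the authors applied.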
Affiliation(s)
- Deepshika Pulimamidi
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
- Harsh G Shukla
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
- Shubhada R Hegde
- Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
4
Rivas-Barragan D, Mubeen S, Guim Bernat F, Hofmann-Apitius M, Domingo-Fernández D. Drug2ways: Reasoning over causal paths in biological networks for drug discovery. PLoS Comput Biol 2020; 16:e1008464. [PMID: 33264280 PMCID: PMC7735677 DOI: 10.1371/journal.pcbi.1008464] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/05/2020] [Revised: 12/14/2020] [Accepted: 10/23/2020] [Indexed: 12/24/2022]
Abstract
Elucidating the causal mechanisms responsible for disease can reveal potential therapeutic targets for pharmacological intervention and, accordingly, guide drug repositioning and discovery. In essence, the topology of a network can reveal the impact a drug candidate may have on a given biological state, leading the way for enhanced disease characterization and the design of advanced therapies. Network-based approaches, in particular, are highly suited for these purposes as they hold the capacity to identify the molecular mechanisms underlying disease. Here, we present drug2ways, a novel methodology that leverages multimodal causal networks for predicting drug candidates. Drug2ways implements an efficient algorithm which reasons over causal paths in large-scale biological networks to propose drug candidates for a given disease. We validate our approach using clinical trial information and demonstrate how drug2ways can be used for multiple applications to identify: i) single-target drug candidates, ii) candidates with polypharmacological properties that can optimize multiple targets, and iii) candidates for combination therapy. Finally, we make drug2ways available to the scientific community as a Python package that enables conducting these applications on multiple standard network formats.
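Reasoning over causal paths can be sketched as follows. This is a naive enumeration under stated assumptions, not the efficient algorithm the abstract refers to: edges are assumed to carry a sign (+1 for activation, -1 for inhibition), a path's effect is taken as the product of its edge signs, and all names and the toy network are hypothetical.

```python
def causal_paths(graph, source, target, max_len):
    """Yield (path, sign) for each simple path of at most max_len edges.

    graph maps node -> {successor: edge_sign}. The sign of a path is the
    product of its edge signs, an assumption made for this sketch.
    """
    stack = [(source, [source], 1)]
    while stack:
        node, path, sign = stack.pop()
        if node == target:
            yield path, sign
            continue
        if len(path) - 1 >= max_len:
            continue  # path already uses the maximum number of edges
        for successor, edge_sign in graph.get(node, {}).items():
            if successor not in path:  # keep paths simple (no cycles)
                stack.append((successor, path + [successor], sign * edge_sign))


def net_effect(graph, source, target, max_len=4):
    """Count activating and inhibiting paths from source to target."""
    signs = [sign for _, sign in causal_paths(graph, source, target, max_len)]
    return {"activating": signs.count(1), "inhibiting": signs.count(-1)}


# Invented toy network: a drug, two proteins and a disease node.
toy_graph = {
    "drug": {"protein_a": -1, "protein_b": 1},
    "protein_a": {"disease": 1, "protein_b": 1},
    "protein_b": {"disease": -1},
}
effect = net_effect(toy_graph, "drug", "disease", max_len=3)
```

In this toy network two of the three paths are inhibiting, so a candidate would be read as predominantly reducing the disease node; how drug2ways itself scores and ranks candidates is not stated in the abstract.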
Affiliation(s)
- Daniel Rivas-Barragan
- Barcelona Supercomputing Center, Barcelona, Spain
- Computer Architecture Department, Universitat Politècnica de Catalunya, Barcelona, Spain
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Fraunhofer Center for Machine Learning, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Fraunhofer Center for Machine Learning, Germany
5
Golriz Khatami S, Mubeen S, Hofmann-Apitius M. Data science in neurodegenerative disease: its capabilities, limitations, and perspectives. Curr Opin Neurol 2020; 33:249-254. [PMID: 32073441 PMCID: PMC7077964 DOI: 10.1097/wco.0000000000000795] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/08/2023]
Abstract
PURPOSE OF REVIEW With the advancement of computational approaches and abundance of biomedical data, a broad range of neurodegenerative disease models have been developed. In this review, we argue that computational models can be both relevant and useful in neurodegenerative disease research and, although the currently established models have limitations in clinical practice, artificial intelligence has the potential to overcome deficiencies encountered by these models, which in turn can improve our understanding of disease. RECENT FINDINGS In recent years, diverse computational approaches have been used to shed light on different aspects of neurodegenerative disease models. For example, linear and nonlinear mixed models, self-modeling regression, differential equation models, and event-based models have been applied to provide a better understanding of disease progression patterns and biomarker trajectories. Additionally, the Cox-regression technique, Bayesian network models, and deep-learning-based approaches have been used to predict the probability of future incidence of disease, whereas nonnegative matrix factorization, nonhierarchical cluster analysis, hierarchical agglomerative clustering, and deep-learning-based approaches have been employed to stratify patients based on their disease subtypes. Furthermore, the interpretation of neurodegenerative disease data is possible through knowledge-based models which use prior knowledge to complement data-driven analyses. These knowledge-based models can include pathway-centric approaches to establish pathways perturbed in a given condition, as well as disease-specific knowledge maps, which elucidate the mechanisms involved in a given disease. Collectively, these established models have revealed highly granular details and insights into neurodegenerative disease models.
SUMMARY In conjunction with increasingly advanced computational approaches, a wide spectrum of neurodegenerative disease models, which can be broadly categorized into data-driven and knowledge-driven, have been developed. We review the state-of-the-art data-driven and knowledge-driven models and discuss the steps that are vital to bring them into clinical application.
Affiliation(s)
- Sepehr Golriz Khatami
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
6
Karki R, Kodamullil AT, Hoyt CT, Hofmann-Apitius M. Quantifying mechanisms in neurodegenerative diseases (NDDs) using candidate mechanism perturbation amplitude (CMPA) algorithm. BMC Bioinformatics 2019; 20:494. [PMID: 31604427 PMCID: PMC6788110 DOI: 10.1186/s12859-019-3101-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/07/2019] [Accepted: 09/16/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Literature-derived knowledge assemblies have been used as an effective way of representing biological phenomena and understanding disease etiology in systems biology. These include canonical pathway databases such as KEGG, Reactome and WikiPathways and disease-specific network inventories such as the causal biological networks database, PD map and NeuroMMSig. The knowledge represented in these resources delineates qualitative information focusing mainly on the causal relationships between biological entities. Genes, the major constituents of knowledge representations, tend to be expressed differentially in different conditions such as cell types, brain regions and disease stages. A classical approach to interpreting a knowledge assembly is to explore the expression patterns of the individual genes. However, an approach that enables quantification of the overall impact of differentially expressed genes in the corresponding network is still lacking. RESULTS Using the concept of heat diffusion, we have devised an algorithm that is able to calculate the magnitude of regulation of a biological network using expression datasets. We have demonstrated that molecular mechanisms specific to Alzheimer's disease (AD) and Parkinson's disease (PD) are regulated with different intensities across spatial and temporal resolutions. Our approach indicates that mitochondrial dysfunction in PD is severe in the cortex and in advanced stages of the disease. Similarly, we have shown that the intensity of aggregation of neurofibrillary tangles (NFTs) in AD increases as the disease progresses. This finding is in concordance with previous studies that explain the burden of NFTs in stages of AD. CONCLUSIONS This study is one of the first attempts to enable quantification of mechanisms represented as biological networks. We have been able to quantify the magnitude of regulation of a biological network and illustrate that the magnitudes differ across spatial and temporal resolutions.
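The heat-diffusion idea named in this abstract can be sketched generically. The example below is not the CMPA algorithm, whose details the abstract does not give; it assumes an undirected network, treats an expression-derived score as the initial heat of each node, and applies explicit Euler steps of the graph heat equation dh/dt = -Lh. The function name, parameters and toy network are hypothetical.

```python
def diffuse_heat(adjacency, heat, alpha=0.1, steps=10):
    """Spread initial heat over an undirected graph.

    adjacency maps node -> list of neighbours (each edge listed in both
    directions). heat maps node -> initial value; missing nodes start
    at 0. Each step moves heat along every edge in proportion to the
    difference between its endpoints, so the total is conserved.
    alpha should stay well below 1 / (maximum degree) for stability.
    """
    current = {node: heat.get(node, 0.0) for node in adjacency}
    for _ in range(steps):
        updated = {}
        for node, neighbours in adjacency.items():
            flow = sum(current[nb] - current[node] for nb in neighbours)
            updated[node] = current[node] + alpha * flow
        current = updated
    return current


# Invented toy network: a three-node chain with heat on one end,
# standing in for a single differentially expressed gene.
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = diffuse_heat(toy_network, {"a": 1.0})
```

How the diffused values are aggregated into a single magnitude per mechanism is a design decision of the original method and is left out of this sketch.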
Affiliation(s)
- Reagon Karki
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19a, 53115, Bonn, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19a, 53115, Bonn, Germany
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19a, 53115, Bonn, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for IT, Endenicher Allee 19a, 53115, Bonn, Germany
7
Lucignani G, Neri E. Integration of imaging biomarkers into systems biomedicine: a renaissance for medical imaging. Clin Transl Imaging 2019. [DOI: 10.1007/s40336-019-00320-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/07/2023]
8
Enhanced Molecular Appreciation of Psychiatric Disorders Through High-Dimensionality Data Acquisition and Analytics. Methods Mol Biol 2019; 2011:671-723. [PMID: 31273728 DOI: 10.1007/978-1-4939-9554-7_39] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023]
Abstract
The initial diagnosis, molecular investigation, treatment, and posttreatment care of major psychiatric disorders (schizophrenia and bipolar depression) are all still significantly hindered by the current inability to define these disorders in an explicit molecular signaling manner. High-dimensionality data analytics, using large datastreams from transcriptomic, proteomic, or metabolomic investigations, will likely advance both the appreciation of the molecular nature of major psychiatric disorders and simultaneously enhance our ability to more efficiently diagnose and treat these debilitating conditions. High-dimensionality data analysis in psychiatric research has been heterogeneous in aims and methods and limited by insufficient sample sizes, poorly defined case definitions, methodological inhomogeneity, and confounding results. All of these issues combine to constrain the conclusions that can be extracted from them. Here, we discuss possibilities for overcoming methodological challenges through the implementation of transcriptomic, proteomic, or metabolomics signatures in psychiatric diagnosis and offer an outlook for future investigations. To fulfill the promise of intelligent high-dimensionality data-based differential diagnosis in mental disease diagnosis and treatment, future research will need large, well-defined cohorts in combination with state-of-the-art technologies.
9
Hoyt CT, Domingo-Fernández D, Aldisi R, Xu L, Kolpeja K, Spalek S, Wollert E, Bachman J, Gyori BM, Greene P, Hofmann-Apitius M. Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database (Oxford) 2019; 2019:baz068. [PMID: 31225582 PMCID: PMC6587072 DOI: 10.1093/database/baz068] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/31/2019] [Revised: 04/03/2019] [Accepted: 04/29/2019] [Indexed: 12/23/2022]
Abstract
The rapid accumulation of new biomedical literature not only causes curated knowledge graphs (KGs) to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich KGs. We have developed two workflows: one for re-curating a given KG to assure its syntactic and semantic quality and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the KGs encoded in Biological Expression Language from the NeuroMMSig database using content that was pre-extracted from MEDLINE abstracts and PubMed Central full-text articles using text mining output integrated by INDRA. We have made this workflow freely available at https://github.com/bel-enrichment/bel-enrichment.
Affiliation(s)
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Rana Aldisi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Lingling Xu
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Kristian Kolpeja
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Sandra Spalek
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Esther Wollert
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- John Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Patrick Greene
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
10
Hoyt CT, Domingo-Fernández D, Hofmann-Apitius M. BEL Commons: an environment for exploration and analysis of networks encoded in Biological Expression Language. Database: The Journal of Biological Databases and Curation 2018; 2018:5255171. [PMID: 30576488 PMCID: PMC6301338 DOI: 10.1093/database/bay126] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/03/2018] [Accepted: 11/05/2018] [Indexed: 12/19/2022]
Abstract
The rapid accumulation of knowledge in the field of systems and networks biology during recent years requires complex but user-friendly and accessible web applications that support tasks ranging from visualization to complex algorithmic analysis. While several web applications exist with various focuses on creation, revision, curation, storage, integration, collaboration, exploration, visualization and analysis, many of these services remain disjoint and have yet to be packaged into a cohesive environment. Here, we present BEL Commons: an integrative knowledge discovery environment for networks encoded in the Biological Expression Language (BEL). Users can upload files in BEL to be parsed, validated, compiled and stored with fine-grained permissions. Afterwards, users can summarize, explore and optionally share their networks with the scientific community. We have implemented a query builder wizard to help users find the relevant portions of increasingly large and complex networks and a visualization interface that allows them to explore their resulting networks. Finally, we have included a dedicated analytical service for performing data-driven analysis of knowledge networks to support hypothesis generation.
Affiliation(s)
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany