1
Delmas M, Filangi O, Paulhe N, Vinson F, Duperier C, Garrier W, Saunier PE, Pitarch Y, Jourdan F, Giacomoni F, Frainay C. FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases. Bioinformatics 2021; 37:3896-3904. [PMID: 34478489 PMCID: PMC8570811 DOI: 10.1093/bioinformatics/btab627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/02/2021] [Revised: 08/16/2021] [Accepted: 09/01/2021] [Indexed: 11/22/2022] Open
Abstract
Motivation: Metabolomics studies aim at reporting a metabolic signature (a list of metabolites) related to a particular experimental condition. These signatures are instrumental in identifying biomarkers or classifying individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support the interpretation of results and further inquiries. Availability and implementation: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM KG, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information: Supplementary data are available at Bioinformatics online.
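The abstract does not say which statistical test the enrichment analysis uses. As a rough illustration only, the sketch below assumes a right-tailed hypergeometric test on article counts: given how many articles mention a chemical, how many are tagged with a disease, and how many carry both, it estimates the probability of seeing at least that overlap by chance. The function name, parameter names, and counts are hypothetical and are not taken from the FORUM code base.

```python
from math import comb


def enrichment_p_value(n_both, n_chemical, n_disease, n_total):
    """Right-tailed hypergeometric p-value, P(X >= n_both).

    Hypothetical setting: out of n_total articles, n_chemical mention the
    chemical, n_disease are tagged with the disease, and n_both have both.
    """
    upper = min(n_chemical, n_disease)
    denominator = comb(n_total, n_chemical)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_disease, k) * comb(n_total - n_disease, n_chemical - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denominator


# Toy counts, invented for illustration.
p = enrichment_p_value(n_both=5, n_chemical=5, n_disease=5, n_total=10)
print(f"p-value for the toy association: {p:.6f}")
```

In practice such p-values would be computed for many chemical-disease pairs at once, so a multiple-testing correction would normally follow; that step is left out of this sketch.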
Affiliation(s)
- M Delmas
  - Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- O Filangi
  - IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, Le Rheu, 35653, France
- N Paulhe
  - Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- F Vinson
  - Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- C Duperier
  - Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- W Garrier
  - ISIMA, Campus des Cézeaux, Aubière, 63177, France
- P-E Saunier
  - ISIMA, Campus des Cézeaux, Aubière, 63177, France
- Y Pitarch
  - IRIT, Université de Toulouse, Cours Rose Dieng-Kuntz, Toulouse, 31400, France
- F Jourdan
  - Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
- F Giacomoni
  - Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
- C Frainay
  - Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
2
Frainay C, Pitarch Y, Filippi S, Evangelou M, Custovic A. Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining. Clin Exp Allergy 2021; 51:1185-1194. [PMID: 34213816 DOI: 10.1111/cea.13981] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/09/2021] [Accepted: 06/30/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. OBJECTIVE: To investigate the consequences of the ambiguity between the terms "Eczema" and "Atopic Dermatitis" (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. METHODS: Articles were retrieved by querying PubMed using the terms "eczema" (D003876) and "dermatitis, atopic" (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict whether an article would be indexed with the eczema or AD tag. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy in the retrieval of key findings according to the terminology used. RESULTS: The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology; the eczema query was linked to public health, infectious disease and the respiratory system. Medical Subject Headings terms associated with "AD" or "Eczema" differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized the AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those in articles indexed with eczema. Fewer enriched genes were retrieved with the eczema query than with the AD query. CONCLUSIONS AND CLINICAL RELEVANCE: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
Affiliation(s)
- Clément Frainay
  - Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, UK
  - Toxalim (Research Center in Food Toxicology), INRAE, ENVT, INP-PURPAN, UPS, Université de Toulouse, Toulouse, France
- Yoann Pitarch
  - UMR5505, IRIT, Université de Toulouse, Toulouse, France
- Sarah Filippi
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, UK
- Marina Evangelou
  - Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, UK
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, UK
- Adnan Custovic
  - National Heart and Lung Institute, Imperial College London, London, UK
3
Lloyd K, Papoutsopoulou S, Smith E, Stegmaier P, Bergey F, Morris L, Kittner M, England H, Spiller D, White MHR, Duckworth CA, Campbell BJ, Poroikov V, Martins Dos Santos VAP, Kel A, Muller W, Pritchard DM, Probert C, Burkitt MD. Using systems medicine to identify a therapeutic agent with potential for repurposing in inflammatory bowel disease. Dis Model Mech 2020; 13:dmm044040. [PMID: 32958515 PMCID: PMC7710021 DOI: 10.1242/dmm.044040] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/31/2019] [Accepted: 09/08/2020] [Indexed: 12/11/2022] Open
Abstract
Inflammatory bowel diseases (IBDs) cause significant morbidity and mortality. Aberrant NF-κB signalling is strongly associated with these conditions, and several established drugs influence the NF-κB signalling network to exert their effect. This study aimed to identify drugs that alter NF-κB signalling and could be repositioned for use in IBD. The SysmedIBD Consortium established a novel drug-repurposing pipeline based on a combination of in silico drug discovery and biological assays targeted at demonstrating an impact on NF-κB signalling, and a murine model of IBD. The drug discovery algorithm identified several drugs already established in IBD, including corticosteroids. The highest-ranked drug was the macrolide antibiotic clarithromycin, which has previously been reported to have anti-inflammatory effects in aseptic conditions. The effects of clarithromycin were validated in several experiments: it influenced NF-κB-mediated transcription in murine peritoneal macrophages and intestinal enteroids; it suppressed NF-κB protein shuttling in murine reporter enteroids; it suppressed NF-κB (p65) DNA binding in the small intestine of mice exposed to lipopolysaccharide; and it reduced the severity of dextran sulphate sodium-induced colitis in C57BL/6 mice. Clarithromycin also suppressed NF-κB (p65) nuclear translocation in human intestinal enteroids. These findings demonstrate that in silico drug repositioning algorithms can viably be allied to laboratory validation assays in the context of IBD, and that further clinical assessment of clarithromycin in the management of IBD is required. This article has an associated First Person interview with the joint first authors of the paper.
Affiliation(s)
- Katie Lloyd
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
- Stamatia Papoutsopoulou
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Emily Smith
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Hazel England
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Dave Spiller
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Mike H R White
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- Carrie A Duckworth
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
- Barry J Campbell
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
- Werner Muller
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
- D Mark Pritchard
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
- Chris Probert
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
- Michael D Burkitt
  - Department of Cellular and Molecular Physiology, University of Liverpool, Liverpool L69 3GE, UK
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
4
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 129] [Impact Index Per Article: 16.1] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
  - Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
  - ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain
  - Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain
  - CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
  - Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
  - Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain
  - Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
5
Jácome AG, Fdez-Riverola F, Lourenço A. BIOMedical Search Engine Framework: Lightweight and customized implementation of domain-specific biomedical search engines. Comput Methods Programs Biomed 2016; 131:63-77. [PMID: 27265049 DOI: 10.1016/j.cmpb.2016.03.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/26/2015] [Revised: 02/10/2016] [Accepted: 03/29/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND AND OBJECTIVES: Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative for creating personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. METHODS: The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. RESULTS: The construction of the Smart Drug Search is described as a proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and related topics. The keyword-based queries of the users are transformed into concepts, and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of a concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations meaningful to that particular scope of research. Conversely, indirect concept associations, i.e. concepts related through other intermediary concepts, can be useful to integrate information from different studies and look into non-trivial relations. CONCLUSIONS: The BIOMedical Search Engine Framework supports the development of domain-specific search engines. The key strengths of the framework are modularity and extensibility in terms of software design, the use of open-source consolidated Web technologies, and the ability to integrate any number of biomedical text mining tools and information resources. Currently, the Smart Drug Search holds over 1,186,000 documents, containing more than 11,854,000 annotations for 77,200 different concepts. The Smart Drug Search is publicly accessible at http://sing.ei.uvigo.es/sds/. The BIOMedical Search Engine Framework is freely available for non-commercial use at https://github.com/agjacome/biomsef.
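The occurrence, co-occurrence and indirect-association measures mentioned in the abstract can be illustrated with a small sketch. It is not the framework's implementation: it assumes each document has already been reduced to a list of annotated concept names, counts each concept once per document, and treats two concepts as indirectly associated when they never co-occur but share at least one intermediary. All function names and sample concepts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def concept_statistics(documents):
    """Count per-document concept occurrences and pairwise co-occurrences."""
    occurrences = Counter()
    cooccurrences = Counter()
    for concepts in documents:
        unique = sorted(set(concepts))  # one count per concept per document
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return occurrences, cooccurrences


def indirect_pairs(cooccurrences):
    """Return concept pairs that never co-occur but share an intermediary."""
    neighbours = {}
    for a, b in cooccurrences:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    pairs = set()
    for linked in neighbours.values():
        for a, b in combinations(sorted(linked), 2):
            if b not in neighbours[a]:
                pairs.add((a, b))
    return pairs


# Toy annotations, invented for illustration.
documents = [
    ["biofilm", "ciprofloxacin"],
    ["ciprofloxacin", "efflux pump"],
    ["biofilm", "ciprofloxacin", "quorum sensing"],
]
occurrences, cooccurrences = concept_statistics(documents)
print(occurrences.most_common(1))
print(sorted(indirect_pairs(cooccurrences)))
```

A real search engine would likely weight these raw counts (for example by document relevance), which this sketch does not attempt.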
Affiliation(s)
- Alberto G Jácome
  - ESEI-Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
- Florentino Fdez-Riverola
  - ESEI-Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
- Anália Lourenço
  - ESEI-Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain
  - Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
6
How to learn about gene function: text-mining or ontologies? Methods 2014; 74:3-15. [PMID: 25088781 DOI: 10.1016/j.ymeth.2014.07.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/03/2014] [Revised: 07/01/2014] [Accepted: 07/09/2014] [Indexed: 12/31/2022] Open
Abstract
As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools exist that try to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give a quick insight into the underlying biology by presenting it in the form of a summary report. While the load of annotation may be alleviated by such computational approaches, the main challenge in modern annotation remains to develop a systems form of analysis in which a pipeline can effectively analyze gene lists quickly and identify aggregated annotations through computerized resources. In this article we survey some of the many such tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We overview current functional annotation aspects from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most of the currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known' formally-structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural languages). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for. In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) provide better follow-up experimental testing and treatment options; and (3) better manage gene lists derived from organisms that are not well-studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.