Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Peng Y, Rios A, Kavuluru R, Lu Z. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database (Oxford) 2018;2018:5055578. [PMID: 30020437 PMCID: PMC6051439 DOI: 10.1093/database/bay073] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2018] [Accepted: 06/15/2018] [Indexed: 11/14/2022]

For:	Peng Y, Rios A, Kavuluru R, Lu Z. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database (Oxford) 2018;2018:5055578. [PMID: 30020437 PMCID: PMC6051439 DOI: 10.1093/database/bay073] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/05/2018] [Accepted: 06/15/2018] [Indexed: 11/14/2022]

Number

Cited by Other Article(s)

Wang Y, Wang B, Zou J, Wu A, Liu Y, Wan Y, Luo J, Wu J. Capsule neural network and its applications in drug discovery. iScience 2025;28:112217. [PMID: 40241764 PMCID: PMC12002614 DOI: 10.1016/j.isci.2025.112217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/18/2025] Open

Lilhore UK, Simiaya S, Alhussein M, Faujdar N, Dalal S, Aurangzeb K. Optimizing protein sequence classification: integrating deep learning models with Bayesian optimization for enhanced biological analysis. BMC Med Inform Decis Mak 2024;24:236. [PMID: 39192227 DOI: 10.1186/s12911-024-02631-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2024] [Accepted: 08/07/2024] [Indexed: 08/29/2024] Open

Zhang Y, Yang Z, Yang Y, Lin H, Wang J. Location-enhanced syntactic knowledge for biomedical relation extraction. J Biomed Inform 2024;156:104676. [PMID: 38876451 DOI: 10.1016/j.jbi.2024.104676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2024] [Revised: 06/08/2024] [Accepted: 06/10/2024] [Indexed: 06/16/2024]

Yuan J, Zhang F, Qiu Y, Lin H, Zhang Y. Document-level biomedical relation extraction via hierarchical tree graph and relation segmentation module. Bioinformatics 2024;40:btae418. [PMID: 38917409 PMCID: PMC11629692 DOI: 10.1093/bioinformatics/btae418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2024] [Revised: 05/27/2024] [Accepted: 06/24/2024] [Indexed: 06/27/2024] Open

He J, Li F, Li J, Hu X, Nian Y, Xiang Y, Wang J, Wei Q, Li Y, Xu H, Tao C. Prompt Tuning in Biomedical Relation Extraction. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2024;8:206-224. [PMID: 38681754 PMCID: PMC11052745 DOI: 10.1007/s41666-024-00162-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2022] [Revised: 02/09/2024] [Accepted: 02/19/2024] [Indexed: 05/01/2024]

Abstract

Biomedical relation extraction (RE) is critical in constructing high-quality knowledge graphs and databases as well as supporting many downstream text mining applications. This paper explores prompt tuning on biomedical RE and its few-shot scenarios, aiming to propose a simple yet effective model for this specific task. Prompt tuning reformulates natural language processing (NLP) downstream tasks into masked language problems by embedding specific text prompts into the original input, facilitating the adaption of pre-trained language models (PLMs) to better address these tasks. This study presents a customized prompt tuning model designed explicitly for biomedical RE, including its applicability in few-shot learning contexts. The model's performance was rigorously assessed using the chemical-protein relation (CHEMPROT) dataset from BioCreative VI and the drug-drug interaction (DDI) dataset from SemEval-2013, showcasing its superior performance over conventional fine-tuned PLMs across both datasets, encompassing few-shot scenarios. This observation underscores the effectiveness of prompt tuning in enhancing the capabilities of conventional PLMs, though the extent of enhancement may vary by specific model. Additionally, the model demonstrated a harmonious balance between simplicity and efficiency, matching state-of-the-art performance without needing external knowledge or extra computational resources. The pivotal contribution of our study is the development of a suitably designed prompt tuning model, highlighting prompt tuning's effectiveness in biomedical RE. It offers a robust, efficient approach to the field's challenges and represents a significant advancement in extracting complex relations from biomedical texts.

Supplementary Information

The online version contains supplementary material available at 10.1007/s41666-024-00162-9.

Collapse

Liu H, Soroush A, Nestor JG, Park E, Idnay B, Fang Y, Pan J, Liao S, Bernard M, Peng Y, Weng C. Retrieval augmented scientific claim verification. JAMIA Open 2024;7:ooae021. [PMID: 38455840 PMCID: PMC10919922 DOI: 10.1093/jamiaopen/ooae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Revised: 01/19/2024] [Accepted: 02/14/2024] [Indexed: 03/09/2024] Open

Miranda-Escalada A, Mehryary F, Luoma J, Estrada-Zavala D, Gasco L, Pyysalo S, Valencia A, Krallinger M. Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations. Database (Oxford) 2023;2023:baad080. [PMID: 38015956 PMCID: PMC10683943 DOI: 10.1093/database/baad080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2022] [Revised: 09/22/2023] [Accepted: 10/30/2023] [Indexed: 11/30/2023]

Abstract

It is getting increasingly challenging to efficiently exploit drug-related information described in the growing amount of scientific literature. Indeed, for drug-gene/protein interactions, the challenge is even bigger, considering the scattered information sources and types of interactions. However, their systematic, large-scale exploitation is key for developing tools, impacting knowledge fields as diverse as drug design or metabolic pathway research. Previous efforts in the extraction of drug-gene/protein interactions from the literature did not address these scalability and granularity issues. To tackle them, we have organized the DrugProt track at BioCreative VII. In the context of the track, we have released the DrugProt Gold Standard corpus, a collection of 5000 PubMed abstracts, manually annotated with granular drug-gene/protein interactions. We have proposed a novel large-scale track to evaluate the capacity of natural language processing systems to scale to the range of millions of documents, and generate with their predictions a silver standard knowledge graph of 53 993 602 nodes and 19 367 406 edges. Its use exceeds the shared task and points toward pharmacological and biological applications such as drug discovery or continuous database curation. Finally, we have created a persistent evaluation scenario on CodaLab to continuously evaluate new relation extraction systems that may arise. Thirty teams from four continents, which involved 110 people, sent 107 submission runs for the Main DrugProt track, and nine teams submitted 21 runs for the Large Scale DrugProt track. Most participants implemented deep learning approaches based on pretrained transformer-like language models (LMs) such as BERT or BioBERT, reaching precision and recall values as high as 0.9167 and 0.9542 for some relation types. Finally, some initial explorations of the applicability of the knowledge graph have shown its potential to explore the chemical-protein relations described in the literature, or chemical compound-enzyme interactions. Database URL: https://doi.org/10.5281/zenodo.4955410.

Collapse

Lai PT, Wei CH, Luo L, Chen Q, Lu Z. BioREx: Improving biomedical relation extraction by leveraging heterogeneous datasets. J Biomed Inform 2023;146:104487. [PMID: 37673376 DOI: 10.1016/j.jbi.2023.104487] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 08/18/2023] [Accepted: 09/02/2023] [Indexed: 09/08/2023]

Abstract

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods were used primarily to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, setting a new SOTA from 74.4% to 79.6% in F-1 measure on the recently released BioRED corpus. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we show that on average BioREx compares favorably to current best-performing methods such as transfer learning and multi-task learning. Finally, we demonstrate BioREx's robustness and generalizability in two independent RE tasks not previously seen in training data: drug-drug N-ary combination and document-level gene-disease RE. The integrated dataset and optimized method have been packaged as a stand-alone tool available at https://github.com/ncbi/BioREx.

Collapse

Ai X, Kavuluru R. End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023;2023:610-618. [PMID: 38274947 PMCID: PMC10809256 DOI: 10.1109/ichi57859.2023.00108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2024]

Jiang Y, Kavuluru R. End-to-End n-ary Relation Extraction for Combination Drug Therapies. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023;2023:72-80. [PMID: 38283165 PMCID: PMC10814995 DOI: 10.1109/ichi57859.2023.00021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/30/2024]

Álvarez Montoya AC, Sepúlveda Rincón CT, Zapata Montoya JE. Modelling of the kinetics of red tilapia (Oreochromis spp.) viscera enzymatic hydrolysis using mathematical and neural network models. INTERNATIONAL FOOD RESEARCH JOURNAL 2022. [DOI: 10.47836/ifrj.29.6.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Garabaghi FH, Benzer R, Benzer S, Günal Ç. Effect of polynomial, radial basis, and Pearson VII function kernels in support vector machine algorithm for classification of crayfish. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform 2022;23:6645993. [PMID: 35849818 PMCID: PMC9487702 DOI: 10.1093/bib/bbac282] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 06/02/2022] [Accepted: 06/19/2022] [Indexed: 11/13/2022] Open

Luo L, Lai PT, Wei CH, Lu Z. A sequence labeling framework for extracting drug-protein relations from biomedical literature. Database (Oxford) 2022;2022:baac058. [PMID: 35856889 PMCID: PMC9297941 DOI: 10.1093/database/baac058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2022] [Revised: 05/24/2022] [Accepted: 07/14/2022] [Indexed: 06/15/2023]

Zheng J, Xiao X, Qiu WR. DTI-BERT: Identifying Drug-Target Interactions in Cellular Networking Based on BERT and Deep Learning Method. Front Genet 2022;13:859188. [PMID: 35754843 PMCID: PMC9213727 DOI: 10.3389/fgene.2022.859188] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Accepted: 04/25/2022] [Indexed: 11/20/2022] Open

Zanoli R, Lavelli A, Löffler T, Perez Gonzalez NA, Rinaldi F. An annotated dataset for extracting gene-melanoma relations from scientific literature. J Biomed Semantics 2022;13:2. [PMID: 35045882 PMCID: PMC8772125 DOI: 10.1186/s13326-021-00251-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2020] [Accepted: 08/27/2021] [Indexed: 11/10/2022] Open

Abstract Abstract Background Melanoma is one of the least common but the deadliest of skin cancers. This cancer begins when the genes of a cell suffer damage or fail, and identifying the genes involved in melanoma is crucial for understanding the melanoma tumorigenesis. Thousands of publications about human melanoma appear every year. However, while biological curation of data is costly and time-consuming, to date the application of machine learning for gene-melanoma relation extraction from text has been severely limited by the lack of annotated resources. Results To overcome this lack of resources for melanoma, we have exploited the information of the Melanoma Gene Database (MGDB, a manually curated database of genes involved in human melanoma) to automatically build an annotated dataset of binary relations between gene and melanoma entities occurring in PubMed abstracts. The entities were automatically annotated by state-of-the-art text-mining tools. Their annotation includes both the mention text spans and normalized concept identifiers. The relations among the entities were annotated at concept- and mention-level. The concept-level annotation was produced using the information of the genes in MGDB to decide if a relation holds between a gene and melanoma concept in the whole abstract. The exploitability of this dataset was tested with both traditional machine learning, and neural network-based models like BERT. The models were then used to automatically extract gene-melanoma relations from the biomedical literature. Most of the current models use context-aware representations of the target entities to establish relations between them. To facilitate researchers in their experiments we generated a mention-level annotation in support to the concept-level annotation. The mention-level annotation was generated by automatically linking gene and melanoma mentions co-occurring within the sentences that in MGDB establish the association of the gene with melanoma. Conclusions This paper presents a corpus containing gene-melanoma annotated relations. Additionally, it discusses experiments which show the usefulness of such a corpus for training a system capable of mining gene-melanoma relationships from the literature. Researchers can use the corpus to develop and compare their own models, and produce results which might be integrated with existing structured knowledge databases, which in turn might facilitate medical research. Collapse

Pourreza Shahri M, Kahanda I. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes. BMC Bioinformatics 2021;22:500. [PMID: 34656098 PMCID: PMC8520253 DOI: 10.1186/s12859-021-04421-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022] Open

Abstract

Background

Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results

In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.

Conclusions

This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Collapse

Warikoo N, Chang YC, Hsu WL. LBERT: Lexically aware Transformer-based Bidirectional Encoder Representation model for learning universal bio-entity relations. Bioinformatics 2021;37:404-412. [PMID: 32810217 DOI: 10.1093/bioinformatics/btaa721] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 06/30/2020] [Accepted: 08/13/2020] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Natural Language Processing techniques are constantly being advanced to accommodate the influx of data as well as to provide exhaustive and structured knowledge dissemination. Within the biomedical domain, relation detection between bio-entities known as the Bio-Entity Relation Extraction (BRE) task has a critical function in knowledge structuring. Although recent advances in deep learning-based biomedical domain embedding have improved BRE predictive analytics, these works are often task selective or use external knowledge-based pre-/post-processing. In addition, deep learning-based models do not account for local syntactic contexts, which have improved data representation in many kernel classifier-based models. In this study, we propose a universal BRE model, i.e. LBERT, which is a Lexically aware Transformer-based Bidirectional Encoder Representation model, and which explores both local and global contexts representations for sentence-level classification tasks.

RESULTS

This article presents one of the most exhaustive BRE studies ever conducted over five different bio-entity relation types. Our model outperforms state-of-the-art deep learning models in protein-protein interaction (PPI), drug-drug interaction and protein-bio-entity relation classification tasks by 0.02%, 11.2% and 41.4%, respectively. LBERT representations show a statistically significant improvement over BioBERT in detecting true bio-entity relation for large corpora like PPI. Our ablation studies clearly indicate the contribution of the lexical features and distance-adjusted attention in improving prediction performance by learning additional local semantic context along with bi-directionally learned global context.

AVAILABILITY AND IMPLEMENTATION

Github. https://github.com/warikoone/LBERT.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

He J, Nguyen DQ, Akhondi SA, Druckenbrodt C, Thorne C, Hoessel R, Afzal Z, Zhai Z, Fang B, Yoshikawa H, Albahem A, Cavedon L, Cohn T, Baldwin T, Verspoor K. ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents. Front Res Metr Anal 2021;6:654438. [PMID: 33870071 PMCID: PMC8028406 DOI: 10.3389/frma.2021.654438] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2021] [Accepted: 02/24/2021] [Indexed: 11/21/2022] Open

Lwowski B, Rios A. The risk of racial bias while tracking influenza-related content on social media using machine learning. J Am Med Inform Assoc 2021;28:839-849. [PMID: 33484133 PMCID: PMC7973478 DOI: 10.1093/jamia/ocaa326] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2020] [Accepted: 12/08/2020] [Indexed: 11/13/2022] Open

Shao Y, Li H, Gu J, Qian L, Zhou G. Extraction of causal relations based on SBEL and BERT model. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2021:6133143. [PMID: 33570092 PMCID: PMC7904051 DOI: 10.1093/database/baab005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/30/2020] [Revised: 01/19/2021] [Accepted: 01/26/2021] [Indexed: 11/15/2022]

Lai PT, Lu Z. BERT-GT: Cross-sentence n-ary relation extraction with BERT and graph transformer. Bioinformatics 2021;36:5678-5685. [PMID: 33416851 PMCID: PMC8023679 DOI: 10.1093/bioinformatics/btaa1087] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2020] [Revised: 12/17/2020] [Accepted: 12/20/2020] [Indexed: 11/14/2022] Open

Wang W, Yang X, Wu C, Yang C. CGINet: graph convolutional network-based model for identifying chemical-gene interaction in an integrated multi-relational graph. BMC Bioinformatics 2020;21:544. [PMID: 33243142 PMCID: PMC7689985 DOI: 10.1186/s12859-020-03899-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Accepted: 11/19/2020] [Indexed: 11/19/2022] Open

Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020;18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open

Liu X, Fan J, Dong S. Document-Level Biomedical Relation Extraction Leveraging Pretrained Self-Attention Structure and Entity Replacement: Algorithm and Pretreatment Method Validation Study. JMIR Med Inform 2020;8:e17644. [PMID: 32469325 PMCID: PMC7314385 DOI: 10.2196/17644] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2019] [Revised: 03/02/2020] [Accepted: 03/19/2020] [Indexed: 01/26/2023] Open

Abstract

Background

The most current methods applied for intrasentence relation extraction in the biomedical literature are inadequate for document-level relation extraction, in which the relationship may cross sentence boundaries. Hence, some approaches have been proposed to extract relations by splitting the document-level datasets through heuristic rules and learning methods. However, these approaches may introduce additional noise and do not really solve the problem of intersentence relation extraction. It is challenging to avoid noise and extract cross-sentence relations.

Objective

This study aimed to avoid errors by dividing the document-level dataset, verify that a self-attention structure can extract biomedical relations in a document with long-distance dependencies and complex semantics, and discuss the relative benefits of different entity pretreatment methods for biomedical relation extraction.

Methods

This paper proposes a new data preprocessing method and attempts to apply a pretrained self-attention structure for document biomedical relation extraction with an entity replacement method to capture very long-distance dependencies and complex semantics.

Results

Compared with state-of-the-art approaches, our method greatly improved the precision. The results show that our approach increases the F1 value, compared with state-of-the-art methods. Through experiments of biomedical entity pretreatments, we found that a model using an entity replacement method can improve performance.

Conclusions

When considering all target entity pairs as a whole in the document-level dataset, a pretrained self-attention structure is suitable to capture very long-distance dependencies and learn the textual context and complicated semantics. A replacement method for biomedical entities is conducive to biomedical relation extraction, especially to document-level relation extraction.

Collapse

Bio-semantic relation extraction with attention-based external knowledge reinforcement. BMC Bioinformatics 2020;21:213. [PMID: 32448122 PMCID: PMC7245897 DOI: 10.1186/s12859-020-3540-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Accepted: 05/07/2020] [Indexed: 12/13/2022] Open

Wang E, Wang F, Yang Z, Wang L, Zhang Y, Lin H, Wang J. A Graph Convolutional Network-Based Method for Chemical-Protein Interaction Extraction: Algorithm Development. JMIR Med Inform 2020;8:e17643. [PMID: 32348257 PMCID: PMC7267994 DOI: 10.2196/17643] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2019] [Revised: 03/14/2020] [Accepted: 03/19/2020] [Indexed: 01/06/2023] Open

Abstract

Background

Extracting the interactions between chemicals and proteins from the biomedical literature is important for many biomedical tasks such as drug discovery, medicine precision, and knowledge graph construction. Several computational methods have been proposed for automatic chemical-protein interaction (CPI) extraction. However, the majority of these proposed models cannot effectively learn semantic and syntactic information from complex sentences in biomedical texts.

Objective

To relieve this problem, we propose a method to effectively encode syntactic information from long text for CPI extraction.

Methods

Since syntactic information can be captured from dependency graphs, graph convolutional networks (GCNs) have recently drawn increasing attention in natural language processing. To investigate the performance of a GCN on CPI extraction, this paper proposes a novel GCN-based model. The model can effectively capture sequential information and long-range syntactic relations between words by using the dependency structure of input sentences.

Results

We evaluated our model on the ChemProt corpus released by BioCreative VI; it achieved an F-score of 65.17%, which is 1.07% higher than that of the state-of-the-art system proposed by Peng et al. As indicated by the significance test (P<.001), the improvement is significant. It indicates that our model is effective in extracting CPIs. The GCN-based model can better capture the semantic and syntactic information of the sentence compared to other models, therefore alleviating the problems associated with the complexity of biomedical literature.

Conclusions

Our model can obtain more information from the dependency graph than previously proposed models. Experimental results suggest that it is competitive to state-of-the-art methods and significantly outperforms other methods on the ChemProt corpus, which is the benchmark data set for CPI extraction.

Collapse

Döring K, Qaseem A, Becer M, Li J, Mishra P, Gao M, Kirchner P, Sauter F, Telukunta KK, Moumbock AFA, Thomas P, Günther S. Automated recognition of functional compound-protein relationships in literature. PLoS One 2020;15:e0220925. [PMID: 32126064 PMCID: PMC7053725 DOI: 10.1371/journal.pone.0220925] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 01/29/2020] [Indexed: 11/18/2022] Open

Sun C, Yang Z, Wang L, Zhang Y, Lin H, Wang J. Attention guided capsule networks for chemical-protein interaction extraction. J Biomed Inform 2020;103:103392. [PMID: 32068034 DOI: 10.1016/j.jbi.2020.103392] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Revised: 02/08/2020] [Accepted: 02/11/2020] [Indexed: 11/19/2022]

A neural network-based joint learning approach for biomedical entity and relation extraction from biomedical literature. J Biomed Inform 2020;103:103384. [PMID: 32032717 DOI: 10.1016/j.jbi.2020.103384] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2019] [Revised: 11/19/2019] [Accepted: 02/03/2020] [Indexed: 11/24/2022]

Chung JW, Yang W, Park JC. Unsupervised inference of implicit biomedical events using context triggers. BMC Bioinformatics 2020;21:29. [PMID: 31992184 PMCID: PMC6988352 DOI: 10.1186/s12859-020-3341-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Accepted: 01/07/2020] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

Event extraction from the biomedical literature is one of the most actively researched areas in biomedical text mining and natural language processing. However, most approaches have focused on events within single sentence boundaries, and have thus paid much less attention to events spanning multiple sentences. The Bacteria-Biotope event (BB-event) subtask presented in BioNLP Shared Task 2016 is one such example; a significant amount of relations between bacteria and biotope span more than one sentence, but existing systems have treated them as false negatives because labeled data is not sufficiently large enough to model a complex reasoning process using supervised learning frameworks.

RESULTS

We present an unsupervised method for inferring cross-sentence events by propagating intra-sentence information to adjacent sentences using context trigger expressions that strongly signal the implicit presence of entities of interest. Such expressions can be collected from a large amount of unlabeled plain text based on simple syntactic constraints, helping to overcome the limitation of relying only on a small number of training examples available. The experimental results demonstrate that our unsupervised system extracts cross-sentence events quite well and outperforms all the state-of-the-art supervised systems when combined with existing methods for intra-sentence event extraction. Moreover, our system is also found effective at detecting long-distance intra-sentence events, compared favorably with existing high-dimensional models such as deep neural networks, without any supervised learning techniques.

CONCLUSIONS

Our linguistically motivated inference model is shown to be effective at detecting implicit events that have not been covered by previous work, without relying on training data or curated knowledge bases. Moreover, it also helps to boost the performance of existing systems by allowing them to detect additional cross-sentence events. We believe that the proposed model offers an effective way to infer implicit information beyond sentence boundaries, especially when human-annotated data is not sufficient enough to train a robust supervised system.

Collapse

Zhang Y, Lin H, Yang Z, Wang J, Sun Y. Chemical-protein interaction extraction via contextualized word representations and multihead attention. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020;2019:5498050. [PMID: 31125403 PMCID: PMC6534182 DOI: 10.1093/database/baz054] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/12/2018] [Revised: 03/16/2019] [Accepted: 04/02/2019] [Indexed: 12/17/2022]

Neural network-based approaches for biomedical relation classification: A review. J Biomed Inform 2019;99:103294. [DOI: 10.1016/j.jbi.2019.103294] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2019] [Revised: 06/02/2019] [Accepted: 09/21/2019] [Indexed: 12/14/2022]

Tsueng G, Nanis M, Fouquier JT, Mayers M, Good BM, Su AI. Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts. Bioinformatics 2019;36:1226-1233. [PMID: 31504205 PMCID: PMC8104067 DOI: 10.1093/bioinformatics/btz678] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2019] [Revised: 08/05/2019] [Accepted: 08/29/2019] [Indexed: 01/31/2023] Open

Abstract

MOTIVATION

Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and capable of performing named entity recognition of disease mentions in biomedical abstracts, but did not know if this was true with relationship extraction (RE).

RESULTS

In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing.

AVAILABILITY AND IMPLEMENTATION

Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Zhang Y, Lu Z. Exploring semi-supervised variational autoencoders for biomedical relation extraction. Methods 2019;166:112-119. [PMID: 30822516 PMCID: PMC6708455 DOI: 10.1016/j.ymeth.2019.02.021] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2018] [Revised: 01/28/2019] [Accepted: 02/25/2019] [Indexed: 10/27/2022] Open

Abstract

The biomedical literature provides a rich source of knowledge such as protein-protein interactions (PPIs), drug-drug interactions (DDIs) and chemical-protein interactions (CPIs). Biomedical relation extraction aims to automatically extract biomedical relations from biomedical text for various biomedical research. State-of-the-art methods for biomedical relation extraction are primarily based on supervised machine learning and therefore depend on (sufficient) labeled data. However, creating large sets of training data is prohibitively expensive and labor-intensive, especially so in biomedicine as domain knowledge is required. In contrast, there is a large amount of unlabeled biomedical text available in PubMed. Hence, computational methods capable of employing unlabeled data to reduce the burden of manual annotation are of particular interest in biomedical relation extraction. We present a novel semi-supervised approach based on variational autoencoder (VAE) for biomedical relation extraction. Our model consists of the following three parts, a classifier, an encoder and a decoder. The classifier is implemented using multi-layer convolutional neural networks (CNNs), and the encoder and decoder are implemented using both bidirectional long short-term memory networks (Bi-LSTMs) and CNNs, respectively. The semi-supervised mechanism allows our model to learn features from both the labeled and unlabeled data. We evaluate our method on multiple public PPI, DDI and CPI corpora. Experimental results show that our method effectively exploits the unlabeled data to improve the performance and reduce the dependence on labeled data. To our best knowledge, this is the first semi-supervised VAE-based method for (biomedical) relation extraction. Our results suggest that exploiting such unlabeled data can be greatly beneficial to improved performance in various biomedical relation extraction, especially when only limited labeled data (e.g. 2000 samples or less) is available in such tasks.

Collapse

Zhang Y, Chen Q, Yang Z, Lin H, Lu Z. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Sci Data 2019;6:52. [PMID: 31076572 PMCID: PMC6510737 DOI: 10.1038/s41597-019-0055-0] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2018] [Accepted: 03/27/2019] [Indexed: 11/10/2022] Open

Matsuzaka Y, Uesawa Y. Optimization of a Deep-Learning Method Based on the Classification of Images Generated by Parameterized Deep Snap a Novel Molecular-Image-Input Technique for Quantitative Structure-Activity Relationship (QSAR) Analysis. Front Bioeng Biotechnol 2019;7:65. [PMID: 30984753 PMCID: PMC6447703 DOI: 10.3389/fbioe.2019.00065] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Accepted: 03/07/2019] [Indexed: 12/22/2022] Open

Abstract

Numerous chemical compounds are distributed around the world and may affect the homeostasis of the endocrine system by disrupting the normal functions of hormone receptors. Although the risks associated with these compounds have been evaluated by acute toxicity testing in mammalian models, the chronic toxicity of many chemicals remains due to high cost of the compounds and the testing, etc. However, computational approaches may be promising alternatives and reduce these evaluations. Recently, deep learning (DL) has been shown to be promising prediction models with high accuracy for recognition of images, speech, signals, and videos since it greatly benefits from large datasets. Recently, a novel DL-based technique called DeepSnap was developed to conduct QSAR analysis using three-dimensional images of chemical structures. It can be used to predict the potential toxicity of many different chemicals to various receptors without extraction of descriptors. DeepSnap has been shown to have a very high capacity in tests using Tox21 quantitative qHTP datasets. Numerous parameters must be adjusted to use the DeepSnap method but they have not been optimized. In this study, the effects of these parameters on the performance of the DL prediction model were evaluated in terms of the loss in validation as an indicator for evaluating the performance of the DL using the toxicity information in the Tox21 qHTP database. The relations of the parameters of DeepSnap such as (1) number of molecules per SDF split into (2) zoom factor percentage, (3) atom size for van der waals percentage, (4) bond radius, (5) minimum bond distance, and (6) bond tolerance, with the validation loss following quadratic function curves, which suggests that optimal thresholds exist to attain the best performance with these prediction models. Using the parameter values set with the best performance, the prediction model of chemical compounds for CAR agonist was built using 64 images, at 105° angle, with AUC of 0.791. Thus, based on these parameters, the proposed DeepSnap-DL approach will be highly reliable and beneficial to establish models to assess the risk associated with various chemicals.

Collapse

Lung PY, He Z, Zhao T, Yu D, Zhang J. Extracting chemical-protein interactions from literature using sentence structure analysis and feature engineering. Database (Oxford) 2019;2019:5280305. [PMID: 30624652 PMCID: PMC6323317 DOI: 10.1093/database/bay138] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2018] [Revised: 12/04/2018] [Accepted: 12/06/2018] [Indexed: 12/14/2022]

Antunes R, Matos S. Extraction of chemical-protein interactions from the literature using neural networks and narrow instance representation. Database (Oxford) 2019;2019:baz095. [PMID: 31622463 PMCID: PMC6796919 DOI: 10.1093/database/baz095] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2018] [Revised: 06/28/2019] [Accepted: 07/01/2019] [Indexed: 01/21/2023]

Yu K, Lung PY, Zhao T, Zhao P, Tseng YY, Zhang J. Automatic extraction of protein-protein interactions using grammatical relationship graph. BMC Med Inform Decis Mak 2018;18:42. [PMID: 30066644 PMCID: PMC6069288 DOI: 10.1186/s12911-018-0628-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Abstract

Background

Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatic extraction of such information and storing it in structured form could help researchers more easily access such information and also make it possible to incorporate it in advanced integrative analysis. In this study, we developed a novel approach to extract bio-entity relationships information using Nature Language Processing (NLP) and a graph-theoretic algorithm.

Methods

Our method, called GRGT (Grammatical Relationship Graph for Triplets), not only extracts the pairs of terms that have certain relationships, but also extracts the type of relationship (the word describing the relationships). In addition, the directionality of the relationship can also be extracted. Our method is based on the assumption that a triplet exists for a pair of interactions. A triplet is defined as two terms (entities) and an interaction word describing the relationship of the two terms in a sentence. We first use a sentence parsing tool to obtain the sentence structure represented as a dependency graph where words are nodes and edges are typed dependencies. The shortest paths among the pairs of words in the triplet are then extracted, which form the basis for our information extraction method. Flexible pattern matching scheme was then used to match a triplet graph with unknown relationship to those triplet graphs with labels (True or False) in the database.

Results

We applied the method on three benchmark datasets to extract the protein-protein-interactions (PPIs), and obtained better precision than the top performing methods in literature.

Conclusions

We have developed a method to extract the protein-protein interactions from biomedical literature. PPIs extracted by our method have higher precision among other methods, suggesting that our method can be used to effectively extract PPIs and deposit them into databases. Beyond extracting PPIs, our method could be easily extended to extracting relationship information between other bio-entities.

Collapse