Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Weber L, Thobe K, Migueles Lozano OA, Wolf J, Leser U. PEDL: extracting protein-protein associations using deep language models and distant supervision. Bioinformatics 2020;36:i490-i498. [PMID: 32657389 PMCID: PMC7355289 DOI: 10.1093/bioinformatics/btaa430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022]

Cited by Other Article(s)

Sänger M, Garda S, Wang XD, Weber-Genzel L, Droop P, Fuchs B, Akbik A, Leser U. HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools. Bioinformatics 2024;40:btae564. [PMID: 39302686 PMCID: PMC11453098 DOI: 10.1093/bioinformatics/btae564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 08/23/2024] [Accepted: 09/17/2024] [Indexed: 09/22/2024]

Abstract

MOTIVATION

With the exponential growth of the life sciences literature, biomedical text mining (BTM) has become an essential technology for accelerating the extraction of insights from publications. The identification of entities in texts, such as diseases or genes, and their normalization, i.e. grounding them in a knowledge base, are crucial steps in any BTM pipeline to enable information aggregation from multiple documents. However, tools for these two steps are rarely applied in the same context in which they were developed. Instead, they are applied "in the wild," i.e. on application-dependent text collections that differ moderately to extremely from those used for training, varying, e.g. in focus, genre or text type. This raises the question of whether the reported performance, usually obtained by training and evaluating on different partitions of the same corpus, can be trusted for downstream applications.

RESULTS

Here, we report on the results of a carefully designed cross-corpus benchmark for entity recognition and normalization, where tools were applied systematically to corpora not used during their training. Based on a survey of 28 published systems, we selected five, based on predefined criteria like feature richness and availability, for an in-depth analysis on three publicly available corpora covering four entity types. Our results present a mixed picture and show that cross-corpus performance is significantly lower than the in-corpus performance. HunFlair2, the redesigned and extended successor of the HunFlair tool, showed the best performance on average, being closely followed by PubTator Central. Our results indicate that users of BTM tools should expect a lower performance than the original published one when applying tools in "the wild" and show that further research is necessary for more robust BTM tools.

AVAILABILITY AND IMPLEMENTATION

All our models are integrated into the Natural Language Processing (NLP) framework flair: https://github.com/flairNLP/flair. Code to reproduce our results is available at: https://github.com/hu-ner/hunflair2-experiments.

Weber L, Barth F, Lorenz L, Konrath F, Huska K, Wolf J, Leser U. PEDL+: protein-centered relation extraction from PubMed at your fingertip. Bioinformatics 2023;39:btad603. [PMID: 37950510 PMCID: PMC10660277 DOI: 10.1093/bioinformatics/btad603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 08/29/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]

Zirkle J, Han X, Racz R, Samieegohar M, Chaturbedi A, Mann J, Chakravartula S, Li Z. Deep learning-enabled natural language processing to identify directional pharmacokinetic drug-drug interactions. BMC Bioinformatics 2023;24:413. [PMID: 37914988 PMCID: PMC10619324 DOI: 10.1186/s12859-023-05520-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 10/04/2023] [Indexed: 11/03/2023]

Affiliation(s)

Joel Zirkle, Xiaomei Han, Rebecca Racz, Mohammadreza Samieegohar, Anik Chaturbedi, John Mann, Shilpa Chakravartula, Zhihua Li: Division of Applied Regulatory Science, Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, Food and Drug Administration, WO Bldg 64 Rm 2078, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA.

Dhrangadhariya A, Müller H. Not so weak PICO: leveraging weak supervision for participants, interventions, and outcomes recognition for systematic review automation. JAMIA Open 2023;6:ooac107. [PMID: 36632329 PMCID: PMC9828146 DOI: 10.1093/jamiaopen/ooac107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 12/01/2022] [Accepted: 12/21/2022] [Indexed: 01/11/2023]

Abstract

Objective

The aim of this study was to test the feasibility of PICO (participants, interventions, comparators, outcomes) entity extraction using weak supervision and natural language processing.

Methodology

We re-purpose more than 127 medical and nonmedical ontologies and expert-generated rules to obtain multiple noisy labels for PICO entities in the evidence-based medicine (EBM)-PICO corpus. These noisy labels are aggregated using simple majority voting and generative modeling to get consensus labels. The resulting probabilistic labels are used as weak signals to train a weakly supervised (WS) discriminative model and observe performance changes. We explore mistakes in the EBM-PICO that could have led to inaccurate evaluation of previous automation methods.
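The majority-voting step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the label names ("PAR", "O"), the abstain marker, and the tie-breaking rule are all assumptions made for illustration, and the generative-modeling alternative is not shown.

```python
from collections import Counter

ABSTAIN = None  # assumed marker: a labeling source with no opinion on a token


def majority_vote(votes, tie_label="O"):
    """Return the most common non-abstaining label.

    Falls back to tie_label (an assumed default, here the "outside" tag)
    when every source abstains or the top two labels are tied.
    """
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return tie_label
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return tie_label
    return ranked[0][0]


def aggregate(token_votes):
    """Apply majority voting independently to each token's list of votes."""
    return [majority_vote(votes) for votes in token_votes]


# Three hypothetical labeling sources vote on four tokens.
token_votes = [
    ["PAR", "PAR", ABSTAIN],      # clear majority for the participant label
    ["O", "PAR", "O"],            # majority says "outside"
    [ABSTAIN, ABSTAIN, ABSTAIN],  # no source fires
    ["PAR", "O", ABSTAIN],        # tie between two labels
]
consensus = aggregate(token_votes)
```

In this sketch, ties and all-abstain cases default to the "outside" tag; a real pipeline might instead drop such tokens or keep a probability distribution over labels.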

Results

In total, 4081 randomized clinical trials were weakly labeled to train the WS models and compared against full supervision. The models were separately trained for PICO entities and evaluated on the EBM-PICO test set. A WS approach combining ontologies and expert-generated rules outperformed full supervision for the participant entity by 1.71% macro-F1. Error analysis on the EBM-PICO subset revealed 18-23% erroneous token classifications.

Discussion

Automatic PICO entity extraction accelerates the writing of clinical systematic reviews that commonly use PICO information to filter health evidence. However, PICO extends to more entities: PICOS (S: study type and design), PICOC (C: context), and PICOT (T: timeframe), for which labelled datasets are unavailable. In such cases, the ability to use weak supervision overcomes the expensive annotation bottleneck.

Conclusions

We show the feasibility of WS PICO entity extraction using freely available ontologies and heuristics without manually annotated data. Weak supervision has encouraging performance compared to full supervision but requires careful design to outperform it.

Weber L, Sänger M, Garda S, Barth F, Alt C, Leser U. Chemical-protein relation extraction with ensembles of carefully tuned pretrained language models. Database (Oxford) 2022;2022:6833204. [PMID: 36399413 PMCID: PMC9674024 DOI: 10.1093/database/baac098] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 11/19/2022]