Reference Citation Analysis

For: Smalheiser NR, Fragnito DP, Tirk EE. Anne O'Tate: Value-added PubMed search engine for analysis and text mining. PLoS One 2021;16:e0248335. [PMID: 33684153 PMCID: PMC7939269 DOI: 10.1371/journal.pone.0248335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2020] [Accepted: 02/24/2021] [Indexed: 11/30/2022]

Cited by Other Article(s)

Menke JD, Ming S, Radhakrishna S, Kilicoglu H, Smalheiser NR. Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features. medRxiv 2025:2025.04.23.25326300. [PMID: 40343026 PMCID: PMC12060953 DOI: 10.1101/2025.04.23.25326300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2025]

Abstract

Objective

Searching for biomedical articles by publication type or study design is essential for tasks like evidence synthesis. Prior work has relied solely on PubMed information or a limited set of types (e.g., randomized controlled trials). This study builds on our previous work by leveraging full-text features, alternative text representations, and advanced optimization techniques.

Methods

Using a dataset of PubMed articles published between 1987 and 2023 with human-curated indexing terms, we fine-tuned BERT-based encoders (PubMedBERT, BioLinkBERT, SPECTER, SPECTER2, SPECTER2-Clf) to investigate whether text representations based on different pre-training objectives could benefit the task. We incorporated textual and verbalized metadata features, full-text extraction (rule-based, extractive, and abstractive summarization), and additional topical information about the articles. To improve calibration and mitigate label noise, we used asymmetric loss and label smoothing. We also explored contrastive learning approaches (SimCSE, ADNCE, HeroCon, WeighCon). Models were evaluated using precision, recall, F1 score (both micro- and macro-), and area under ROC curve (AUC).
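
The difference between the micro- and macro-averaged F1 scores named above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the example assumes a multi-label setting in which each article carries a 0/1 indicator per publication type.

```python
from typing import List, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for 0/1 multi-label indicator matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    return micro, macro


# Hypothetical toy data: 4 articles, 3 labels; the two rare labels are missed.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the frequent label is predicted perfectly and the two rare labels are missed, so micro-F1 stays high while macro-F1 drops. That sensitivity to rare labels is one reason to report both averages.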

Results

Fine-tuning SPECTER2-base with the added MeSH term "Animals", asymmetric loss with label smoothing, and WeighCon contrastive loss significantly improved performance over the previous best architecture (micro-F1: 0.664 → 0.679 [+2.2%]; macro-F1: 0.663 → 0.690 [+4.1%]; p < 0.0001). Asymmetric loss and the use of SPECTER2-base instead of PubMedBERT contributed most to this gain. Full-text features boosted performance by 2.4% (micro-F1) and 1.8% (macro-F1) over the baseline (micro-F1: 0.616 → 0.631; macro-F1: 0.556 → 0.566; p < 0.0001). Topical label splitting and contrastive learning provided minor, non-significant improvements.

Conclusion

Full-text features, enhanced document representations, and fine-tuning optimizations improve publication type and study design indexing. Future work should refine label accuracy, better distill relevant article information, and expand label sets to meet the needs of the research community. Data, code, and models are available at https://github.com/ScienceNLP-Lab/MultiTagger-v2.

Jin Q, Leaman R, Lu Z. PubMed and beyond: biomedical literature search in the age of artificial intelligence. EBioMedicine 2024;100:104988. [PMID: 38306900 PMCID: PMC10850402 DOI: 10.1016/j.ebiom.2024.104988] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 09/23/2023] [Revised: 01/14/2024] [Accepted: 01/15/2024] [Indexed: 02/04/2024]