Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kim S, Fiorini N, Wilbur WJ, Lu Z. Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J Biomed Inform 2017;75:122-127. [PMID: 28986328 DOI: 10.1016/j.jbi.2017.09.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Revised: 09/01/2017] [Accepted: 09/30/2017] [Indexed: 11/28/2022]

For:	Kim S, Fiorini N, Wilbur WJ, Lu Z. Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J Biomed Inform 2017;75:122-127. [PMID: 28986328 DOI: 10.1016/j.jbi.2017.09.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Revised: 09/01/2017] [Accepted: 09/30/2017] [Indexed: 11/28/2022]

Number

Cited by Other Article(s)

Khader A, Ensan F. Learning to rank query expansion terms for COVID-19 scholarly search. J Biomed Inform 2023;142:104386. [PMID: 37178780 PMCID: PMC10174726 DOI: 10.1016/j.jbi.2023.104386] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Revised: 04/19/2023] [Accepted: 05/05/2023] [Indexed: 05/15/2023]

Abstract

OBJECTIVE

With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, there has been a surge in the number of publicly available biomedical information sources, which makes it an increasingly challenging research goal to retrieve a relevant text to a topic of interest. In this paper, we propose a Contextual Query Expansion framework based on the clinical Domain knowledge (CQED) for formalizing an effective search over PubMed to retrieve relevant COVID-19 scholarly articles to a given information need.

MATERIALS AND METHODS

For the sake of training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework utilizes a contextual and a domain-specific neural language model to generate a set of candidate query expansion terms that enrich the original query. Moreover, the framework includes a multi-head attention mechanism that is trained alongside a learning-to-rank model for re-ranking the list of generated expansion candidate terms. The original query and the top-ranked expansion terms are posed to the PubMed search engine for retrieving relevant scholarly articles to an information need. The framework, CQED, can have four different variations, depending upon the learning path adopted for training and re-ranking the candidate expansion terms.

RESULTS

The model drastically improves the search performance, when compared to the original query. The performance improvement in comparison to the original query, in terms of terms of RECALL@1000 is 190.85% and in terms of NDCG@1000 is 343.55%. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the model that has been optimized based on Precision outperforms all baselines (0.7987). On the other hand, in terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), the CQED model that has been optimized based on an average of all retrieval measures outperforms all the baselines.

CONCLUSION

The proposed model successfully expands queries posed to PubMed, and improves search performance, as compared to all existing baselines. A success/failure analysis shows that the model improved the search performance of each of the evaluated queries. Moreover, an ablation study depicted that if ranking of generated candidate terms is not conducted, the overall performance decreases. For future work, we would like to explore the application of the presented query expansion framework in conducting technology-assisted Systematic Literature Reviews (SLR).

Collapse

Sharifpour R, Wu M, Zhang X. Large-scale analysis of query logs to profile users for dataset search. JOURNAL OF DOCUMENTATION 2022. [DOI: 10.1108/jd-12-2021-0245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Chenaina T, Neji S, Shoeb A. Query Sense Discovery Approach to Realize the User's Search Intent. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH 2022. [DOI: 10.4018/ijirr.289609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Yeganova L, Kim S, Chen Q, Balasanov G, Wilbur WJ, Lu Z. Better synonyms for enriching biomedical search. J Am Med Inform Assoc 2020;27:1894-1902. [PMID: 33083825 PMCID: PMC7727334 DOI: 10.1093/jamia/ocaa151] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2020] [Revised: 05/20/2020] [Accepted: 08/20/2020] [Indexed: 01/12/2023] Open

Massonnaud CR, Kerdelhué G, Grosjean J, Lelong R, Griffon N, Darmoni SJ. Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study. JMIR Med Inform 2020;8:e12799. [PMID: 32496201 PMCID: PMC7303830 DOI: 10.2196/12799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Revised: 01/20/2020] [Accepted: 03/23/2020] [Indexed: 12/04/2022] Open

Abstract

Background

With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval.

Objective

The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form “preferred term”[MH] OR “preferred term”[TIAB] OR “synonym 1”[TIAB] OR “synonym 2”[TIAB] OR …, for each of the 28,313 Medical Subject Heading (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure).

Methods

Three semantic expansion strategies were assessed. They differed by the synonyms used to build the queries as follows: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). The precision, recall, and F-measure metrics were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. The method to automatically compute the metrics involved computing the number of all relevant citations (A), using National Library of Medicine indexing as the gold standard (“preferred term”[MH]), the number of citations retrieved by the added terms (”synonym 1“[TIAB] OR ”synonym 2“[TIAB] OR …) (B), and the number of relevant citations retrieved by the added terms (combining the previous two queries with an “AND” operator) (C). It was possible to programmatically compute the metrics for each strategy using each of the 28,313 MeSH descriptors as a “preferred term,” corresponding to 239,724 different queries built and sent to the PubMed application program interface. The four search strategies were ranked and compared for each metric.

Results

ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.

Conclusions

This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performances of a given strategy, relatively to another, varied greatly depending on the MeSH descriptor. These results confirm there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user’s objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure).

Collapse

Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content. Artif Intell Rev 2020. [DOI: 10.1007/s10462-019-09725-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Zhu L, Zheng H. Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks. BMC Bioinformatics 2020;21:47. [PMID: 32028883 PMCID: PMC7006190 DOI: 10.1186/s12859-020-3376-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2019] [Accepted: 01/20/2020] [Indexed: 11/10/2022] Open

Lashkari F, Bagheri E, Ghorbani AA. Neural embedding-based indices for semantic search. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.10.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Mura C, Draizen EJ, Bourne PE. Structural biology meets data science: does anything change? Curr Opin Struct Biol 2018;52:95-102. [PMID: 30267935 DOI: 10.1016/j.sbi.2018.09.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2018] [Revised: 08/31/2018] [Accepted: 09/07/2018] [Indexed: 01/22/2023]

Best Match: New relevance search for PubMed. PLoS Biol 2018;16:e2005343. [PMID: 30153250 PMCID: PMC6112631 DOI: 10.1371/journal.pbio.2005343] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Kim S, Yeganova L, Comeau DC, Wilbur WJ, Lu Z. PubMed Phrases, an open set of coherent phrases for searching biomedical literature. Sci Data 2018;5:180104. [PMID: 29893755 PMCID: PMC5996850 DOI: 10.1038/sdata.2018.104] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2017] [Accepted: 04/06/2018] [Indexed: 11/09/2022] Open