Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wang Y, Rastegar-Mojarad M, Komandur-Elayavilli R, Liu H. Leveraging word embeddings and medical entity extraction for biomedical dataset retrieval using unstructured texts. Database (Oxford) 2017;2017:bax091. [PMID: 31725862 PMCID: PMC7243926 DOI: 10.1093/database/bax091] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/17/2017] [Revised: 10/17/2017] [Accepted: 11/14/2017] [Indexed: 11/16/2022]

Cited by Other Article(s)

Hughes LD, Tsueng G, DiGiovanna J, Horvath TD, Rasmussen LV, Savidge TC, Stoeger T, Turkarslan S, Wu Q, Wu C, Su AI, Pache L. Addressing barriers in FAIR data practices for biomedical data. Sci Data 2023;10:98. [PMID: 36823198 PMCID: PMC9950056 DOI: 10.1038/s41597-023-01969-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open

Affiliation(s)

Laura D Hughes: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Ginger Tsueng: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Jack DiGiovanna: Velsera, 529 Main St, Suite 6610, Charlestown, MA, 02129, USA
Thomas D Horvath: Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA; Texas Children's Microbiome Center, Department of Pathology, Texas Children's Hospital, Houston, TX, 77030, USA
Luke V Rasmussen: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
Tor C Savidge: Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, 77030, USA
Thomas Stoeger: Department of Chemical and Biological Engineering, McCormick School of Engineering, Evanston, IL, 60208, USA
Serdar Turkarslan: Institute for Systems Biology, Seattle, WA, 98109, USA
Qinglong Wu: Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA; Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, 77030, USA
Chunlei Wu: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA; Scripps Research Translational Institute, La Jolla, CA, 92037, USA; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
Andrew I Su: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA; Scripps Research Translational Institute, La Jolla, CA, 92037, USA; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
Lars Pache: Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA

Tsueng G, Cano MAA, Bento J, Czech C, Kang M, Pache L, Rasmussen LV, Savidge TC, Starren J, Wu Q, Xin J, Yeaman MR, Zhou X, Su AI, Wu C, Brown L, Shabman RS, Hughes LD. Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Sci Data 2023;10:99. [PMID: 36823157 PMCID: PMC9950378 DOI: 10.1038/s41597-023-01968-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open

Affiliation(s)

Ginger Tsueng: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Marco A Alvarado Cano: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
José Bento: Department of Computer Science, Boston College, 245 Beacon St, Chestnut Hill, MA, 02467, USA
Candice Czech: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Mengjia Kang: Division of Pulmonary and Critical Care, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
Lars Pache: Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
Luke V Rasmussen: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
Tor C Savidge: Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
Justin Starren: Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
Qinglong Wu: Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
Jiwen Xin: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Michael R Yeaman: Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA; Divisions of Molecular Medicine and Infectious Diseases, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA; Lundquist Institute for Infection & Immunity at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
Xinghua Zhou: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
Andrew I Su: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA; Scripps Research Translational Institute, La Jolla, CA, 92037, USA; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
Chunlei Wu: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA; Scripps Research Translational Institute, La Jolla, CA, 92037, USA; Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
Liliana Brown: Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
Reed S Shabman: Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
Laura D Hughes: Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA

Li X, Zhang Y, Jin J, Sun F, Li N, Liang S. A model of integrating convolution and BiGRU dual-channel mechanism for Chinese medical text classifications. PLoS One 2023;18:e0282824. [PMID: 36928266 PMCID: PMC10019650 DOI: 10.1371/journal.pone.0282824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/23/2023] [Indexed: 03/18/2023] Open

Zhang Z. An improved BM25 algorithm for clinical decision support in Precision Medicine based on co-word analysis and Cuckoo Search. BMC Med Inform Decis Mak 2021;21:81. [PMID: 33653325 PMCID: PMC7927407 DOI: 10.1186/s12911-021-01454-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/03/2020] [Accepted: 02/23/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Retrieving gene and disease information from a vast collection of biomedical abstracts to provide doctors with clinical decision support is one of the important research directions of Precision Medicine.

METHOD

We propose a novel article retrieval method based on expanded-word and co-word analyses, and we use Cuckoo Search to optimize the parameters of the retrieval function. The main goal is to retrieve the abstracts of biomedical articles that refer to treatments. The methods in this manuscript adopt the BM25 algorithm to calculate the score of abstracts. We propose an improved version of BM25 that also computes the scores of expanded words and co-words, yielding a composite retrieval function that is then optimized using Cuckoo Search. The proposed method aims to find both disease and gene information in the abstract of the same biomedical article, so that such articles are treated as more relevant and receive higher scores. In addition, we investigate the influence of different parameters on the retrieval algorithm and summarize how they meet various retrieval needs.

RESULTS

The data used in this manuscript are sourced from medical articles presented in the Text Retrieval Conference (TREC) Clinical Decision Support (CDS) Tracks of 2017, 2018, and 2019 in Precision Medicine. A total of 120 topics are tested. Three indicators are employed to compare the methods, which are selected from those based only on the BM25 algorithm and its improved version so that the experiments are comparable. The results show that the proposed algorithm outperforms the compared methods.

CONCLUSION

The proposed method, an improved version of the BM25 algorithm, utilizes both co-word analysis and Cuckoo Search, and has been verified to achieve better results on a large number of experimental sets. In addition, a relatively simple query expansion method is implemented in this manuscript. Future research will focus on ontology and semantic networks to expand the query vocabulary.

Zhang L, Hu J, Xu Q, Li F, Rao G, Tao C. A semantic relationship mining method among disorders, genes, and drugs from different biomedical datasets. BMC Med Inform Decis Mak 2020;20:283. [PMID: 33317518 PMCID: PMC7734713 DOI: 10.1186/s12911-020-01274-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/21/2020] [Accepted: 09/22/2020] [Indexed: 12/16/2022] Open

Abstract

BACKGROUND

Semantic web technology has been applied widely in the biomedical informatics field. Large numbers of biomedical datasets are available online in the resource description framework (RDF) format. Semantic relationship mining among genes, disorders, and drugs is widely used in, for example, precision medicine and drug repositioning. However, most existing studies have focused on a single dataset. It is not easy to find the most current disorder-gene-drug relationships, since they are distributed across heterogeneous datasets. How to mine these semantic relationships from different biomedical datasets is an important issue.

METHODS

First, a variety of biomedical datasets were converted into RDF triple data; then, multisource biomedical datasets were integrated into a storage system using a data integration algorithm. Second, nine query patterns among genes, disorders, and drugs from different biomedical datasets were designed. Third, the gene-disorder-drug semantic relationship mining algorithm is presented. This algorithm can query the relationships among various entities from different datasets.

RESULTS AND CONCLUSIONS

We focused on mining the putative and the most current disorder-gene-drug relationships about Parkinson's disease (PD). The results demonstrate that our method has significant advantages in mining and integrating multisource heterogeneous biomedical datasets. Twenty-five new relationships among the genes, disorders, and drugs were mined from four different datasets. The query results showed that most of them came from different datasets. The precision of the method increased by 2.51% compared to that of the multisource linked open data fusion method presented in the 4th International Workshop on Semantics-Powered Data Mining and Analytics (SEPDA 2019). Moreover, the number of query results increased by 7.7%, and the number of correct queries increased by 9.5%.

Xu B, Lin H, Yang L, Xu K, Zhang Y, Zhang D, Yang Z, Wang J, Lin Y, Yin F. A supervised term ranking model for diversity enhanced biomedical information retrieval. BMC Bioinformatics 2019;20:590. [PMID: 31787087 PMCID: PMC6886246 DOI: 10.1186/s12859-019-3080-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/18/2023] Open

Wang Y, Sohn S, Liu S, Shen F, Wang L, Atkinson EJ, Amin S, Liu H. A clinical text classification paradigm using weak supervision and deep representation. BMC Med Inform Decis Mak 2019;19:1. [PMID: 30616584 PMCID: PMC6322223 DOI: 10.1186/s12911-018-0723-6] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Received: 05/23/2018] [Accepted: 12/10/2018] [Indexed: 01/02/2023] Open

Abstract

BACKGROUND

Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for clinical text classification tasks. However, a successful machine learning model usually requires extensive human efforts to create labeled training data and conduct feature engineering. In this study, we propose a clinical text classification paradigm using weak supervision and deep representation to reduce these human efforts.

METHODS

We develop a rule-based NLP algorithm to automatically generate labels for the training data, and then use pre-trained word embeddings as deep representation features for training machine learning models. Since the machine learning models are trained on labels generated by the automatic NLP algorithm, this training process is called weak supervision. We evaluate the paradigm's effectiveness on two institutional case studies at Mayo Clinic: smoking status classification and proximal femur (hip) fracture classification, and one case study using a public dataset: the i2b2 2006 smoking status classification shared task. We test four widely used machine learning models, namely, Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron Neural Networks (MLPNN), and Convolutional Neural Networks (CNN), using this paradigm. Precision, recall, and F1 score are used as metrics to evaluate performance.

RESULTS

CNN achieves the best performance in both institutional tasks (F1 score: 0.92 for Mayo Clinic smoking status classification and 0.97 for fracture classification). We show that word embeddings significantly outperform tf-idf and topic modeling features in the paradigm, and that CNN captures additional patterns from the weak supervision compared to the rule-based NLP algorithms. We also observe two drawbacks of the proposed paradigm: CNN is more sensitive to the size of the training data, and the paradigm might not be effective for complex multiclass classification tasks.

CONCLUSION

The proposed clinical text classification paradigm could reduce the human effort of creating labeled training data and engineering features when applying machine learning to clinical text classification, by leveraging weak supervision and deep representation. The experiments have validated the effectiveness of the paradigm on two institutional and one shared clinical text classification tasks.

MedSTS: a resource for clinical semantic textual similarity. Lang Resour Eval 2018. [DOI: 10.1007/s10579-018-9431-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 11/25/2022]