Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

15
(from Reference Citation Analysis)

Article PDFs (6)

Cited by > 0 (14)

Searched Name

Knowledge extraction

Vithanage D, Yu P, Wang L, Deng C. Contextual Word Embedding for Biomedical Knowledge Extraction: a Rapid Review and Case Study. J Healthc Inform Res 2024;8:158-179. [PMID: 38273979 PMCID: PMC10805696 DOI: 10.1007/s41666-023-00157-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 11/27/2023] [Accepted: 12/09/2023] [Indexed: 01/27/2024]

Sarabi S, Han Q, de Vries B, Romme AGL, Almassy D. The Nature-Based Solutions Case-Based System: A hybrid expert system. J Environ Manage 2022;324:116413. [PMID: 36352717 DOI: 10.1016/j.jenvman.2022.116413] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/03/2022] [Revised: 08/17/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]

Alakent B, Kaya-Özkiper K, Soyer-Uzun S. Global interpretation and generalizability of boosted regression models for the prediction of methylene blue adsorption by different clay minerals and alkali activated materials. Chemosphere 2022;308:136248. [PMID: 36057344 DOI: 10.1016/j.chemosphere.2022.136248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 08/15/2022] [Accepted: 08/25/2022] [Indexed: 06/15/2023]

Abstract

In this study, Gradient Boosted Regression Trees are applied, for the first time, to predict governing factors for methylene blue (MB) adsorption on a variety of adsorbents involving clay minerals, such as kaolinite and sepiolite, together with the industrial wastes red mud and fly ash, and alkali activated materials synthesized from the aforementioned raw materials. The dataset was constructed using electronic databases, such as ScienceDirect, Scopus, Elsevier, and Google; experimental studies published between 2005 and 2022 were covered. The final dataset included experimental conditions, such as adsorbent type, adsorbent properties (surface characteristics, density, and chemical modifications), pH of the medium, adsorbent dosage, and temperature; it comprised 914 datapoints, which were extracted from 75 papers (out of ∼1360 initially screened). Among distinct parameters, initial adsorbate concentration was found to be the most dominant factor affecting the MB uptake. Concordantly, pH of the solution medium, raw material selection, and modification types were also found to be significant in MB adsorption. Results showed that, in terms of raw material and modification types, sepiolite and chemical (acid and/or alkaline modification) and thermal treatments, respectively, emerge as the strongest candidates for enhanced MB adsorption performance. Modifications applied to adsorbents should be evaluated separately, as there is no general rule applicable to all experimental conditions, and the strength of the contribution of modification type also depends on initial adsorbate concentration. Implementation of various imputation methods showed the importance of reporting experimental factors, such as surface area, in the literature. The range of applicability of the suggested modeling procedure was assessed to help experimenters in testing MB uptake under novel experimental conditions.

Andreadis S, Antzoulatos G, Mavropoulos T, Giannakeris P, Tzionis G, Pantelidis N, Ioannidis K, Karakostas A, Gialampoukidis I, Vrochidis S, Kompatsiaris I. A social media analytics platform visualising the spread of COVID-19 in Italy via exploitation of automatically geotagged tweets. Online Soc Netw Media 2021;23:100134. [PMID: 36570037 PMCID: PMC9767437 DOI: 10.1016/j.osnem.2021.100134] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/31/2020] [Revised: 03/31/2021] [Accepted: 04/03/2021] [Indexed: 12/27/2022]

Gong F, Chen Y, Wang H, Lu H. On building a diabetes centric knowledge base via mining the web. BMC Med Inform Decis Mak 2019;19:49. [PMID: 30961582 PMCID: PMC6454670 DOI: 10.1186/s12911-019-0771-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open

Abstract

Background

Diabetes has become one of the hot topics in life science research. To support analytical procedures, researchers and analysts expend considerable labor collecting experimental data, a process that is also error-prone. To reduce the cost and ensure data quality, there is a growing trend of extracting clinical events in the form of knowledge from electronic medical records (EMRs). To do so, we first need a high-coverage knowledge base (KB) of a specific disease to support these extraction tasks, called KB-based Extraction.

Methods

We propose an approach to build a diabetes-centric knowledge base (a.k.a. DKB) via mining the Web. In particular, we first extract knowledge from semi-structured contents of vertical portals, fuse individual knowledge from each site, and further map them to a unified KB. The target DKB is then extracted from the overall KB based on a distance-based Expectation-Maximization (EM) algorithm.

Results

During the experiments, we selected eight popular vertical portals in China as data sources to construct DKB. There are 7703 instances and 96,041 edges in the final diabetes KB covering diseases, symptoms, western medicines, traditional Chinese medicines, examinations, departments, and body structures. The accuracy of DKB is 95.91%. Besides the quality assessment of extracted knowledge from vertical portals, we also carried out detailed experiments for evaluating the knowledge fusion performance as well as the convergence of the distance-based EM algorithm with positive results.

Conclusions

In this paper, we introduced an approach to constructing DKB. A knowledge extraction and fusion pipeline was first used to extract semi-structured data from vertical portals, and individual KBs were further fused into a unified knowledge base. After that, we developed a distance-based Expectation-Maximization algorithm to extract a subset from the overall knowledge base, forming the target DKB. Experiments showed that the data in DKB are rich and of high quality.

Shi J, Zheng M, Yao L, Ge Y. Developing a healthcare dataset information resource (DIR) based on Semantic Web. BMC Med Genomics 2018;11:102. [PMID: 30453940 PMCID: PMC6245488 DOI: 10.1186/s12920-018-0411-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open

Abstract

BACKGROUND

The right dataset is essential to obtain the right insights in data science; therefore, it is important for data scientists to have a good understanding of the availability of relevant datasets as well as the content, structure, and existing analyses of these datasets. While a number of efforts are underway to integrate the large amount and variety of datasets, the lack of an information resource that focuses on specific needs of target users of datasets has existed as a problem for years. To address this gap, we have developed a Dataset Information Resource (DIR), using a user-oriented approach, which gathers relevant dataset knowledge for specific user types. In the present version, we specifically address the challenges of entry-level data scientists in learning to identify, understand, and analyze major datasets in healthcare. We emphasize that the DIR does not contain actual data from the datasets but aims to provide comprehensive knowledge about the datasets and their analyses.

METHODS

The DIR leverages Semantic Web technologies and the W3C Dataset Description Profile as the standard for knowledge integration and representation. To extract tailored knowledge for target users, we have developed methods for manual extractions from dataset documentations as well as semi-automatic extractions from related publications, using natural language processing (NLP)-based approaches. A semantic query component is available for knowledge retrieval, and a parameterized question-answering functionality is provided to facilitate the ease of search.

RESULTS

The DIR prototype is composed of four major components: dataset metadata and related knowledge, search modules, question answering for frequently asked questions, and blogs. The current implementation includes information on 12 commonly used large and complex healthcare datasets. The initial usage evaluation based on health informatics novices indicates that the DIR is helpful and beginner-friendly.

CONCLUSIONS

We have developed a novel user-oriented DIR that provides dataset knowledge specialized for target user groups. Knowledge about datasets is effectively represented in the Semantic Web. At this initial stage, the DIR has already been able to provide sophisticated and relevant knowledge of 12 datasets to help entry-level health informaticians learn healthcare data analysis using suitable datasets. Further development of both content and function levels is underway.

Weitschek E, Lauro SD, Cappelli E, Bertolazzi P, Felici G. CamurWeb: a classification software and a large knowledge base for gene expression data of cancer. BMC Bioinformatics 2018;19:354. [PMID: 30367574 PMCID: PMC6191971 DOI: 10.1186/s12859-018-2299-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022] Open

Abstract

BACKGROUND

The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human-interpretable models composed of genes and their relation to the investigated disease. State-of-the-art rule-based classifiers are designed to extract a single classification model, possibly composed of a few relevant genes. Conversely, we aim to create a large knowledge base composed of many rule-based models, and thus determine which genes could be potentially involved in the analyzed tumor. This comprehensive and open access knowledge base is required to disseminate novel insights about cancer.

RESULTS

We propose CamurWeb, a new method and web-based software that is able to extract multiple and equivalent classification models in the form of logic formulas ("if then" rules) and to create a knowledge base of these rules that can be queried and analyzed. The method is based on an iterative classification procedure and an adaptive feature elimination technique that enables the computation of many rule-based models related to the cancer under study. Additionally, CamurWeb includes a user-friendly interface for running the software, querying the results, and managing the performed experiments. The user can create her profile, upload her gene expression data, run the classification analyses, and interpret the results with predefined queries. To validate the software, we apply it to all publicly available RNA sequencing datasets from The Cancer Genome Atlas database, obtaining a large open access knowledge base about cancer. CamurWeb is available at http://bioinformatics.iasi.cnr.it/camurweb .

CONCLUSIONS

The experiments prove the validity of CamurWeb, obtaining many classification models and thus several genes that are associated with 21 different cancer types. Finally, the comprehensive knowledge base about cancer and the software tool are released online; interested researchers have free access to them for further studies and to design biological experiments in cancer research.
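
The iterative classification with feature elimination described in this abstract can be illustrated with a minimal sketch. It is not the CamurWeb implementation: a one-feature threshold rule stands in for the rule-based classifier, and the gene names, toy data, and `min_accuracy` cutoff are all hypothetical.

```python
def best_stump(X, y, features):
    """Return (accuracy, feature, threshold, label_if_above) for the best
    single-feature "if feature > threshold then label" rule, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for label_above in (0, 1):
                preds = [label_above if row[f] > t else 1 - label_above for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, label_above)
    return best


def extract_rule_models(X, y, min_accuracy=0.9):
    """Fit a rule, record it, eliminate its feature, and repeat until no
    remaining feature yields a rule that meets min_accuracy."""
    remaining = set(X[0].keys())
    models = []
    while remaining:
        result = best_stump(X, y, sorted(remaining))
        if result is None or result[0] < min_accuracy:
            break
        models.append(result)
        remaining.discard(result[1])  # feature elimination step
    return models


# Hypothetical expression values for four samples (0 = control, 1 = case).
X = [
    {"geneA": 1.0, "geneB": 5.0, "geneC": 3.0},
    {"geneA": 1.2, "geneB": 5.5, "geneC": 1.0},
    {"geneA": 3.0, "geneB": 2.0, "geneC": 3.1},
    {"geneA": 3.3, "geneB": 1.5, "geneC": 0.9},
]
y = [0, 0, 1, 1]
models = extract_rule_models(X, y)
for acc, feature, threshold, label_above in models:
    print(f"if {feature} > {threshold:.2f} then class {label_above} (accuracy {acc:.2f})")
```

On this toy data the loop yields two equally accurate rules, one per informative gene, and stops when only the uninformative gene remains. That is the sense in which repeated elimination produces several equivalent models rather than a single one.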

Michelini S, Balakrishnan B, Parolo S, Matone A, Mullaney JA, Young W, Gasser O, Wall C, Priami C, Lombardo R, Kussmann M. A reverse metabolic approach to weaning: in silico identification of immune-beneficial infant gut bacteria, mining their metabolism for prebiotic feeds and sourcing these feeds in the natural product space. Microbiome 2018;6:171. [PMID: 30241567 PMCID: PMC6151060 DOI: 10.1186/s40168-018-0545-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/11/2018] [Accepted: 08/30/2018] [Indexed: 05/13/2023]

Lazzarini N, Bacardit J. RGIFE: a ranked guided iterative feature elimination heuristic for the identification of biomarkers. BMC Bioinformatics 2017;18:322. [PMID: 28666416 PMCID: PMC5493069 DOI: 10.1186/s12859-017-1729-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/06/2016] [Accepted: 06/13/2017] [Indexed: 12/13/2022] Open

Cumbo F, Fiscon G, Ceri S, Masseroli M, Weitschek E. TCGA2BED: extracting, extending, integrating, and querying The Cancer Genome Atlas. BMC Bioinformatics 2017;18:6. [PMID: 28049410 DOI: 10.1186/s12859-016-1419-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/03/2016] [Accepted: 12/10/2016] [Indexed: 01/05/2023] Open

Rybinski M, Aldana-Montes JF. tESA: a distributional measure for calculating semantic relatedness. J Biomed Semantics 2016;7:67. [PMID: 28031037 PMCID: PMC5192592 DOI: 10.1186/s13326-016-0109-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/26/2016] [Accepted: 11/13/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on words, which represent these concepts. Approximating semantic relatedness between texts and concepts represented by these texts is an important part of many text and knowledge processing tasks of crucial importance in the ever-growing domain of biomedical informatics. The problem of most state-of-the-art methods for calculating semantic relatedness is their dependence on highly specialized, structured knowledge resources, which makes these methods poorly adaptable for many usage scenarios. On the other hand, the domain knowledge in the Life Sciences has become more and more accessible, but mostly in its unstructured form, as texts in large document collections, which makes its use more challenging for automated processing. In this paper we present tESA, an extension to the well-known Explicit Semantic Analysis (ESA) method.

RESULTS

In our extension we use two separate sets of vectors, corresponding to different sections of the articles from the underlying corpus of documents, as opposed to the original method, which only uses a single vector space. We present an evaluation of Life Sciences domain-focused applicability of both tESA and domain-adapted Explicit Semantic Analysis. The methods are tested against a set of standard benchmarks established for the evaluation of biomedical semantic relatedness quality. Our experiments show that the proposed method achieves results comparable with or superior to the current state-of-the-art methods. Additionally, a comparative discussion of the results obtained with tESA and ESA is presented, together with a study of the adaptability of the methods to different corpora and their performance with different input parameters.

CONCLUSIONS

Our findings suggest that combined use of the semantics from different sections (i.e. extending the original ESA methodology with the use of title vectors) of the documents of scientific corpora may be used to enhance the performance of distributional semantic relatedness measures, which can be observed in the largest reference datasets. We also present the impact of the proposed extension on the size of distributional representations.
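
For orientation, the single-vector-space baseline that tESA extends can be sketched as follows. This is an ESA-style illustration only: each word becomes a TF-IDF vector over the documents of a corpus, a text is the sum of its word vectors, and relatedness is the cosine of two text vectors. The corpus, tokenisation, and weighting are assumptions, and the way tESA combines its two vector sets is not reproduced because the abstract does not specify it.

```python
import math
from collections import Counter, defaultdict


def build_word_vectors(docs):
    """Map each word to a sparse {doc_id: tf-idf weight} vector over the corpus."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = defaultdict(dict)
    for doc_id, tokens in tokenized.items():
        for word, count in Counter(tokens).items():
            weight = count * math.log(len(docs) / doc_freq[word])
            if weight > 0:  # words present in every document carry no weight
                vectors[word][doc_id] = weight
    return vectors


def text_vector(text, vectors):
    """Sum the document-space vectors of the words in a text."""
    total = defaultdict(float)
    for word in text.lower().split():
        for doc_id, weight in vectors.get(word, {}).items():
            total[doc_id] += weight
    return total


def relatedness(text_a, text_b, vectors):
    """Cosine similarity between the document-space vectors of two texts."""
    a, b = text_vector(text_a, vectors), text_vector(text_b, vectors)
    dot = sum(w * b.get(d, 0.0) for d, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical three-document corpus.
docs = {
    "d1": "heart failure causes dyspnea and edema",
    "d2": "dyspnea and edema are signs of heart failure",
    "d3": "insulin regulates blood glucose in diabetes",
}
vectors = build_word_vectors(docs)
print(relatedness("dyspnea", "edema", vectors))
print(relatedness("dyspnea", "insulin", vectors))
```

Words that occur in the same documents score close to 1, and words with no document in common score 0, without any structured knowledge resource being consulted.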

Wang L, Bray BE, Shi J, Del Fiol G, Haug PJ. A method for the development of disease-specific reference standards vocabularies from textual biomedical literature resources. Artif Intell Med 2016;68:47-57. [PMID: 26971304 DOI: 10.1016/j.artmed.2016.02.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/05/2015] [Revised: 02/22/2016] [Accepted: 02/25/2016] [Indexed: 10/22/2022]

Abstract

OBJECTIVE

Disease-specific vocabularies are fundamental to many knowledge-based intelligent systems and applications like text annotation, cohort selection, disease diagnostic modeling, and therapy recommendation. Reference standards are critical in the development and validation of automated methods for disease-specific vocabularies. The goal of the present study is to design and test a generalizable method for the development of vocabulary reference standards from expert-curated, disease-specific biomedical literature resources.

METHODS

We formed disease-specific corpora from literature resources like textbooks, evidence-based synthesized online sources, clinical practice guidelines, and journal articles. Medical experts annotated and adjudicated disease-specific terms in four classes (i.e., causes or risk factors, signs or symptoms, diagnostic tests or results, and treatment). Annotations were mapped to UMLS concepts. We assessed source variation, the contribution of each source to build disease-specific vocabularies, the saturation of the vocabularies with respect to the number of used sources, and the generalizability of the method with different diseases.

RESULTS

The study resulted in 2588 string-unique annotations for heart failure in four classes, and 193 and 425, respectively, for pulmonary embolism and rheumatoid arthritis in the treatment class. Approximately 80% of the annotations were mapped to UMLS concepts. The agreement among heart failure sources ranged between 0.28 and 0.46. The contribution of these sources to the final vocabulary ranged between 18% and 49%. With the sources explored, the heart failure vocabulary reached near saturation in all four classes with the inclusion of a minimum of six sources (or between four and seven sources if only counting terms that occurred in two or more sources). It took fewer sources to reach near saturation for the other two diseases in terms of the treatment class.

CONCLUSIONS

We developed a method for the development of disease-specific reference vocabularies. Expert-curated biomedical literature resources are substantial for acquiring disease-specific medical knowledge. It is feasible to reach near saturation in a disease-specific vocabulary using a relatively small number of literature sources.
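
The saturation analysis summarised in this abstract can be pictured as a cumulative count of unique terms as sources are added one at a time. The sketch below is a hedged illustration: the term lists are hypothetical, and the 95% cutoff used for "near saturation" is an assumption, since the abstract does not state the criterion the authors used.

```python
def saturation_curve(sources):
    """Cumulative number of unique terms after adding each source in order."""
    seen = set()
    curve = []
    for terms in sources:
        seen.update(term.lower() for term in terms)
        curve.append(len(seen))
    return curve


def sources_to_reach(curve, fraction=0.95):
    """Smallest number of sources whose cumulative vocabulary reaches
    `fraction` of the final vocabulary size (assumed saturation criterion)."""
    target = fraction * curve[-1]
    for count, size in enumerate(curve, start=1):
        if size >= target:
            return count
    return len(curve)


# Hypothetical sign/symptom terms annotated in four sources.
sources = [
    ["dyspnea", "edema", "fatigue"],
    ["edema", "orthopnea"],
    ["fatigue", "dyspnea", "rales"],
    ["edema", "rales"],
]
curve = saturation_curve(sources)
print(curve, sources_to_reach(curve))
```

Because the curve depends on the order in which sources are added, a fuller analysis would average over several orderings.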

Scharl A, Hubmann-Haidvogel A, Jones A, Fischl D, Kamolov R, Weichselbraun A, Rafelsberger W. Analyzing the public discourse on works of fiction - Detection and visualization of emotion in online coverage about HBO's Game of Thrones. Inf Process Manag 2016;52:129-138. [PMID: 27065510 PMCID: PMC4804387 DOI: 10.1016/j.ipm.2015.02.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/19/2022]

Weichselbraun A, Gindl S, Scharl A. Enriching semantic knowledge bases for opinion mining in big data applications. Knowl Based Syst 2014;69:78-85. [PMID: 25431524 PMCID: PMC4235782 DOI: 10.1016/j.knosys.2014.04.039] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 12/05/2022]

Sannino G, De Falco I, De Pietro G. Monitoring Obstructive Sleep Apnea by means of a real-time mobile system based on the automatic extraction of sets of rules through Differential Evolution. J Biomed Inform 2014;49:84-100. [PMID: 24632080 DOI: 10.1016/j.jbi.2014.02.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/15/2013] [Revised: 02/04/2014] [Accepted: 02/28/2014] [Indexed: 10/25/2022]

Abstract

Real-time Obstructive Sleep Apnea (OSA) episode detection and monitoring are important for society in terms of an improvement in the health of the general population and of a reduction in mortality and healthcare costs. Currently, to diagnose OSA, patients undergo PolySomnoGraphy (PSG), a complicated and invasive test performed in a specialized center and involving many sensors and wires. Accordingly, each patient is required to stay in the same position throughout the duration of one night, thus restricting their movements. This paper proposes an easy, cheap, and portable approach for the monitoring of patients with OSA, which collects single-channel ElectroCardioGram (ECG) data only. It is easy to perform from the patient's point of view because only one wearable sensor is required, so the patient is not restricted to keeping the same position all night long, and the detection and monitoring can be carried out in any place through the use of a mobile device. Our approach is based on the automatic extraction, from a database containing information about the monitored patient, of explicit knowledge in the form of a set of IF…THEN rules containing typical parameters derived from Heart Rate Variability (HRV) analysis. The extraction is carried out off-line by means of a Differential Evolution algorithm. This set of rules can then be exploited in the real-time mobile monitoring system developed at our Laboratory: the ECG data is gathered by a wearable sensor and sent to a mobile device, where it is processed in real time. Subsequently, HRV-related parameters are computed from this data, and, if their values activate some of the rules describing the occurrence of OSA, an alarm is automatically produced. This approach has been tested on a well-known literature database of OSA patients. The numerical results show its effectiveness in terms of accuracy, sensitivity, and specificity, and the achieved sets of rules evidence the user-friendliness of the approach. Furthermore, the method is compared against other well-known classifiers, and its discrimination ability is shown to be higher.
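
The real-time step described in this abstract, where HRV parameters are checked against a set of IF…THEN rules and an alarm is raised when a rule is activated, can be sketched as below. The rules, parameter names, and thresholds are hypothetical placeholders with no clinical meaning; in the paper the rules are extracted off-line by Differential Evolution, which this sketch does not attempt.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Hypothetical rule set: each rule is a conjunction of
# (HRV parameter, comparison, threshold) conditions.
RULES = [
    [("sdnn", ">", 80.0), ("lf_hf_ratio", ">", 2.5)],
    [("rmssd", "<", 15.0)],
]


def rule_fires(rule, hrv):
    """A rule fires only if every one of its conditions holds."""
    return all(OPS[op](hrv[name], threshold) for name, op, threshold in rule)


def apnea_alarm(hrv, rules=RULES):
    """Raise an alarm if at least one rule is activated by the HRV values."""
    return any(rule_fires(rule, hrv) for rule in rules)


# HRV parameters computed for one ECG window (illustrative values).
window = {"sdnn": 90.0, "lf_hf_ratio": 3.0, "rmssd": 30.0}
print(apnea_alarm(window))
```

Keeping the rules as plain data makes them readable by a clinician and cheap to evaluate on a mobile device, which fits the interpretability the abstract emphasises.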
