Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee HJ, Shim SH, Song MR, Lee H, Park JC. CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations. BMC Bioinformatics 2013;14:323. [PMID: 24225062 PMCID: PMC3833657 DOI: 10.1186/1471-2105-14-323] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2013] [Accepted: 11/05/2013] [Indexed: 11/10/2022] Open

For:	Lee HJ, Shim SH, Song MR, Lee H, Park JC. CoMAGC: a corpus with multi-faceted annotations of gene-cancer relations. BMC Bioinformatics 2013;14:323. [PMID: 24225062 PMCID: PMC3833657 DOI: 10.1186/1471-2105-14-323] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2013] [Accepted: 11/05/2013] [Indexed: 11/10/2022] Open

Number

Cited by Other Article(s)

Irrera O, Marchesin S, Silvello G. MetaTron: advancing biomedical annotation empowering relation annotation and collaboration. BMC Bioinformatics 2024;25:112. [PMID: 38486137 PMCID: PMC10941452 DOI: 10.1186/s12859-024-05730-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Accepted: 03/04/2024] [Indexed: 03/17/2024] Open

Abstract

BACKGROUND

The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large, often manually or semi-manually, annotated corpora vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations also integrating automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools.

RESULTS

We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron performances in terms of time and number of clicks to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performances.

CONCLUSIONS

MetaTron stands out as one of the few annotation tools targeting the biomedical domain supporting the annotation of relations, and fully customizable with documents in several formats-PDF included, as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a Docker image locally deployable.

Collapse

Marchesin S, Menotti L, Giachelle F, Silvello G, Alonso O. Building a large gene expression-cancer knowledge base with limited human annotations. Database (Oxford) 2023;2023:baad061. [PMID: 37768281 PMCID: PMC10533344 DOI: 10.1093/database/baad061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 07/27/2023] [Accepted: 08/25/2023] [Indexed: 09/29/2023]

Marchesin S, Silvello G. TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction. BMC Bioinformatics 2022;23:111. [PMID: 35361129 PMCID: PMC8973894 DOI: 10.1186/s12859-022-04646-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Accepted: 03/22/2022] [Indexed: 01/12/2023] Open

Taha K, Davuluri R, Yoo P, Spencer J. Personizing the prediction of future susceptibility to a specific disease. PLoS One 2021;16:e0243127. [PMID: 33406077 PMCID: PMC7787538 DOI: 10.1371/journal.pone.0243127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2019] [Accepted: 11/17/2020] [Indexed: 01/22/2023] Open

Abstract

A traceable biomarker is a member of a disease's molecular pathway. A disease may be associated with several molecular pathways. Each different combination of these molecular pathways, to which detected traceable biomarkers belong, may serve as an indicative of the elicitation of the disease at a different time frame in the future. Based on this notion, we introduce a novel methodology for personalizing an individual's degree of future susceptibility to a specific disease. We implemented the methodology in a working system called Susceptibility Degree to a Disease Predictor (SDDP). For a specific disease d, let S be the set of molecular pathways, to which traceable biomarkers detected from most patients of d belong. For the same disease d, let S' be the set of molecular pathways, to which traceable biomarkers detected from a certain individual belong. SDDP is able to infer the subset S'' ⊆{S-S'} of undetected molecular pathways for the individual. Thus, SDDP can infer undetected molecular pathways of a disease for an individual based on few molecular pathways detected from the individual. SDDP can also help in inferring the combination of molecular pathways in the set {S'+S''}, whose traceable biomarkers collectively is an indicative of the disease. SDDP is composed of the following four components: information extractor, interrelationship between molecular pathways modeler, logic inferencer, and risk indicator. The information extractor takes advantage of the exponential increase of biomedical literature to automatically extract the common traceable biomarkers for a specific disease. The interrelationship between molecular pathways modeler models the hierarchical interrelationships between the molecular pathways of the traceable biomarkers. The logic inferencer transforms the hierarchical interrelationships between the molecular pathways into rule-based specifications. It employs the specification rules and the inference rules for predicate logic to infer as many as possible undetected molecular pathways of a disease for an individual. The risk indicator outputs a risk indicator value that reflects the individual's degree of future susceptibility to the disease. We evaluated SDDP by comparing it experimentally with other methods. Results revealed marked improvement.

Collapse

Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020;18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open

Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via attention-based distant supervision. BMC Bioinformatics 2019;20:403. [PMID: 31331263 PMCID: PMC6647285 DOI: 10.1186/s12859-019-2884-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Accepted: 05/08/2019] [Indexed: 11/24/2022] Open

Bhasuran B, Natarajan J. Automatic extraction of gene-disease associations from literature using joint ensemble learning. PLoS One 2018;13:e0200699. [PMID: 30048465 PMCID: PMC6061985 DOI: 10.1371/journal.pone.0200699] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2018] [Accepted: 07/02/2018] [Indexed: 12/26/2022] Open

Gu J, Sun F, Qian L, Zhou G. Chemical-induced disease relation extraction via convolutional neural network. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017;2017:3098440. [PMID: 28415073 PMCID: PMC5467558 DOI: 10.1093/database/bax024] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/13/2016] [Accepted: 03/01/2017] [Indexed: 01/08/2023]

Chen CC, Ho CL. StemTextSearch: Stem cell gene database with evidence from abstracts. J Biomed Inform 2017;69:150-159. [PMID: 28315408 DOI: 10.1016/j.jbi.2017.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2016] [Revised: 03/08/2017] [Accepted: 03/10/2017] [Indexed: 11/29/2022]

Peng Y, Wei CH, Lu Z. Improving chemical disease relation extraction with rich features and weakly labeled data. J Cheminform 2016;8:53. [PMID: 28316651 PMCID: PMC5054544 DOI: 10.1186/s13321-016-0165-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2016] [Accepted: 09/28/2016] [Indexed: 01/08/2023] Open

Abstract

Background

Due to the importance of identifying relations between chemicals and diseases for new drug discovery and improving chemical safety, there has been a growing interest in developing automatic relation extraction systems for capturing these relations from the rich and rapid-growing biomedical literature. In this work we aim to build on current advances in named entity recognition and a recent BioCreative effort to further improve the state of the art in biomedical relation extraction, in particular for the chemical-induced disease (CID) relations.

Results

We propose a rich-feature approach with Support Vector Machine to aid in the extraction of CIDs from PubMed articles. Our feature vector includes novel statistical features, linguistic knowledge, and domain resources. We also incorporate the output of a rule-based system as features, thus combining the advantages of rule- and machine learning-based systems. Furthermore, we augment our approach with automatically generated labeled text from an existing knowledge base to improve performance without additional cost for corpus construction. To evaluate our system, we perform experiments on the human-annotated BioCreative V benchmarking dataset and compare with previous results. When trained using only BioCreative V training and development sets, our system achieves an F-score of 57.51 %, which already compares favorably to previous methods. Our system performance was further improved to 61.01 % in F-score when augmented with additional automatically generated weakly labeled data.

Conclusions

Our text-mining approach demonstrates state-of-the-art performance in disease-chemical relation extraction. More importantly, this work exemplifies the use of (freely available) curated document-level annotations in existing biomedical databases, which are largely overlooked in text-mining system development.

Collapse

Verspoor KM, Heo GE, Kang KY, Song M. Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts. BMC Med Inform Decis Mak 2016;16 Suppl 1:68. [PMID: 27454860 PMCID: PMC4959367 DOI: 10.1186/s12911-016-0294-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems.

METHODS

In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus.

RESULTS

For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78-0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations.

CONCLUSIONS

This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.

Collapse

Song M, Kim WC, Lee D, Heo GE, Kang KY. PKDE4J: Entity and relation extraction for public knowledge discovery. J Biomed Inform 2015;57:320-32. [PMID: 26277115 DOI: 10.1016/j.jbi.2015.08.008] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2015] [Revised: 07/20/2015] [Accepted: 08/06/2015] [Indexed: 11/18/2022]

Lee HJ, Dang TC, Lee H, Park JC. OncoSearch: cancer gene search engine with literature evidence. Nucleic Acids Res 2014;42:W416-21. [PMID: 24813447 PMCID: PMC4086113 DOI: 10.1093/nar/gku368] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open