1
Lee J, Lee D, Lee KH. Literature mining for context-specific molecular relations using multimodal representations (COMMODAR). BMC Bioinformatics 2020; 21:250. [PMID: 33106154 PMCID: PMC7586695 DOI: 10.1186/s12859-020-3396-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/30/2020] [Accepted: 02/06/2020] [Indexed: 01/14/2023] Open
Abstract
Biological contextual information helps in understanding various phenomena occurring in biological systems consisting of complex molecular relations. The construction of context-specific relational resources relies heavily on laborious manual extraction from unstructured literature. In this paper, we propose COMMODAR, a machine learning-based literature mining framework for context-specific molecular relations using multimodal representations. The main idea of COMMODAR is feature augmentation through the cooperation of multimodal representations for relation extraction. We leveraged biomedical domain knowledge as well as canonical linguistic information for more comprehensive representations of textual sources. The models based on multiple modalities outperformed those based solely on the linguistic modality. We applied COMMODAR to 14 million PubMed abstracts and extracted 9214 context-specific molecular relations. All corpora, extracted data, evaluation results, and the implementation code are downloadable at https://github.com/jae-hyun-lee/commodar. CCS CONCEPTS: • Computing methodologies~Information extraction • Computing methodologies~Neural networks • Applied computing~Biological networks.
Affiliation(s)
- Jaehyun Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Doheon Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea; Bio-Synergy Research Center, Daejeon, South Korea
- Kwang Hyung Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
2
Kim YH, Song M. A context-based ABC model for literature-based discovery. PLoS One 2019; 14:e0215313. [PMID: 31017923 PMCID: PMC6481912 DOI: 10.1371/journal.pone.0215313] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/28/2018] [Accepted: 03/29/2019] [Indexed: 12/13/2022] Open
Abstract
Background In literature-based discovery, considerable research has been based on the ABC model developed by Swanson. The ABC model hypothesizes that there is a meaningful relation between entity A extracted from document set 1 and entity C extracted from document set 2 through B entities that appear in both document sets. The results of the ABC model are relations among entities A, B, and C, which are referred to as paths. A path allows for hypothesizing the relationship between entity A and entity C, or helps discover entity B as new evidence for the relationship between entity A and entity C. The co-occurrence-based approach of the ABC model is a well-known approach to automatic hypothesis generation by creating various paths. However, the co-occurrence-based ABC model has a limitation in that biological context is not considered. It focuses only on matching the B entity that commonly appears in relations between two entities. Therefore, the paths extracted by the co-occurrence-based ABC model tend to include many irrelevant paths, meaning that expert verification is essential. Methods To overcome this limitation of the co-occurrence-based ABC model, we propose a context-based approach to connecting one entity relation to another, modifying the ABC model using biological contexts. In this study, we defined four biological context elements: cell, drug, disease, and organism. Based on these biological contexts, we propose two extended ABC models: a context-based ABC model and a context-assignment-based ABC model. To measure the performance of both proposed models, we examined the relevance of the B entities between the well-known relations “APOE–MAPT” and “FUS–TARDBP”. Each relation represents an interaction between proteins associated with neurodegenerative disease. The interaction between APOE and MAPT is known to play a crucial role in Alzheimer’s disease, as APOE affects tau-mediated neurodegeneration.
It has been shown that mutations in FUS and TARDBP are associated with amyotrophic lateral sclerosis (ALS), a motor neuron disease, by leading to neuronal cell death. Using these two relations, we compared both proposed models to the co-occurrence-based ABC model. Results The precision of B entities extracted by the co-occurrence-based ABC model was 27.1% for “APOE–MAPT” and 22.1% for “FUS–TARDBP”. In the context-based ABC model, the precision of extracted B entities was 71.4% for “APOE–MAPT” and 77.9% for “FUS–TARDBP”. The context-assignment-based ABC model achieved 89% and 97.5% precision for the two relations, respectively. Both proposed models achieved higher precision than the co-occurrence-based ABC model.
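The co-occurrence-based ABC model described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cooccurring_entities`, `abc_paths`, `precision`), the representation of each document as a set of entity identifiers, and the toy entity labels are all assumptions made for illustration. The sketch finds B entities that co-occur with A in document set 1 and with C in document set 2, then computes precision against a hypothetical expert-labelled set of relevant B entities.

```python
from collections import defaultdict


def cooccurring_entities(docs, target):
    """Count entities that share at least one document with `target`.

    `docs` is an iterable of sets of entity identifiers (an assumed format).
    """
    counts = defaultdict(int)
    for entities in docs:
        if target in entities:
            for entity in entities:
                if entity != target:
                    counts[entity] += 1
    return dict(counts)


def abc_paths(docs_a, docs_c, a, c):
    """Return sorted (A, B, C) paths where B co-occurs with A and with C."""
    b_from_a = set(cooccurring_entities(docs_a, a))
    b_from_c = set(cooccurring_entities(docs_c, c))
    shared = (b_from_a & b_from_c) - {a, c}
    return [(a, b, c) for b in sorted(shared)]


def precision(paths, relevant_b):
    """Fraction of extracted B entities judged relevant (hypothetical labels)."""
    extracted = {b for _, b, _ in paths}
    if not extracted:
        return 0.0
    return len(extracted & set(relevant_b)) / len(extracted)


# Toy data with placeholder entity labels, not taken from the paper.
docs_a = [{"A", "B1"}, {"A", "B2", "X"}]
docs_c = [{"C", "B1"}, {"C", "B2"}, {"C", "Y"}]

paths = abc_paths(docs_a, docs_c, "A", "C")
print(paths)
print(precision(paths, relevant_b={"B1"}))
```

A context-based variant could, under the same assumptions, attach context labels (cell, drug, disease, organism) to each document and keep a B entity only when the contexts of its A–B and B–C documents agree; the abstract does not specify the exact matching rule, so that step is left out here.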
Affiliation(s)
- Yong Hwan Kim
- Division of Humanities, CheongJu University, CheongJu, Korea
- Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Korea
3
Automated extraction of potential migraine biomarkers using a semantic graph. J Biomed Inform 2017; 71:178-189. [PMID: 28579531 DOI: 10.1016/j.jbi.2017.05.018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/29/2016] [Revised: 04/03/2017] [Accepted: 05/23/2017] [Indexed: 01/20/2023]
Abstract
PROBLEM Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. METHOD We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. RESULTS Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, with 547 out of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that were not in the top of the list, an extensive error analysis has been performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974. DISCUSSION Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
4
Yu H, Choo S, Park J, Jung J, Kang Y, Lee D. Prediction of drugs having opposite effects on disease genes in a directed network. BMC SYSTEMS BIOLOGY 2016; 10 Suppl 1:2. [PMID: 26818006 PMCID: PMC4895308 DOI: 10.1186/s12918-015-0243-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
Background Developing novel uses of approved drugs, called drug repositioning, can reduce the cost and time of traditional drug development. Network-based approaches have presented promising results in this field. However, even though various types of interactions, such as activation or inhibition, exist in drug-target interactions and molecular pathways, most previous network-based studies disregarded this information. Methods We developed a novel computational method, Prediction of Drugs having Opposite effects on Disease genes (PDOD), for identifying drugs having opposite effects on altered states of disease genes. PDOD utilized drug-drug target interactions with ‘effect type’, an integrated directed molecular network with ‘effect type’ and ‘effect direction’, and disease genes with regulated states in disease patients. With this information, we proposed a scoring function to discover drugs likely to restore altered states of disease genes using the path from a drug to a disease through the drug-drug target interactions, shortest paths from drug targets to disease genes in molecular pathways, and disease gene-disease associations. Results We collected drug-drug target interactions, molecular pathways, and disease genes with their regulated states in the diseases. PDOD was applied to 898 drugs with known drug-drug target interactions and nine diseases. We compared the performance of PDOD in predicting known therapeutic drug-disease associations with that of previous methods. PDOD outperformed previous approaches that do not exploit directional information in the molecular network. In addition, we provide a simple web service through which researchers can submit genes of interest with their altered states and obtain drugs that appear to have opposite effects on the altered states of the input genes, at http://gto.kaist.ac.kr/pdod/index.php/main.
Conclusions Our results showed that ‘effect type’ and ‘effect direction’ information in network-based approaches can be utilized to identify drugs having opposite effects on diseases. Our study can offer novel insight into the field of network-based drug repositioning. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0243-2) contains supplementary material, which is available to authorized users.
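The idea of following signed, directed paths from drug targets to disease genes can be sketched in Python as below. This is an illustrative simplification and not the PDOD scoring function, whose exact form the abstract does not give. The encoding of effect types as +1 (activation) and -1 (inhibition), the names `path_sign` and `opposite_effect_score`, the scoring rule, and the toy network are all assumptions.

```python
from collections import deque


def path_sign(graph, source, target):
    """Return the product of edge signs along one shortest directed path.

    `graph` maps a node to a list of (neighbor, sign) pairs, with sign +1 for
    activation and -1 for inhibition (an assumed encoding). Returns None when
    `target` is unreachable. If several shortest paths exist, the first one
    found by breadth-first search is used.
    """
    queue = deque([(source, 1)])
    seen = {source}
    while queue:
        node, sign = queue.popleft()
        if node == target:
            return sign
        for neighbor, edge_sign in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, sign * edge_sign))
    return None


def opposite_effect_score(drug_targets, graph, disease_genes):
    """Toy score: +1 per disease gene pushed against its altered state,
    -1 per gene pushed further in the same direction.

    `drug_targets` maps target -> drug effect (+1 activates, -1 inhibits).
    `disease_genes` maps gene -> altered state (+1 up, -1 down).
    """
    score = 0
    for target, effect in drug_targets.items():
        for gene, state in disease_genes.items():
            sign = path_sign(graph, target, gene)
            if sign is None:
                continue  # no directed path from this target to this gene
            net_effect = effect * sign
            score += -net_effect * state
    return score


# Hypothetical network: T1 activates G1; T2 inhibits M, and M activates G2.
graph = {"T1": [("G1", 1)], "T2": [("M", -1)], "M": [("G2", 1)]}
drug_targets = {"T1": -1, "T2": 1}   # drug inhibits T1 and activates T2
disease_genes = {"G1": 1, "G2": 1}   # both genes up-regulated in disease

print(opposite_effect_score(drug_targets, graph, disease_genes))
```

In this toy case the drug lowers both up-regulated genes, so each reachable target-gene pair contributes positively; a real scoring function would likely also weight path length and the strength of the disease gene-disease association.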
Affiliation(s)
- Hasun Yu
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Sungji Choo
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Junseok Park
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Jinmyung Jung
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Yeeok Kang
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Doheon Lee
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea; Bio-Synergy Research Center, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea