1
Sikirzhytskaya A, Tyagin I, Sutton SS, Wyatt MD, Safro I, Shtutman M. AI-based mining of biomedical literature: Applications for drug repurposing for the treatment of dementia. Research Square 2024:rs.3.rs-4750719. [PMID: 39184100 PMCID: PMC11343300 DOI: 10.21203/rs.3.rs-4750719/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively developing novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. The huge volume of new scientific information necessitates new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data, we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space where each gene and health condition is represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was employed for categorizing samples (122 diseases and 20889 genes) and predicting their assignment to specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that can be repurposed for dementia treatment as an outcome of neurodegenerative diseases. To this end, we identified dementia-associated genes that ranked statistically high in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. 
These sets of genes were classified based on their presence in biological pathways, aiding in selecting candidates and biological processes that are exploitable with drug repurposing.
2
Sikirzhytskaya A, Tyagin I, Sutton SS, Wyatt MD, Safro I, Shtutman M. AI-based mining of biomedical literature: Applications for drug repurposing for the treatment of dementia. bioRxiv: The Preprint Server for Biology 2024:2024.06.06.597745. [PMID: 38895485 PMCID: PMC11185689 DOI: 10.1101/2024.06.06.597745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Neurodegenerative pathologies such as Alzheimer's disease, Parkinson's disease, Huntington's disease, amyotrophic lateral sclerosis, multiple sclerosis, HIV-associated neurocognitive disorder, and others significantly affect individuals, their families, caregivers, and healthcare systems. While there are no cures yet, researchers worldwide are actively developing novel treatments that have the potential to slow disease progression, alleviate symptoms, and ultimately improve the overall health of patients. The huge volume of new scientific information necessitates new analytical approaches for meaningful hypothesis generation. To enable the automatic analysis of biomedical data, we introduced AGATHA, an effective AI-based literature mining tool that can navigate massive scientific literature databases such as PubMed. The overarching goal of this effort is to adapt AGATHA for drug repurposing by revealing hidden connections between FDA-approved medications and a health condition of interest. Our tool converts the abstracts of peer-reviewed papers from PubMed into a multidimensional space where each gene and health condition is represented by specific metrics. We implemented advanced statistical analysis to reveal distinct clusters of scientific terms within the virtual space created using AGATHA-calculated parameters for selected health conditions and genes. Partial Least Squares Discriminant Analysis was employed for categorizing samples (122 diseases and 20889 genes) and predicting their assignment to specific classes. Advanced statistics were employed to build a discrimination model and extract lists of genes specific to each disease class. Here we focus on drugs that can be repurposed for dementia treatment as an outcome of neurodegenerative diseases. To this end, we identified dementia-associated genes that ranked statistically high in other disease classes. Additionally, we report a mechanism for detecting genes common to multiple health conditions. 
These sets of genes were classified based on their presence in biological pathways, aiding in selecting candidates and biological processes that are exploitable with drug repurposing.
Author Summary
This manuscript outlines our project involving the application of AGATHA, an AI-based literature mining tool, to discover drugs with the potential for repurposing in the context of neurocognitive disorders. The primary objective is to identify connections between approved medications and specific health conditions through advanced statistical analysis, including techniques like Partial Least Squares Discriminant Analysis (PLSDA) and unsupervised clustering. The methodology involves grouping scientific terms related to different health conditions and genes, followed by building discrimination models to extract lists of disease-specific genes. These genes are then analyzed through pathway analysis to select candidates for drug repurposing.
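The abstracts name Partial Least Squares Discriminant Analysis (PLS-DA) but give no implementation details. As a minimal sketch only, assuming each sample is a plain numeric feature vector (hypothetically, AGATHA-derived metrics for one gene) and that there are just two classes, PLS-DA can be run as PLS1 regression (the NIPALS algorithm) on a 0/1 class indicator, assigning whichever class indicator is closer to the prediction. The study's 122 disease classes would need a multi-column indicator (PLS2), which this sketch does not cover; `fit_plsda` and `predict_plsda` are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
# One PLS component: (weight vector w, X-loading vector, y-coefficient c).
Component = Tuple[Vector, Vector, float]


def fit_plsda(X: Matrix, labels: List[int], n_components: int = 1):
    """Hypothetical two-class PLS-DA: PLS1 (NIPALS) regression on a 0/1 indicator."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(labels) / n
    # Center the predictors and the class indicator.
    Xr = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yr = [lab - y_mean for lab in labels]
    components: List[Component] = []
    for _ in range(n_components):
        # Weight: direction in feature space with maximal covariance with y.
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # no covariance left to explain
        w = [v / norm for v in w]
        t = [sum(Xr[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        c = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        Xr = [[Xr[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        yr = [yr[i] - c * t[i] for i in range(n)]
        components.append((w, loading, c))
    return x_mean, y_mean, components


def predict_plsda(model, x: Vector) -> int:
    """Predict class 0 or 1 for a single feature vector."""
    x_mean, y_mean, components = model
    xr = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, loading, c in components:
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += c * t
        xr = [a - t * b for a, b in zip(xr, loading)]
    # Nearest class indicator: 0 below the 0.5 midpoint, 1 otherwise.
    return 1 if y_hat >= 0.5 else 0


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_plsda(X, labels)
print(predict_plsda(model, [0.5, 0.5]), predict_plsda(model, [5.5, 5.5]))  # 0 1
```

With one component on these toy points, the weight vector points along the diagonal separating the two clusters; a real analysis would choose the number of components by cross-validation.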
3
Nicholson DN, Himmelstein DS, Greene CS. Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts. BioData Min 2022; 15:26. [PMID: 36258252 PMCID: PMC9578183 DOI: 10.1186/s13040-022-00311-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/17/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple nodes and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types. RESULTS We obtained entity-tagged abstracts and subsetted these entities to only contain compounds, genes, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. 
Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1. CONCLUSIONS Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.
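The data-programming idea in this abstract, label functions that vote on whether a sentence expresses a relationship, can be illustrated with a minimal sketch. The label functions, the `KNOWN_BINDS` pair and the `majority_vote` combiner below are all hypothetical; the abstract describes training a baseline model and then a discriminator on the generated annotations, and a plain majority vote stands in here only as a simplified substitute for that step.

```python
import re
from collections import Counter
from typing import Callable, List

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Hypothetical database-derived pairs (stand-in for a source such as Hetionet).
KNOWN_BINDS = {("aspirin", "PTGS1")}


def lf_in_database(sentence: str, compound: str, gene: str) -> int:
    """Database label function: vote POSITIVE if the pair is already known."""
    return POSITIVE if (compound, gene) in KNOWN_BINDS else ABSTAIN


def lf_binding_verb(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: a binding or inhibition verb suggests Compound-binds-Gene."""
    pattern = r"\b(binds?|inhibits?)\b"
    return POSITIVE if re.search(pattern, sentence, re.IGNORECASE) else ABSTAIN


def lf_negation(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: an explicit negation suggests no relationship."""
    return NEGATIVE if re.search(r"\b(no|not)\b", sentence, re.IGNORECASE) else ABSTAIN


LABEL_FUNCTIONS: List[Callable[[str, str, str], int]] = [
    lf_in_database,
    lf_binding_verb,
    lf_negation,
]


def majority_vote(sentence: str, compound: str, gene: str) -> int:
    """Combine non-abstaining votes; ties and all-abstain cases return ABSTAIN."""
    votes = [lf(sentence, compound, gene) for lf in LABEL_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]


print(majority_vote("Aspirin irreversibly inhibits PTGS1.", "aspirin", "PTGS1"))  # 1
print(majority_vote("Drug Y was not tested on GENE2.", "drug y", "GENE2"))  # 0
```

Reusing `lf_binding_verb` for a different edge type is the kind of "edge-mismatch" transfer the abstract evaluates; per the abstract, such transfer helped mainly between physically similar relations.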
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Daniel S. Himmelstein
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Casey S. Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine and Center for Health Artificial Intelligence (CHAI), University of Colorado School of Medicine, Aurora, USA
4
Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022; 10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023]
Abstract
Biomedical knowledge is represented in structured databases and published in biomedical literature, and different computational approaches have been developed to exploit each type of information in predictive models. However, the information in structured databases and literature is often complementary. We developed a machine learning method that combines information from literature and databases to predict drug targets and indications. To effectively utilize information in published literature, we integrate knowledge graphs and published literature using named entity recognition and normalization before applying a machine learning model that utilizes the combination of graph and literature. We then use supervised machine learning to show the effects of combining features from biomedical knowledge and published literature on the prediction of drug targets and drug indications. We demonstrate that our approach, evaluated on datasets for drug-target interactions and drug indications, is scalable to large graphs and can improve the ranking of targets and indications compared with exploiting features from either structured or unstructured information alone.
Affiliation(s)
- Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Abdullah Almansour
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Asma Alkhaldi
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
- Maha A. Thafar
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia; Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
5
Allahgholi M, Rahmani H, Javdani D, Sadeghi-Adl Z, Bender A, Módos D, Weiss G. DDREL: From drug-drug relationships to drug repurposing. Intell Data Anal 2022. [DOI: 10.3233/ida-215745] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Abstract
Analyzing the relationships among various drugs is an essential issue in computational biology. Different kinds of informative knowledge, such as drug repurposing opportunities, can be extracted from drug-drug relationships. Scientific literature is a rich source of knowledge about the relationships between biological concepts, mainly drug-drug, disease-disease, and drug-disease relationships. In this paper, we propose DDREL, a general-purpose method that applies deep learning to scientific literature to automatically extract the graph of syntactic and semantic relationships among drugs. DDREL markedly outperforms the existing human drug network method and a random network with respect to the average similarity of drugs' anatomical therapeutic chemical (ATC) codes. DDREL also sheds light on existing deficiencies of the ATC codes in various drug groups. From the DDREL graph, the history of drug discovery became visible. In addition, drugs that had a repurposing score of 1 (diflunisal, pargyline, fenofibrate, guanfacine, chlorzoxazone, doxazosin, oxymetholone, azathioprine, drotaverine, demecarium, omifensine, yohimbine) were already used for additional indications. The proposed DDREL method demonstrates the predictive power of textual data in PubMed abstracts. DDREL shows that such data can be used to (1) predict repurposing drugs with high accuracy and (2) reveal existing deficiencies of the ATC codes in various drug groups.
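The abstract evaluates drug graphs by the average similarity of the ATC codes of connected drugs but does not define that similarity. One plausible definition, assumed here purely for illustration, counts how many leading ATC levels two codes share (anatomical main group, therapeutic subgroup, pharmacological subgroup, chemical subgroup, chemical substance) and averages that count over the graph's edges. `atc_shared_levels` and `average_atc_similarity` are hypothetical names, not part of DDREL.

```python
from typing import Iterable, Tuple

# Character positions where each of the five ATC levels ends, e.g. N02BA01:
# N | 02 | B | A | 01
ATC_LEVEL_ENDS = (1, 3, 4, 5, 7)


def atc_shared_levels(code_a: str, code_b: str) -> int:
    """Number of leading ATC levels (0-5) that two codes have in common."""
    shared = 0
    for end in ATC_LEVEL_ENDS:
        if len(code_a) < end or len(code_b) < end or code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared


def average_atc_similarity(edges: Iterable[Tuple[str, str]]) -> float:
    """Mean shared-level count over drug-drug edges, each labeled by ATC codes."""
    scores = [atc_shared_levels(a, b) for a, b in edges]
    return sum(scores) / len(scores) if scores else 0.0


# Acetylsalicylic acid (N02BA01), paracetamol (N02BE01), ibuprofen (M01AE01).
edges = [("N02BA01", "N02BE01"), ("N02BA01", "M01AE01")]
print(average_atc_similarity(edges))  # 1.5
```

Comparing this average for a literature-derived graph against a randomly rewired graph with the same edge count is one simple way to frame the "outperforms a random network" comparison; drugs with several ATC codes would need extra handling that this sketch ignores.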
Affiliation(s)
- Milad Allahgholi
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Hossein Rahmani
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Delaram Javdani
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Zahra Sadeghi-Adl
- School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
- Dezsö Módos
- Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, UK
- Earlham Institute, Norwich Research Park, Norwich, Norfolk, UK
- Gerhard Weiss
- Department of Data Science and Knowledge Engineering (DKE), Maastricht University, Maastricht, The Netherlands
6
Rodriguez-Esteban R. The speed of information propagation in the scientific network distorts biomedical research. PeerJ 2022; 10:e12764. [PMID: 35070506 PMCID: PMC8759377 DOI: 10.7717/peerj.12764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Accepted: 12/17/2021] [Indexed: 01/07/2023]
Abstract
Delays in the propagation of scientific discoveries across scientific communities have been an oft-maligned feature of scientific research for introducing a bias towards knowledge that is produced within a scientist's closest community. The vastness of the scientific literature has been commonly blamed for this phenomenon, despite recent improvements in information retrieval and text mining. Its actual negative impact on scientific progress, however, has never been quantified. This analysis attempts to do so by exploring its effects on biomedical discovery, particularly in the discovery of relations between diseases, genes and chemical compounds. Results indicate that the probability that two scientific facts will enable the discovery of a new fact depends on how far apart these two facts were originally within the scientific landscape. In particular, the probability decreases exponentially with the citation distance. Thus, the direction of scientific progress is distorted based on the location in which each scientific fact is published, representing a path-dependent bias in which originally closely-located discoveries drive the sequence of future discoveries. To counter this bias, scientists should open the scope of their scientific work with modern information retrieval and extraction approaches.
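The abstract reports that the probability of two facts enabling a new discovery decreases exponentially with their citation distance, i.e. a relation of the form p(d) = p0 * exp(-lambda * d). A minimal sketch of how such a decay rate could be estimated, assuming per-distance probabilities have already been measured, is an ordinary least-squares fit of ln p against d. The function name is hypothetical and the data below are synthetic, not results from the study.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(
    distances: Sequence[float], probs: Sequence[float]
) -> Tuple[float, float]:
    """Fit p(d) = p0 * exp(-lam * d) via least squares on ln p; returns (p0, lam)."""
    logs = [math.log(p) for p in probs]  # every probability must be > 0
    n = len(distances)
    mean_d = sum(distances) / n
    mean_log = sum(logs) / n
    cov = sum((d - mean_d) * (lg - mean_log) for d, lg in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    slope = cov / var  # ln p = ln p0 - lam * d, so slope = -lam
    intercept = mean_log - slope * mean_d
    return math.exp(intercept), -slope


# Synthetic example generated with p0 = 0.2, lam = 0.7.
ds = [1, 2, 3, 4, 5]
ps = [0.2 * math.exp(-0.7 * d) for d in ds]
p0, lam = fit_exponential_decay(ds, ps)
print(round(p0, 6), round(lam, 6))  # 0.2 0.7
```

Under this model each additional unit of citation distance multiplies the probability by exp(-lambda), so the distance at which it halves is ln 2 / lambda.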
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Basel, Switzerland
7
Gopal J, Prakash Sinnarasan VS, Venkatesan A. Identification of Repurpose Drugs by Computational Analysis of Disease-Gene-Drug Associations. J Comput Biol 2021; 28:975-984. [PMID: 34242526 DOI: 10.1089/cmb.2020.0356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Repurposing marketed drugs for new indications has become a productive alternative that circumvents the risks of traditional drug development. Among the many approaches, computational analysis has great potential to fuel the development of all-rounder drugs and to find new classes of medicine for neglected and rare diseases. Genes that can explain disease-associated variations in drug response are especially significant in drug therapeutics, which makes it necessary to elucidate the relationships among genes, drugs, and diseases. The proposed computational analysis facilitates the discovery of both target-based and disease-based relationships from large sources of biomedical literature spread across different platforms. It uses text mining to automatically extract aggregated biomedical entities (disease, gene, and drug) from PubMed, which serve as input to the association prediction analysis. The top-ranked associations were considered for identifying repurposing drugs, and hidden associations identified using the concurrence principle were used to extrapolate new relationships. These findings are reported as novel and contribute to the knowledge base for pharmacogenomics; they would support the discovery and progress of novel therapeutic pathways and patient-segment biomarkers.
Affiliation(s)
- Jeyakodi Gopal
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
- Amouda Venkatesan
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry, India
8
Henry S, Wijesinghe DS, Myers A, McInnes BT. Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest. Front Res Metr Anal 2021; 6:644728. [PMID: 34250435 PMCID: PMC8267364 DOI: 10.3389/frma.2021.644728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Accepted: 05/07/2021] [Indexed: 12/19/2022]
Abstract
In this paper, we describe how we applied literature-based discovery (LBD) techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies prompted by this work. Metabolites are the end products of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways and provide valuable insights into the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. Results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, this finding prompted an investigation into the effects of LCAT on cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effects of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest in vivo in rat models.
Affiliation(s)
- Sam Henry
- Department of Physics, Computer Science and Engineering, Christopher Newport University, Newport News, VA, United States
- D. Shanaka Wijesinghe
- Department of Pharmacotherapy and Outcomes Science, Virginia Commonwealth University, Richmond, VA, United States
- Aidan Myers
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bridget T. McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
9
Text Mining Gene Selection to Understand Pathological Phenotype Using Biological Big Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
10
11
Anwar A, Khan NA, Siddiqui R. Repurposing of Drugs Is a Viable Approach to Develop Therapeutic Strategies against Central Nervous System Related Pathogenic Amoebae. ACS Chem Neurosci 2020; 11:2378-2384. [PMID: 32073257 DOI: 10.1021/acschemneuro.9b00613] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]
Abstract
Brain-eating amoebae, including Acanthamoeba spp., Naegleria fowleri, and Balamuthia mandrillaris, cause rare infections of the central nervous system that almost always result in death. The high mortality rate, the pharmaceutical industry's lack of interest in drug development, and the absence of effective drugs present an alarming challenge. The drugs currently employed in the management and therapy of these devastating diseases are amphotericin B, miltefosine, chlorhexidine, pentamidine, and voriconazole, which are generally used in combination. However, clinical evidence shows that these drugs have limited efficacy and high host cell cytotoxicity. Drug repurposing is a practical approach that applies commercially available drugs, approved by the U.S. Food and Drug Administration for one disease, against rare diseases caused by brain-eating amoebae. In this Perspective, we highlight some of the success stories of drugs repositioned against neglected parasitic diseases and identify future potential for effective and sustainable drug development against brain-eating amoebae infections.
Affiliation(s)
- Ayaz Anwar
- Department of Biological Sciences, School of Science and Technology, Sunway University, Subang Jaya 47500, Selangor, Malaysia
- Naveed Ahmed Khan
- Department of Biology, Chemistry and Environmental Sciences, College of Arts and Sciences, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Ruqaiyyah Siddiqui
- Department of Biology, Chemistry and Environmental Sciences, College of Arts and Sciences, American University of Sharjah, Sharjah 26666, United Arab Emirates
12
Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020; 18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022]
Abstract
Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, United States
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, United States
13
Oh J, Bae H, Kim CE. Construction And Analysis Of The Time-Evolving Pain-Related Brain Network Using Literature Mining. J Pain Res 2019; 12:2891-2903. [PMID: 31802931 PMCID: PMC6801488 DOI: 10.2147/jpr.s217036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2019] [Accepted: 09/17/2019] [Indexed: 11/23/2022]
Abstract
Purpose We aimed to quantitatively investigate how the neuroscience field developed over time in terms of its concept on how pain is represented in the brain and compare the research trends of pain with those of mental disorders through literature mining of accumulated published articles. Methods The abstracts and publication years of 137,525 pain-related articles were retrieved from the PubMed database. We defined 22 pain-related brain regions that appeared more than 100 times in the retrieved abstracts. Time-evolving networks of pain-related brain regions were constructed using the co-occurrence frequency. The state-space model was implemented to capture the trend patterns of the pain-related brain regions and the patterns were compared with those of mental disorders. Results The number of pain-related abstracts including brain areas steadily increased; however, the relative frequency of each brain region showed different patterns. According to the chronological patterns of relative frequencies, pain-related brain regions were clustered into three groups: rising, falling, and consistent. The network of pain-related brain regions extended over time from localized regions (mainly including brain stem and diencephalon) to wider cortical/subcortical regions. In the state-space model, the relative frequency trajectory of pain-related brain regions gradually became closer to that of mental disorder-related brain regions. Conclusion Temporal changes of pain-related brain regions in the abstracts indicate that emotional/cognitive aspects of pain have been gradually emphasized. The networks of pain-related brain regions imply perspective changes on pain from the simple percept to the multidimensional experience. Based on the notable occurrence patterns of the cerebellum and motor cortex, we suggest that motor-related areas will be actively explored in pain studies.
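The methods describe building time-evolving networks of brain regions from their co-occurrence in abstracts. A minimal sketch, assuming each abstract is a (year, text) pair and using naive case-insensitive substring matching against a hypothetical lowercase region list (the study's actual term matching is not described in the abstract), counts for each year how many abstracts mention each pair of regions, and computes a region's relative frequency as the share of abstracts that mention it. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cooccurrence_network(texts: Iterable[str], regions: List[str]) -> Counter:
    """Edge weight = number of texts that mention both regions."""
    edges: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        present = sorted(r for r in regions if r in lowered)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


def networks_by_year(
    records: Iterable[Tuple[int, str]], regions: List[str]
) -> Dict[int, Counter]:
    """Build one co-occurrence network per publication year."""
    by_year: Dict[int, List[str]] = defaultdict(list)
    for year, text in records:
        by_year[year].append(text)
    return {
        year: cooccurrence_network(texts, regions)
        for year, texts in sorted(by_year.items())
    }


def relative_frequency(texts: List[str], region: str) -> float:
    """Share of texts that mention the region."""
    return sum(region in t.lower() for t in texts) / len(texts)


records = [
    (2000, "Thalamus and amygdala respond to pain."),
    (2000, "The thalamus relays nociceptive input."),
    (2010, "Insula, amygdala and thalamus activity increased."),
]
regions = ["amygdala", "insula", "thalamus"]
print(dict(networks_by_year(records, regions)[2000]))  # {('amygdala', 'thalamus'): 1}
```

Substring matching is deliberately crude (it would, for example, count "insula" inside "peninsula"); tracking `relative_frequency` per year is what would reveal the rising, falling, and consistent patterns the abstract describes.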
Affiliation(s)
- Jihong Oh
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
- Hyojin Bae
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
- Chang-Eop Kim
- Department of Physiology, College of Korean Medicine, Gachon University, Seongnam 13120, Republic of Korea
14
Kim YH, Song M. A context-based ABC model for literature-based discovery. PLoS One 2019; 14:e0215313. [PMID: 31017923 PMCID: PMC6481912 DOI: 10.1371/journal.pone.0215313] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/28/2018] [Accepted: 03/29/2019] [Indexed: 12/13/2022]
Abstract
Background In literature-based discovery, considerable research has built on the ABC model developed by Swanson. The ABC model hypothesizes that there is a meaningful relation between entity A, extracted from document set 1, and entity C, extracted from document set 2, through B entities that appear in both document sets. The results of the ABC model are relations among entities A, B and C, which are referred to as paths. A path allows one to hypothesize a relationship between entity A and entity C, or helps discover entity B as new evidence for the relationship between them. The co-occurrence-based ABC model is a well-known approach to automatic hypothesis generation that works by creating various paths. However, it has a limitation: biological context is not considered, and the model focuses only on matching a B entity that appears in both relations. Therefore, the paths extracted by the co-occurrence-based ABC model tend to include many irrelevant ones, making expert verification essential. Methods To overcome this limitation of the co-occurrence-based ABC model, we propose a context-based approach to connecting one entity relation to another, modifying the ABC model using biological contexts. In this study, we defined four biological context elements: cell, drug, disease, and organism. Based on these biological contexts, we propose two extended ABC models: a context-based ABC model and a context-assignment-based ABC model. To measure the performance of both proposed models, we examined the relevance of the B entities between the well-known relations “APOE–MAPT” and “FUS–TARDBP”. Each relation represents an interaction between proteins associated with neurodegenerative disease. The interaction between APOE and MAPT is known to play a crucial role in Alzheimer’s disease, as APOE affects tau-mediated neurodegeneration.
It has been shown that mutations in FUS and TARDBP, which lead to neuronal cell death, are associated with amyotrophic lateral sclerosis (ALS), a motor neuron disease. Using these two relations, we compared both proposed models with the co-occurrence-based ABC model. Results The precision of B entities extracted by the co-occurrence-based ABC model was 27.1% for “APOE–MAPT” and 22.1% for “FUS–TARDBP”. In the context-based ABC model, the precision of extracted B entities was 71.4% for “APOE–MAPT” and 77.9% for “FUS–TARDBP”. The context-assignment-based ABC model achieved 89% and 97.5% precision for the two relations, respectively. Both proposed models achieved higher precision than the co-occurrence-based ABC model.
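To make the baseline that this paper improves on concrete, here is a minimal sketch of the co-occurrence-based ABC model and of B-entity precision. The input format (one set of entity names per abstract), the function names and the toy corpus with placeholder B terms are hypothetical; the context-based variants would additionally require the A–B and B–C links to share a biological context (cell, drug, disease or organism), which this sketch does not model.

```python
from collections import Counter


def cooccurrence_b_terms(docs, a, c):
    """Candidate B entities for the co-occurrence-based ABC model.

    docs: one set of entity names per abstract (hypothetical input format).
    A term qualifies as B if it co-occurs with A in some abstract and with C
    in some abstract; candidates are ranked by the weaker of the two counts.
    """
    with_a, with_c = Counter(), Counter()
    for entities in docs:
        if a in entities:
            with_a.update(entities - {a, c})
        if c in entities:
            with_c.update(entities - {a, c})
    # Counter returns 0 for missing keys, so terms never seen with C drop out.
    shared = [b for b in with_a if with_c[b] > 0]
    return sorted(shared, key=lambda b: (-min(with_a[b], with_c[b]), b))


def precision(extracted, relevant):
    """Fraction of extracted B entities that an expert judged relevant."""
    if not extracted:
        return 0.0
    return len(set(extracted) & set(relevant)) / len(extracted)


# Toy corpus with placeholder B terms.
docs = [
    {"APOE", "B1", "B2"},
    {"APOE", "B1", "B3"},
    {"MAPT", "B1", "B3"},
    {"MAPT", "B4"},
    {"B2", "B5"},
]
candidates = cooccurrence_b_terms(docs, "APOE", "MAPT")
print(candidates)                     # ['B1', 'B3']
print(precision(candidates, {"B1"}))  # 0.5
```

B2 is dropped because it never co-occurs with MAPT, which is exactly the purely lexical matching that the context-based models refine.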
Affiliation(s)
- Yong Hwan Kim
- Division of Humanities, CheongJu University, CheongJu, Korea
- Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Korea
15
Lotfi Shahreza M, Ghadiri N, Mousavi SR, Varshosaz J, Green JR. A review of network-based approaches to drug repositioning. Brief Bioinform 2019; 19:878-892. [PMID: 28334136 DOI: 10.1093/bib/bbx017] [Citation(s) in RCA: 184] [Impact Index Per Article: 30.7] [Received: 11/15/2016] [Indexed: 01/17/2023]
Abstract
Experimental drug development is time-consuming, expensive and limited to a relatively small number of targets. However, recent studies show that repositioning existing drugs can be more efficient than de novo experimental drug development at minimizing costs and risks. Previous studies have demonstrated that network analysis is a versatile platform for this purpose, as biological networks are used to model interactions between many different biological concepts. The present study reviews network-based methods for predicting drug targets for drug repositioning. For each method, the preferred type of data set is described and its advantages and limitations are discussed; we also provide a brief description of each method, as well as an evaluation based on its performance metrics. We conclude that distinct and complementary data should be integrated, because each type of data set reveals a unique aspect of information about an organism. We also suggest that applying a standard set of evaluation metrics and data sets would be essential in this fast-growing research domain.
Affiliation(s)
- Maryam Lotfi Shahreza
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
- Nasser Ghadiri
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
- Jaleh Varshosaz
- Drug Delivery Systems Research Center of Isfahan University of Medical Sciences
- James R Green
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
16
Yin Z, Guo B, Mi Z, Li J, Zheng Z. Gene Saturation: An Approach to Assess Exploration Stage of Gene Interaction Networks. Sci Rep 2019; 9:5017. [PMID: 30899072 PMCID: PMC6428845 DOI: 10.1038/s41598-019-41539-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/30/2018] [Accepted: 03/11/2019] [Indexed: 12/26/2022]
Abstract
The gene interaction network is one of the most important biological networks and has been studied by many researchers. It provides information about whether the genes in the network can cause or heal diseases. As gene-gene interactions continue to be explored, gene interaction networks keep evolving. To describe how much a gene has been studied, an approach called gene saturation, based on a logistic model for each gene, has been proposed; in most cases it satisfies the non-decreasing, correlation and robustness principles. The average saturation of a group of genes can be used to assess the network constructed from these genes. Saturation reflects the distance between known gene interaction networks and the real gene interaction network in a cell. Furthermore, the saturation values of 546 disease gene networks belonging to 15 categories of diseases have been calculated. The saturation of cancer disease gene networks is significantly higher than that of all other diseases, which means that the structure of cancer disease gene networks has been studied more deeply than that of other diseases. Gene saturation provides guidance for selecting an experimental subject gene that may have a large number of unknown interactions.
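The abstract describes saturation only as a per-gene logistic model. One plausible reading, sketched here as an assumption rather than the authors' exact definition, is to fit a logistic curve to the cumulative number of known interactions of a gene by year and report the current count as a fraction of the fitted upper bound K. The grid search over K, the function names and the synthetic curve are all hypothetical.

```python
import math


def fit_logistic(years, counts, k_step=0.5, k_max_factor=10.0):
    """Fit N(t) = K / (1 + exp(-r * (t - t0))) to cumulative counts.

    For a fixed carrying capacity K the model is linear after the transform
    y = ln(K / N - 1) = -r * t + r * t0, so each candidate K on a grid is
    scored by an ordinary least-squares line fit and the best K is kept.
    All counts must be positive.
    """
    n_max = max(counts)
    k_start = math.floor(n_max) + 1.0  # K must exceed every observed count
    t_mean = sum(years) / len(years)
    sxx = sum((t - t_mean) ** 2 for t in years)
    best = None
    i = 0
    while k_start + i * k_step <= k_max_factor * n_max:
        k = k_start + i * k_step
        ys = [math.log(k / n - 1.0) for n in counts]
        y_mean = sum(ys) / len(ys)
        slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(years, ys)) / sxx
        intercept = y_mean - slope * t_mean
        sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(years, ys))
        # Only increasing curves (r > 0) make sense for cumulative counts.
        if slope < 0 and (best is None or sse < best[0]):
            r = -slope
            best = (sse, k, r, intercept / r)
        i += 1
    if best is None:
        raise ValueError("no increasing logistic curve fits these counts")
    return best[1:]  # (K, r, t0)


def gene_saturation(years, counts):
    """Known interactions divided by the fitted upper bound K (assumed definition)."""
    k, _, _ = fit_logistic(years, counts)
    return counts[-1] / k


def network_saturation(series):
    """Average saturation over a group of genes, each given as (years, counts)."""
    return sum(gene_saturation(y, c) for y, c in series) / len(series)


years = list(range(13))
counts = [100 / (1 + math.exp(-0.5 * (t - 10))) for t in years]  # synthetic curve
print(round(gene_saturation(years, counts), 3))  # 0.731
```

A value near 1 would mean that most of the interactions the curve predicts are already known; the grid search is used only to keep the sketch dependency-free.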
Affiliation(s)
- Ziqiao Yin
- Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China; Shenyuan Honors College and School of Mathematics and Systems Science, Beihang University, Beijing, 100191, China; LMIB and Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Binghui Guo
- Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China; Shenyuan Honors College and School of Mathematics and Systems Science, Beihang University, Beijing, 100191, China; LMIB and Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Zhilong Mi
- Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China; Shenyuan Honors College and School of Mathematics and Systems Science, Beihang University, Beijing, 100191, China; LMIB and Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Jiahui Li
- Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China; Shenyuan Honors College and School of Mathematics and Systems Science, Beihang University, Beijing, 100191, China; LMIB and Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
- Zhiming Zheng
- Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China; Shenyuan Honors College and School of Mathematics and Systems Science, Beihang University, Beijing, 100191, China; LMIB and Peng Cheng Laboratory, Shenzhen, 518055, Guangdong, China
17
Alaimo S, Pulvirenti A. Network-Based Drug Repositioning: Approaches, Resources, and Research Directions. Methods Mol Biol 2019; 1903:97-113. [PMID: 30547438 DOI: 10.1007/978-1-4939-8955-3_6] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 06/09/2023]
Abstract
The wealth of knowledge and omic data available in drug research has given rise to several computational methods in the drug discovery field, yielding a novel and exciting application called drug repositioning. Several computational methods attempt a high-level integration of all this knowledge in order to discover unknown mechanisms. In this chapter we present an in-depth review of data resources and computational models for drug repositioning.
Affiliation(s)
- Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, Catania, Italy
18
Antunes R, Matos S. Extraction of chemical-protein interactions from the literature using neural networks and narrow instance representation. Database (Oxford) 2019; 2019:baz095. [PMID: 31622463 PMCID: PMC6796919 DOI: 10.1093/database/baz095] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/21/2018] [Revised: 06/28/2019] [Accepted: 07/01/2019] [Indexed: 01/21/2023]
Abstract
The scientific literature contains large amounts of information on genes, proteins, chemicals and their interactions. Extracting this information and integrating it into curated knowledge bases helps researchers support their experimental results, leading to new hypotheses and discoveries. This is especially relevant for precision medicine, which aims to understand the individual variability across patient groups in order to select the most appropriate treatments. Methods for improved retrieval and automatic relation extraction from biomedical literature are therefore required to collect structured information from the growing number of published works. In this paper, we follow a deep learning approach for extracting mentions of chemical-protein interactions from biomedical articles, based on various enhancements to the system we developed for the BioCreative VI CHEMPROT task. A significant aspect of our best method is the use of a simple deep learning model together with a very narrow representation of the relation instances, using only up to 10 words from the shortest dependency path and the respective dependency edges. Bidirectional long short-term memory recurrent networks or convolutional neural networks are used to build the deep learning models. We report the results of several experiments and show that our best model is competitive with more complex sentence representations or network structures, achieving an F1-score of 0.6306 on the test set. The source code of our work, along with detailed statistics, is publicly available.
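The narrow instance representation rests on finding the shortest path between the two entity tokens in a dependency parse and keeping its words and edge labels. The sketch below shows that step under stated assumptions: the parse format (a head index and a relation label per token), the function name and the toy sentence are hypothetical, and because the abstract does not say how paths longer than 10 words are handled, this sketch simply rejects them.

```python
from collections import deque


def shortest_dependency_path(tokens, heads, labels, e1, e2, max_words=10):
    """Words and edge labels on the shortest dependency path from token e1 to e2.

    heads[i] is the index of token i's syntactic head (-1 for the root) and
    labels[i] is the relation linking token i to that head. Returns None if the
    tokens are disconnected or the path has more than max_words words.
    """
    # Search the tree as an undirected graph; each edge keeps the label
    # of its dependent token.
    adj = {i: [] for i in range(len(tokens))}
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append((h, labels[i]))
            adj[h].append((i, labels[i]))

    # Breadth-first search records how each token was first reached.
    prev = {e1: None}
    queue = deque([e1])
    while queue:
        node = queue.popleft()
        if node == e2:
            break
        for nxt, lab in adj[node]:
            if nxt not in prev:
                prev[nxt] = (node, lab)
                queue.append(nxt)
    if e2 not in prev:
        return None

    # Walk back from e2 to e1, then reverse into reading order.
    nodes, edge_labels = [e2], []
    while prev[nodes[-1]] is not None:
        node, lab = prev[nodes[-1]]
        nodes.append(node)
        edge_labels.append(lab)
    nodes.reverse()
    edge_labels.reverse()
    if len(nodes) > max_words:
        return None
    return [tokens[i] for i in nodes], edge_labels


tokens = ["Aspirin", "inhibits", "COX-1", "activity"]
heads = [1, -1, 3, 1]
labels = ["nsubj", "root", "compound", "obj"]
print(shortest_dependency_path(tokens, heads, labels, 0, 2))
# (['Aspirin', 'inhibits', 'activity', 'COX-1'], ['nsubj', 'obj', 'compound'])
```

The resulting word and label sequences are what a BiLSTM or CNN would then consume, instead of the full sentence.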
Affiliation(s)
- Rui Antunes
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Sérgio Matos
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
19
Wang H, Liu X, Tao Y, Ye W, Jin Q, Cohen WW, Xing EP. Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning. Pac Symp Biocomput 2019; 24:112-123. [PMID: 30864315 PMCID: PMC6417822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The increasing amount of scientific literature in biological and biomedical research has made continuous and reliable curation of the latest knowledge a challenge, and automatic biomedical text mining has been one answer to it. In this paper, we aim to further improve the reliability of biomedical text mining by training the system to directly simulate human behaviors such as querying PubMed, selecting articles from the query results, and reading the selected articles for knowledge. We combine the efficiency of biomedical text mining, the flexibility of deep reinforcement learning, and the massive amount of knowledge collected in UMLS into an integrative artificially intelligent reader that can automatically identify authentic articles and effectively acquire the knowledge they convey. We construct a system whose current primary task is to build a database of genetic associations between genes and complex human traits. Our contributions in this paper are three-fold: 1) We propose to improve the reliability of text mining by building a system that directly simulates the behavior of a researcher, and we develop corresponding methods, such as a bidirectional LSTM for text mining and a Deep Q-Network for organizing behaviors. 2) We demonstrate the effectiveness of our system with an example of constructing a genetic association database. 3) We release our implementation as a generic framework that researchers in the community can use to construct other databases conveniently.
Affiliation(s)
- Haohan Wang
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Xiang Liu
- Chinese University of Hong Kong, Shenzhen, China
- Yifeng Tao
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenting Ye
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Qiao Jin
- Tsinghua University, Beijing, China
- William W. Cohen
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Google AI, Pittsburgh, PA, USA
- Eric P. Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA; Petuum Inc., Pittsburgh, PA, USA
20
Tian Z, Teng Z, Cheng S, Guo M. Computational drug repositioning using meta-path-based semantic network analysis. BMC Syst Biol 2018; 12:134. [PMID: 30598084 PMCID: PMC6311940 DOI: 10.1186/s12918-018-0658-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 02/01/2023]
Abstract
BACKGROUND Drug repositioning is a promising and efficient way to discover new indications for existing drugs, and it holds great potential for precision medicine in the post-genomic era. Many network-based approaches have been proposed for drug repositioning based on similarity networks, which integrate multiple sources of drug and disease information. However, these methods may simply treat all nodes as the same type and neglect the semantic meanings of different meta-paths in the heterogeneous network. Therefore, there is an urgent need for a rational method to infer new indications for approved drugs. RESULTS In this study, we proposed a novel methodology named HeteSim_DrugDisease (HSDD) for predicting drug repositioning. First, we build the drug-drug similarity network and the disease-disease similarity network by integrating information on drugs and diseases. Second, a drug-disease heterogeneous network is constructed, which combines the drug similarity network and the disease similarity network with the known drug-disease association network. Finally, HSDD predicts novel drug-disease associations based on the HeteSim scores of different meta-paths. The experimental results show that HSDD performs significantly better than existing state-of-the-art approaches, achieving an AUC of 0.8994 in leave-one-out cross-validation. Moreover, case studies of selected drugs further illustrate the practical usefulness of HSDD. CONCLUSIONS HSDD can be an effective and feasible way to infer associations between drugs and diseases using meta-path-based semantic network analysis.
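HeteSim scores a meta-path by splitting it in the middle and comparing the probability distributions reached by walking in from each end. The sketch below computes that score for the single even-length meta-path Drug–Drug–Disease on dense Python lists. The function names and toy matrices are hypothetical, and how HSDD weights and combines the scores of several meta-paths is not described in the abstract, so that step is omitted.

```python
import math


def row_normalize(matrix):
    """Turn a nonnegative weight matrix into transition probabilities (rows sum to 1)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([w / total for w in row] if total > 0 else [0.0] * len(row))
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def hetesim_drug_drug_disease(drug_sim, drug_disease, drug, disease):
    """HeteSim relevance of (drug, disease) along the meta-path Drug-Drug-Disease.

    drug_sim: drug x drug similarity weights; drug_disease: drug x disease
    association weights. The path is split in the middle: the left half walks
    Drug -> Drug from `drug`, the reversed right half walks Disease -> Drug from
    `disease`, and the score is the cosine of the two distributions over drugs.
    """
    left = row_normalize(drug_sim)[drug]
    right = row_normalize(transpose(drug_disease))[disease]
    return cosine(left, right)


drug_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
drug_disease = [[1, 0], [0, 0], [0, 1]]  # drug 1 has no known indication
print(round(hetesim_drug_drug_disease(drug_sim, drug_disease, 1, 0), 4))  # 0.4472
```

Drug 1 has no known link to disease 0 but still receives a nonzero score because it is similar to drug 0, which is linked to disease 0; that is the kind of indirect evidence a meta-path captures.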
Affiliation(s)
- Zhen Tian
- School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, People's Republic of China
- Zhixia Teng
- School of Information and Computer Engineering, Northeast Forestry, Harbin, 150001, People's Republic of China
- Shuang Cheng
- Institute of Materials, China Academy of Engineering Physics, Jiang You, 621907, Sichuan, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, People's Republic of China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, 100044, China.
21
Singh S, Gupta SK, Seth PK. Biomarkers for detection, prognosis and therapeutic assessment of neurological disorders. Rev Neurosci 2018; 29:771-789. [PMID: 29466244 DOI: 10.1515/revneuro-2017-0097] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/10/2017] [Accepted: 12/17/2017] [Indexed: 10/24/2023]
Abstract
Neurological disorders have become a significant concern for health scientists globally, as diseases such as Parkinson's disease, Alzheimer's disease and dementia lead to disability that people have to live with throughout their lives. Recent evidence suggests that a number of environmental chemicals, such as pesticides (paraquat) and metals (lead and aluminum), are also causes of these diseases and other neurological disorders. Biomarkers can help detect a disorder at the preclinical stage, monitor disease progression and reveal key metabolomic alterations, permitting identification of potential targets for intervention. A number of biomarkers have been proposed for some neurological disorders based on laboratory and clinical studies, and some investigators have also used in silico approaches. Yet the ideal biomarker, one that can help with early detection, treatment follow-up and identification of susceptible populations, is not available. We have therefore attempted to review recent advances in in silico approaches for the discovery of biomarkers and their validation. In silico techniques implemented with multi-omics approaches have the potential to provide a fast and accurate way to identify novel biomarkers.
Affiliation(s)
- Sarita Singh
- Distinguished Scientist Laboratory, Biotech Park, Sector-G Jankipuram, Kursi Road, Lucknow 226021, Uttar Pradesh, India
- Sunil Kumar Gupta
- Distinguished Scientist Laboratory, Biotech Park, Lucknow 226021, Uttar Pradesh, India
- Prahlad Kishore Seth
- Distinguished Scientist Laboratory, Biotech Park, Lucknow 226021, Uttar Pradesh, India
22
Optimization of a Density Gradient Centrifugation Protocol for Isolation of Peripheral Blood Mononuclear Cells. Acta Medica Marisiensis 2018. [DOI: 10.2478/amma-2018-0011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
Objective: Peripheral blood mononuclear cells (PBMC) are extremely important in the body’s immune response, and their isolation is a major step in many immunological experiments. In this two-phase study, we aimed to establish an optimal protocol for PBMC isolation by density-gradient centrifugation.
Methods: During Phase 1, we compared two commercially available PBMC isolation protocols, Stemcell Technologies (ST) and Miltenyi Biotec (MB), in terms of PBMC recovery and purity. Twelve blood samples were assigned to each protocol. Each sample was divided into three subsamples of 1 ml, 2 ml and 3 ml in order to assess the influence of blood sample volume on isolation performance. During Phase 2, a hybrid protocol was tested in the same way, processing six blood samples. Additionally, we performed a flow cytometric analysis using an Annexin-V/Propidium-Iodide viability staining protocol.
Results: Phase 1 results showed that, for all subsample volumes, ST had superior PBMC recovery (mean values: 56%, 80% and 87%, respectively) compared to MB (mean values: 39%, 54% and 43%, respectively). However, platelet removal was significantly higher for MB (mean value of 96.8%) than for ST (mean value of 75.2%). Regarding granulocyte/erythrocyte contamination, both protocols performed similarly, yielding high-purity PBMC (mean values: 97.3% for ST and 95.8% for MB). During Phase 2, our hybrid protocol yielded results comparable to MB, with an average viability of 89.4% for lymphocytes and 16.9% for monocytes.
Conclusions: ST yields higher cell recovery rates and MB excels at platelet removal, while the hybrid protocol performs very similarly to MB. Both cell recovery and viability increase with blood sample volume.
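The recovery and platelet-removal percentages reported above are ratios of cell counts before and after isolation. Assuming the usual definitions (the abstract does not spell them out), they can be computed as below; the function names and cell counts are hypothetical.

```python
from statistics import mean


def percent_recovery(pbmc_isolated, pbmc_in_sample):
    """PBMCs recovered after isolation as a percentage of PBMCs in the input blood."""
    return 100.0 * pbmc_isolated / pbmc_in_sample


def percent_removal(contaminant_isolated, contaminant_in_sample):
    """Percentage of a contaminant (for example platelets) eliminated by the protocol."""
    return 100.0 * (1.0 - contaminant_isolated / contaminant_in_sample)


# Hypothetical counts for one subsample.
print(percent_recovery(1.6e6, 2.0e6))  # 80.0
print(percent_removal(2.5e7, 1.0e8))   # 75.0
# Mean recovery over several subsamples of the same volume.
print(mean(percent_recovery(i, s) for i, s in [(1.0e6, 2.0e6), (1.5e6, 2.0e6)]))  # 62.5
```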
23
Roth A, Subramanian S, Ganapathiraju MK. Towards Extracting Supporting Information About Predicted Protein-Protein Interactions. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1239-1246. [PMID: 26672046 DOI: 10.1109/tcbb.2015.2505278] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
One of the goals of relation extraction is to identify protein-protein interactions (PPIs) in biomedical literature. Current systems capture binary relations as well as the direction and type of an interaction. Beyond assisting in the curation of PPIs into databases, these algorithms have seen little real-world application. We describe UPSITE, a text mining tool for extracting evidence in support of a hypothesized interaction. Given a predicted PPI, UPSITE uses a binary relation detector to check whether the PPI is found in abstracts in PubMed. If it is not found, UPSITE retrieves documents relevant to each of the two proteins separately, extracts contextual information about biological events surrounding each protein, and calculates the semantic similarity of the two proteins to provide evidential support for the predicted PPI. In evaluations, relation extraction achieved an F-score of 0.88 on the HPRD50 corpus, and semantic similarity measured with angular distance was found to be statistically significant. With the development of PPI prediction algorithms, the burden of interpreting the validity and relevance of novel PPIs falls on biologists. We suggest that presenting annotations of the two proteins in a PPI side by side, together with a score that quantifies their similarity, lessens this burden to some extent.
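Angular distance, the similarity measure named above, is the angle between two vectors scaled to the range [0, 1]. Assuming each protein's contextual annotations are turned into term-count vectors over a shared vocabulary (the abstract does not say how UPSITE builds them), a minimal sketch looks like this; the helper names and annotation terms are hypothetical.

```python
import math
from collections import Counter


def annotation_vectors(terms_a, terms_b):
    """Count vectors for two proteins' annotation terms over their joint vocabulary."""
    vocab = sorted(set(terms_a) | set(terms_b))
    ca, cb = Counter(terms_a), Counter(terms_b)
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def angular_distance(u, v):
    """Angle between u and v divided by pi: 0 for parallel, 0.5 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos_sim) / math.pi


u, v = annotation_vectors(
    ["apoptosis", "phosphorylation", "apoptosis"],
    ["phosphorylation", "ubiquitination"],
)
print(round(angular_distance(u, v), 3))  # 0.398
```

Unlike raw cosine similarity, angular distance is a proper metric, which makes scores for different protein pairs directly comparable.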
24
Reynés B, Priego T, Cifre M, Oliver P, Palou A. Peripheral Blood Cells, a Transcriptomic Tool in Nutrigenomic and Obesity Studies: Current State of the Art. Compr Rev Food Sci Food Saf 2018; 17:1006-1020. [DOI: 10.1111/1541-4337.12363] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/06/2018] [Revised: 04/13/2018] [Accepted: 04/14/2018] [Indexed: 12/11/2022]
Affiliation(s)
- Bàrbara Reynés
- Laboratory of Molecular Biology, Nutrition and Biotechnology; Univ. de les Illes Balears; Palma Spain
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN); Madrid Spain
- Inst. d'Investigació Sanitària Illes Balears (IdISBa); Palma Spain
- Teresa Priego
- Dept. of Physiology, Faculty of Medicine; Univ. Complutense de Madrid; Madrid Spain
- Margalida Cifre
- Laboratory of Molecular Biology, Nutrition and Biotechnology; Univ. de les Illes Balears; Palma Spain
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN); Madrid Spain
- Paula Oliver
- Laboratory of Molecular Biology, Nutrition and Biotechnology; Univ. de les Illes Balears; Palma Spain
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN); Madrid Spain
- Inst. d'Investigació Sanitària Illes Balears (IdISBa); Palma Spain
- Andreu Palou
- Laboratory of Molecular Biology, Nutrition and Biotechnology; Univ. de les Illes Balears; Palma Spain
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN); Madrid Spain
- Inst. d'Investigació Sanitària Illes Balears (IdISBa); Palma Spain
25
Sharma V, Sarkar IN. Identifying Supplement Use Within Clinical Notes: An Application of Natural Language Processing. AMIA Jt Summits Transl Sci Proc 2018; 2017:196-205. [PMID: 29888071 PMCID: PMC5961809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Recent statistics indicate that the use of dietary supplements has increased over the years. Although supplements are popular among consumers, who use them for a variety of reasons, there have been few clinical data-driven studies of their impact on health outcomes. Challenges that impede such analyses in a comprehensive manner include the sequestered nature of such data or their embedding within biomedical and clinical text. This study explored the feasibility of uncovering patterns in the use of supplements, focusing on vitamin use among patients diagnosed with mental illness within patient records from the MIMIC-III database. The relevance of vitamin(s) was calculated at different levels of granularity and compared with associations identified from the Dietary Supplement Subset of MEDLINE. The results reveal insights into vitamin use for specific mental health-related diagnoses and highlight challenges with identifying supplement information from clinical sources.
Affiliation(s)
- Vivekanand Sharma
- Center for Biomedical Informatics, Brown University, Providence, Rhode Island
- Indra Neil Sarkar
- Center for Biomedical Informatics, Brown University, Providence, Rhode Island
26
Zhou J, Fu BQ. The research on gene-disease association based on text-mining of PubMed. BMC Bioinformatics 2018; 19:37. [PMID: 29415654 PMCID: PMC5804013 DOI: 10.1186/s12859-018-2048-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 09/18/2017] [Accepted: 01/29/2018] [Indexed: 11/23/2022]
Abstract
Background The associations between genes and diseases are of critical significance for prevention, diagnosis and treatment. Although gene-disease relationships have been investigated extensively, many of the mechanisms underlying these associations are yet to be elucidated. Methods A novel method integrates the MeSH database, term weight (TW), and co-occurrence methods to predict gene-disease associations based on the cosine similarity between gene vectors and disease vectors. Vectors are built from the texts of documents in the PubMed database according to the appearance and location of the gene or disease terms. The disease-related text data were optimized during vector construction. Results The overall distribution of cosine similarity values was investigated. Using the gene-disease association data in the OMIM database as the gold standard, the performance of cosine similarity in predicting gene-disease linkage was evaluated. The effects of applying a weight matrix, penalty weights for keywords (PWK), and normalization were also investigated. Finally, we demonstrated that our method outperforms heterogeneous network edge prediction (HNEP) in terms of precision and recall. Conclusions The method proposed in this paper is easy to implement, and its results can be integrated with other models to improve the overall performance of gene-disease association prediction.
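The core of this method is a cosine score between a gene's vector and a disease's vector, evaluated against a gold standard such as OMIM. The sketch below uses plain co-occurrence counts over each abstract's terms in place of the paper's MeSH-based term weights, location information and penalty weights, so it illustrates only the scoring and precision/recall steps; all names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def term_vector(docs, entity):
    """Count the other terms that co-occur with `entity`; docs are sets of terms."""
    vec = Counter()
    for terms in docs:
        if entity in terms:
            vec.update(terms - {entity})
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def precision_recall(scores, gold, threshold):
    """Precision and recall when pairs scoring >= threshold count as predicted."""
    predicted = {pair for pair, s in scores.items() if s >= threshold}
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


docs = [
    {"G1", "D1", "t1", "t2"},
    {"G1", "t1"},
    {"D1", "t1", "t2"},
    {"G2", "t3"},
    {"D2", "t3", "t4"},
]
genes, diseases = ["G1", "G2"], ["D1", "D2"]
scores = {
    (g, d): cosine(term_vector(docs, g), term_vector(docs, d))
    for g, d in product(genes, diseases)
}
gold = {("G1", "D1")}  # stands in for curated OMIM pairs
print(precision_recall(scores, gold, threshold=0.5))  # (0.5, 1.0)
```

Sweeping the threshold traces out the precision/recall trade-off that such a method would be compared on.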
Affiliation(s)
- Jie Zhou
- Guangdong Key Laboratory of Computer Network, School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China.
- Bo-Quan Fu
- Guangdong Key Laboratory of Computer Network, School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
27
Zhang P, Wu H, Chiang C, Wang L, Binkheder S, Wang X, Zeng D, Quinney SK, Li L. Translational Biomedical Informatics and Pharmacometrics Approaches in the Drug Interactions Research. CPT Pharmacometrics Syst Pharmacol 2017; 7:90-102. [PMID: 29193890 PMCID: PMC5824109 DOI: 10.1002/psp4.12267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/07/2017] [Accepted: 11/08/2017] [Indexed: 12/18/2022]
Abstract
Drug interaction is a leading cause of adverse drug events and a major obstacle in current clinical practice. Pharmacovigilance data mining, pharmacokinetic modeling, and text mining are computational and informatics tools for integrating drug interaction knowledge and generating drug interaction hypotheses. We provide a comprehensive overview of these translational biomedical informatics methodologies and related databases. We hope this review illustrates the complementary nature of these informatics approaches and facilitates translational drug interaction research.
Affiliation(s)
- Pengyue Zhang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- Heng‐Yi Wu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- Chien‐Wei Chiang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- Lei Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
- Intelligent Systems and Bioinformatics Institute, College of Automation, Harbin Engineering University, Harbin, Heilongjiang, China
- Samar Binkheder
- Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, Indiana, USA
- Medical Informatics Unit, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Xueying Wang
- Intelligent Systems and Bioinformatics Institute, College of Automation, Harbin Engineering University, Harbin, Heilongjiang, China
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Sara K. Quinney
- Department of Obstetrics and Gynecology, Indiana University, Indianapolis, Indiana, USA
- Lang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, USA
28
Smalheiser NR. Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. J Data Inf Sci 2017; 2:43-64. [PMID: 29355246 PMCID: PMC5771422 DOI: 10.1515/jdis-2017-0019] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/25/2023]
Abstract
The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed.
Affiliation(s)
- Neil R Smalheiser
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612 USA
29
Henry S, McInnes BT. Literature Based Discovery: Models, methods, and trends. J Biomed Inform 2017; 74:20-32. [PMID: 28838802 DOI: 10.1016/j.jbi.2017.08.011] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 03/03/2017] [Revised: 07/21/2017] [Accepted: 08/20/2017] [Indexed: 01/25/2023]
Abstract
OBJECTIVES This paper provides an introduction and overview of literature-based discovery (LBD) in the biomedical domain. It introduces the reader to modern and historical LBD models, key system components, evaluation methodologies, and current trends. After completion, the reader will be familiar with the challenges and methodologies of LBD, will be able to distinguish between recent LBD systems and publications, and will be capable of designing an LBD system for a specific application. TARGET AUDIENCE From biomedical researchers curious about LBD, to someone looking to design an LBD system, to an LBD expert trying to catch up on trends in the field. The reader need not be familiar with LBD, but knowledge of biomedical text processing tools is helpful. SCOPE This paper describes a unifying framework for LBD systems. Within this framework, different models and methods are presented to both distinguish and show overlap between systems. Topics include term and document representation, system components, and an overview of models including co-occurrence models, semantic models, and distributional models. Other topics include uninformative term filtering, term ranking, results display, system evaluation, an overview of the application areas of drug development, drug repurposing, and adverse drug event prediction, and challenges and future directions. A timeline showing contributions to LBD and a table summarizing the works of several authors are provided. Topics are presented from a high-level perspective, with references given where more detailed analysis is required.
Affiliation(s)
- Sam Henry
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA.
- Bridget T McInnes
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA
30
Enriching plausible new hypothesis generation in PubMed. PLoS One 2017; 12:e0180539. [PMID: 28678852 PMCID: PMC5498031 DOI: 10.1371/journal.pone.0180539]
Abstract
Background Most earlier studies in the field of literature-based discovery have adopted Swanson's ABC model, which links pieces of knowledge entailed in disjoint literatures. However, their practicability remains an open issue, since most of them did not deal with the context surrounding the discovered associations and were usually not accompanied by clinical confirmation. In this study, we aim to propose a method that expands and elaborates the existing hypothesis by advanced text mining techniques for capturing contexts. We extend the ABC model to allow for multiple B terms with various biological types. Results We were able to concretize a specific, metabolite-related hypothesis with abundant contextual information by using the proposed method. Starting from explaining the relationship between lactosylceramide and arterial stiffness, the hypothesis was extended to suggest a potential pathway consisting of lactosylceramide, nitric oxide, malondialdehyde, and arterial stiffness. An evaluation by domain experts showed that it is clinically valid. Conclusions The proposed method is designed to provide plausible candidates of the concretized hypothesis, which are based on extracted heterogeneous entities and detailed relation information, along with a reliable ranking criterion. Unlike previous studies, statistical tests conducted in collaboration with biomedical experts support the validity and practical usefulness of the method. Applied to other cases, the method could help biologists support existing hypotheses and more easily anticipate the logical process within them.
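The ABC idea referred to above can be sketched as follows: if term A co-occurs with B and B co-occurs with C, while A and C rarely or never appear together, B is a candidate bridge. This is a minimal sketch, assuming each document has already been reduced to a set of extracted terms; the function names, the toy documents and the "weakest link" score min(count(A, B), count(B, C)) are hypothetical choices, not the paper's ranking criterion, and the multi-B pathway extension is not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for every unordered term pair, how many documents mention both."""
    counts = Counter()
    for terms in docs:
        for x, y in combinations(sorted(set(terms)), 2):
            counts[(x, y)] += 1
    return counts


def pair_count(counts, x, y):
    """Look up the co-occurrence count of an unordered pair (0 if unseen)."""
    return counts[tuple(sorted((x, y)))]


def rank_b_terms(docs, a, c):
    """Return candidate B terms linking A and C, best first.

    Hypothetical score: the weaker of the A-B and B-C links.
    """
    counts = cooccurrence_counts(docs)
    vocab = {t for terms in docs for t in terms} - {a, c}
    scored = []
    for b in vocab:
        score = min(pair_count(counts, a, b), pair_count(counts, b, c))
        if score > 0:
            scored.append((b, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: A and C never appear in the same document.
docs = [
    {"A", "B1"}, {"A", "B1"}, {"B1", "C"}, {"B1", "C"},
    {"A", "B2"}, {"B2", "C"},
    {"A", "X"},  # X never meets C, so it is not a bridge
]
print(rank_b_terms(docs, "A", "C"))  # [('B1', 2), ('B2', 1)]
```

A min-based score favours bridges that are supported on both sides; a product or a statistical association measure would be equally plausible alternatives.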
31
Zhu Y, Yan E, Wang F. Semantic relatedness and similarity of biomedical terms: examining the effects of recency, size, and section of biomedical publications on the performance of word2vec. BMC Med Inform Decis Mak 2017; 17:95. [PMID: 28673289 PMCID: PMC5496182 DOI: 10.1186/s12911-017-0498-1]
Abstract
BACKGROUND Understanding semantic relatedness and similarity between biomedical terms has a great impact on a variety of applications such as biomedical information retrieval, information extraction, and recommender systems. The objective of this study is to examine word2vec's ability to derive semantic relatedness and similarity between biomedical terms from large publication data. Specifically, we focus on the effects of recency, size, and section of biomedical publication data on the performance of word2vec. METHODS We download abstracts of 18,777,129 articles from PubMed and 766,326 full-text articles from PubMed Central (PMC). The datasets are preprocessed and grouped into subsets by recency, size, and section. Word2vec models are trained on these subsets. Cosine similarities between biomedical terms obtained from the word2vec models are compared against reference standards. The performance of models trained on different subsets is compared to examine recency, size, and section effects. RESULTS Models trained on recent datasets did not boost the performance. Models trained on larger datasets identified more pairs of biomedical terms than models trained on smaller datasets in the relatedness task (from 368 at the 10% level to 494 at the 100% level) and the similarity task (from 374 at the 10% level to 491 at the 100% level). The model trained on abstracts produced results that have higher correlations with the reference standards than the one trained on article bodies (i.e., 0.65 vs. 0.62 in the similarity task and 0.66 vs. 0.59 in the relatedness task). However, the latter identified more pairs of biomedical terms than the former (i.e., 344 vs. 498 in the similarity task and 339 vs. 503 in the relatedness task). CONCLUSIONS Increasing the size of the dataset does not always enhance the performance. Increasing the size of datasets can result in the identification of more relations of biomedical terms even though it does not guarantee better precision.
As summaries of research articles, compared with article bodies, abstracts excel in accuracy but lose in coverage of identifiable relations.
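The study scores term pairs by the cosine similarity of their word2vec vectors before comparing those scores with reference standards. A minimal sketch of that similarity measure, assuming the vectors are already available; the three-dimensional vectors and term names below are hypothetical stand-ins for trained embeddings, and the correlation step against the reference standards is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings"; real word2vec vectors typically
# have hundreds of dimensions and come from a trained model.
vectors = {
    "term_x": [1.0, 0.0, 1.0],
    "term_y": [1.0, 0.0, 0.0],
    "term_z": [0.0, 1.0, 0.0],
}
print(round(cosine_similarity(vectors["term_x"], vectors["term_y"]), 3))  # 0.707
print(cosine_similarity(vectors["term_x"], vectors["term_z"]))            # 0.0
```

Because cosine similarity ignores vector length, it compares the direction of two embeddings, which is why it is the usual choice for word2vec relatedness scores.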
Affiliation(s)
- Yongjun Zhu
- Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, USA.
- Erjia Yan
- College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
- Fei Wang
- Healthcare Policy and Research, Weill Cornell Medicine, Cornell University, New York, NY, USA
32
Abstract
Literature-based discovery systems aim to discover valuable latent connections between previously disparate research areas. This is achieved by analyzing the contents of their respective literatures with the help of various intelligent computational techniques. In this paper, we review the progress of literature-based discovery research, focusing on the systems' technical features and evaluating their performance. Present literature-based discovery techniques can be divided into two general approaches: the traditional approach and the emerging approach. The traditional approach, which dominates the current research landscape, consists mainly of techniques that rely on lexical statistics, knowledge-based methods, and visualization methods to address literature-based discovery problems. On the other hand, we have also observed the emergence of new trends and unprecedented paradigm shifts within the recently emerging literature-based discovery approach. These trends are likely to shape the future trajectory of next-generation literature-based discovery systems.
33
Cifre M, Díaz-Rúa R, Varela-Calviño R, Reynés B, Pericás-Beltrán J, Palou A, Oliver P. Human peripheral blood mononuclear cell in vitro system to test the efficacy of food bioactive compounds: Effects of polyunsaturated fatty acids and their relation with BMI. Mol Nutr Food Res 2016; 61. [PMID: 27873461 DOI: 10.1002/mnfr.201600353]
Abstract
SCOPE To analyse the usefulness of isolated human peripheral blood mononuclear cells (PBMC) to rapidly/easily reflect n-3 long-chain polyunsaturated fatty acid (LCPUFA) effects on lipid metabolism/inflammation gene profile, and to evaluate whether these effects are body mass index (BMI) dependent. METHODS AND RESULTS PBMC from normoweight (NW) and overweight/obese (OW/OB) subjects were incubated with physiological doses of docosahexaenoic acid (DHA), eicosapentaenoic acid (EPA), or their combination. PBMC reflected increased beta-oxidation-like capacity (CPT1A expression) in OW/OB but only after DHA treatment. However, insensitivity to n-3 LCPUFA was evident in OW/OB for lipogenic genes: both PUFA diminished FASN and SREBP1C expression in NW, but no effect was observed for DHA in PBMC from high-BMI subjects. This insensitivity was also evident for the inflammation gene profile: all treatments inhibited key inflammatory genes in NW; nevertheless, no effect was observed in OW/OB after DHA treatment, and the EPA effect was impaired. PBMC expression analysis of SLC27A2, IL6 and TNFα proved especially useful for determining obesity-related n-3 LCPUFA insensitivity. CONCLUSION A PBMC-based human in vitro system reflects n-3 LCPUFA effects on lipid metabolism/inflammation, and these effects are impaired in OW/OB. These results confirm the utility of PBMC ex vivo systems for bioactive-compound screening to promote functional food development and to establish appropriate dietary strategies for the obese population.
Affiliation(s)
- Margalida Cifre
- Laboratory of Molecular Biology, Nutrition and Biotechnology, Universitat de les Illes Balears and CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn), Palma de Mallorca, Spain
- Rubén Díaz-Rúa
- Laboratory of Molecular Biology, Nutrition and Biotechnology, Universitat de les Illes Balears and CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn), Palma de Mallorca, Spain
- Rubén Varela-Calviño
- Department of Biochemistry and Molecular Biology, University of Santiago de Compostela, Santiago de Compostela, Spain
- Bàrbara Reynés
- Laboratory of Molecular Biology, Nutrition and Biotechnology, Universitat de les Illes Balears and CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn), Palma de Mallorca, Spain
- Jordi Pericás-Beltrán
- Research Group on Evidence, Lifestyles & Health, Universitat de les Illes Balears, Palma de Mallorca, Spain
- Andreu Palou
- Laboratory of Molecular Biology, Nutrition and Biotechnology, Universitat de les Illes Balears and CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn), Palma de Mallorca, Spain
- Paula Oliver
- Laboratory of Molecular Biology, Nutrition and Biotechnology, Universitat de les Illes Balears and CIBER de Fisiopatología de la Obesidad y Nutrición (CIBERobn), Palma de Mallorca, Spain
34
Sharma V, Law W, Balick MJ, Sarkar IN. Identifying Plant-Human Disease Associations in Biomedical Literature: A Case Study. AMIA Joint Summits on Translational Science Proceedings 2016; 2016:84-93. [PMID: 27595045 PMCID: PMC5009952]
Abstract
The impact of ethnobotanical data from surveys of traditional medicinal uses of plants can be enhanced through the validation of biomedical knowledge that may be embedded in literature. This study aimed to explore the use of informatics approaches, including natural language processing and terminology resources, for extracting and comparing ethnobotanical leads from biomedical literature indexed in MEDLINE. Using ethnobotanical data for plant species described in Primary Health Care Manuals of the Micronesian islands of Palau and Pohnpei, the analysis in this study was performed relative to disease concepts from the "Mental, Behavioral And Neurodevelopmental Disorders" ICD-9-CM category. The results from this feasibility study suggest that informatics methods can be used to extract and prioritize relevant ethnobotanical information from the biomedical literature.
Affiliation(s)
- Vivekanand Sharma
- Center for Biomedical Informatics, Brown University, Providence, RI USA
- Wayne Law
- Institute of Economic Botany, The New York Botanical Garden, Bronx, NY USA
- Michael J. Balick
- Institute of Economic Botany, The New York Botanical Garden, Bronx, NY USA
- Indra Neil Sarkar
- Center for Biomedical Informatics, Brown University, Providence, RI USA
35
Yang HT, Ju JH, Wong YT, Shmulevich I, Chiang JH. Literature-based discovery of new candidates for drug repurposing. Brief Bioinform 2016; 18:488-497. [DOI: 10.1093/bib/bbw030]
36
Li TS, Bravo À, Furlong LI, Good BM, Su AI. A crowdsourcing workflow for extracting chemical-induced disease relations from free text. Database: The Journal of Biological Databases and Curation 2016; 2016:baw051. [PMID: 27087308 PMCID: PMC4834205 DOI: 10.1093/database/baw051]
Abstract
Relations between chemicals and diseases are among the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex. Database URL: https://github.com/SuLab/crowd_cid_relex
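The aggregation rule and the reported scores can be checked with a few lines of arithmetic. A minimal sketch, assuming votes are plain booleans; `aggregate_votes` and `f_score` are hypothetical helper names, and only the threshold (four of five votes) and the precision, recall and cost figures come from the abstract.

```python
def aggregate_votes(votes, threshold=4):
    """Predict a relation as true when at least `threshold` workers said yes."""
    return sum(votes) >= threshold


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Five hypothetical binary judgments (True = "text supports the relation").
print(aggregate_votes([True, True, True, True, False]))   # True  (4 of 5)
print(aggregate_votes([True, True, True, False, False]))  # False (3 of 5)

# Figures reported in the abstract: precision 0.475, recall 0.540.
print(round(f_score(0.475, 0.540), 3))  # 0.505
print(round(1290.67 / 500, 2))          # 2.58 dollars per abstract
```

The recomputed F-score (0.505) and per-abstract cost (2.58) agree with the values stated in the abstract.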
Affiliation(s)
- Tong Shu Li
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
- Benjamin M Good
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew I Su
- Department of Molecular and Experimental Medicine, the Scripps Research Institute, La Jolla, CA 92037, USA
37
Fang Y. Compound annotation with real time cellular activity profiles to improve drug discovery. Expert Opin Drug Discov 2016; 11:269-80. [PMID: 26787137 DOI: 10.1517/17460441.2016.1143460]
Abstract
INTRODUCTION In the past decade, a range of innovative strategies have been developed to improve the productivity of pharmaceutical research and development. In particular, compound annotation, combined with informatics, has provided unprecedented opportunities for drug discovery. AREAS COVERED In this review, a literature search from 2000 to 2015 was conducted to provide an overview of the compound annotation approaches currently used in drug discovery. Based on this, a framework related to a compound annotation approach using real-time cellular activity profiles for probe, drug, and biology discovery is proposed. EXPERT OPINION Compound annotation with chemical structure, drug-like properties, bioactivities, genome-wide effects, clinical phenotypes, and textual abstracts has received significant attention in early drug discovery. However, these annotations are mostly associated with endpoint results. Advances in assay techniques have made it possible to obtain real-time cellular activity profiles of drug molecules under different phenotypes, and therefore to annotate compounds with real-time cellular activity profiles. Combining compound annotation with informatics, such as similarity analysis, presents a good opportunity to improve the rate of discovery of novel drugs and probes, and to enhance our understanding of the underlying biology.
Affiliation(s)
- Ye Fang
- Biochemical Technologies, Science and Technology Division, Corning Incorporated, Corning, NY, USA
38
Drug-symptom networking: Linking drug-likeness screening to drug discovery. Pharmacol Res 2015; 103:105-13. [PMID: 26615785 DOI: 10.1016/j.phrs.2015.11.015]
Abstract
Understanding the relationships between drugs and symptoms has broad medical consequences, yet a comprehensive description of drug-symptom associations is currently lacking. Here, 1441 FDA-approved drugs were collected, and PCA was used to extract 122 descriptors, which explained 91% of the variance. Then, the k-means++ method was employed to partition the drug dataset into 3 clusters, and 3 corresponding SVDD models (drug-likeness screening models) were constructed with an overall accuracy of up to 95.6%. Furthermore, 6878 herbal molecules from the TcmSP™ database were screened by these 3 SVDD models to obtain 5309 candidate drug molecules, an acceptance rate of 77.19%. To assess the accuracy of the SVDD models, 8559 herbal molecule-symptom co-occurrences, involving 697 herbal molecules and 314 symptoms, were mined from PubMed abstracts. Most of the 697 herbal molecules could be found among the accepted SVDD data (5309 molecules), showing the potential of the SVDD models for screening drug candidates. Moreover, a herbal molecule-herbal molecule network and a herbal molecule-symptom network were constructed. Overall, the results provided a new drug-likeness screening approach independent of abnormal training data, and the comprehensive collection of herbal molecule-symptom associations formed a new data resource for systematic characterization of symptom-oriented medicines.
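The clustering step described above can be illustrated with a small k-means++ implementation. This is a minimal sketch, assuming the drugs have already been reduced to numeric descriptor vectors (the PCA step and the SVDD models are not shown); the two-dimensional points are hypothetical, and the code is standard k-means++ seeding followed by Lloyd iterations, not the authors' code.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first center is chosen uniformly; each further
    center is sampled with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm started from k-means++ seeds.

    Returns (labels, centers), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:
            break  # converged: centers (and hence assignments) stopped changing
        centers = new_centers
    return labels, centers


# Hypothetical 2-D descriptor vectors forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (100, 100), (100, 101), (101, 100),
          (200, 0), (200, 1), (201, 0)]
labels, centers = kmeans(points, k=3, seed=0)
print(labels)
```

The distance-weighted seeding makes it unlikely that two initial centers fall in the same group, which is the main advantage of k-means++ over uniform random initialization.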
39
Cheng L, Li J, Hu Y, Jiang Y, Liu Y, Chu Y, Wang Z, Wang Y. Using Semantic Association to Extend and Infer Literature-Oriented Relativity Between Terms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:1219-1226. [PMID: 26684460 DOI: 10.1109/tcbb.2015.2430289]
Abstract
Related terms often appear together in the literature. Methods have been presented for weighting the relativity of pairwise terms by their co-occurring literature and for inferring new relationships. Terms in the literature also appear in the directed acyclic graphs of ontologies such as Gene Ontology and Disease Ontology. Therefore, semantic associations between terms may help establish relativities between terms in the literature. However, current methods do not use these associations. In this paper, an adjusted R-scaled score (ARSS) based on information content (ARSSIC) method is introduced to infer new relationships between terms. First, the set inclusion relationship between ontology terms was exploited to extend relationships between these terms and the literature. Next, the ARSS method was presented to measure relativity between terms across ontologies according to these extended relationships. Then, the ARSSIC method, which uses the ratio of information shared by terms' ancestors, was designed to infer new relationships between terms across ontologies. The experimental results show that ARSS identified more pairs of statistically significant terms based on corresponding gene sets than other methods, and the high average area under the receiver operating characteristic curve (0.9293) shows that ARSSIC achieved a high true positive rate and a low false positive rate. Data are available at http://mlg.hit.edu.cn/ARSSIC/.
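Two ingredients named above, extending term-literature links through ontology set inclusion and measuring information content, can be sketched generically. This is a minimal sketch under stated assumptions: the ontology is a child-to-parents map, a document linked to a term is also linked to all of that term's ancestors, and information content is taken as IC(t) = log(N / |docs(t)|). The toy ontology and all names are hypothetical, and the actual ARSS and ARSSIC formulas are not given in the abstract and are not reproduced here.

```python
import math


def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as child -> list of parents."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def propagate(direct_docs, parents):
    """Set-inclusion extension: a document linked to a term is also linked
    to every ancestor of that term."""
    docs = {term: set(ids) for term, ids in direct_docs.items()}
    for term, ids in direct_docs.items():
        for anc in ancestors(term, parents):
            docs.setdefault(anc, set()).update(ids)
    return docs


def information_content(docs, total_docs):
    """IC(t) = log(N / |docs(t)|); terms with no documents are skipped."""
    return {t: math.log(total_docs / len(ids)) for t, ids in docs.items() if ids}


# Hypothetical toy ontology and document annotations.
parents = {"child_a": ["parent"], "child_b": ["parent"], "parent": ["root"]}
direct_docs = {"child_a": {1, 2}, "child_b": {3}, "root": {4}}

docs_by_term = propagate(direct_docs, parents)
ic = information_content(docs_by_term, total_docs=4)
print(sorted(docs_by_term["parent"]))  # [1, 2, 3]
print(round(ic["child_b"], 3))         # 1.386
```

After propagation, general terms accumulate more documents and therefore lower information content, while specific terms keep higher values.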
40
Song M, Heo GE, Ding Y. SemPathFinder: Semantic path analysis for discovering publicly unknown knowledge. J Informetr 2015. [DOI: 10.1016/j.joi.2015.06.004]
41
Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform 2015; 17:33-42. [PMID: 26420781 PMCID: PMC4719073 DOI: 10.1093/bib/bbv087]
Abstract
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.
42
Weissenborn D, Schroeder M, Tsatsaronis G. Discovering relations between indirectly connected biomedical concepts. J Biomed Semantics 2015; 6:28. [PMID: 26150906 PMCID: PMC4492092 DOI: 10.1186/s13326-015-0021-5]
Abstract
Background The complexity and scale of knowledge in the biomedical domain have motivated research towards mining heterogeneous data from both structured and unstructured knowledge bases. In this direction, it is necessary to combine facts in order to formulate hypotheses or draw conclusions about the domain concepts. This work addresses this problem by using indirect knowledge connecting two concepts in a knowledge graph to discover hidden relations between them. The graph represents concepts as vertices and relations as edges, stemming from structured (ontologies) and unstructured (textual) data. In this graph, path patterns (i.e., sequences of relations) that potentially characterize a biomedical relation are mined using distant supervision. Results Characteristic path patterns of biomedical relations can be identified from this representation using machine learning. For experimental evaluation, two frequent biomedical relations, namely “has target” and “may treat”, are chosen. Results suggest that relation discovery using indirect knowledge is possible, with an AUC that can reach up to 0.8, a great improvement over random classification, which shows that good predictions can be prioritized by following the suggested approach. Conclusions Analysis of the results indicates that the models can successfully learn expressive path patterns for the examined relations. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts. Electronic supplementary material: The online version of this article (doi:10.1186/s13326-015-0021-5) contains supplementary material, which is available to authorized users.
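The notion of a path pattern (the sequence of relation labels along a path between two concepts) can be illustrated by enumerating bounded-length paths in a small labelled graph and counting patterns over concept pairs that a reference source labels as related. This is a minimal sketch: the triples, relation names and labelled pairs are hypothetical, and the machine-learning step that turns pattern counts into relation predictions is not shown.

```python
from collections import Counter, defaultdict


def build_graph(triples):
    """Adjacency list: head concept -> list of (relation, tail concept)."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def path_patterns(graph, source, target, max_len=3):
    """Relation sequences of all simple paths from source to target
    that use at most `max_len` edges."""
    patterns = []

    def walk(node, rels, visited):
        if node == target and rels:
            patterns.append(tuple(rels))
            return
        if len(rels) == max_len:
            return
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                walk(nxt, rels + [rel], visited | {nxt})

    walk(source, [], {source})
    return patterns


def pattern_counts(graph, labelled_pairs, max_len=3):
    """Count in how many labelled concept pairs each path pattern occurs."""
    counts = Counter()
    for source, target in labelled_pairs:
        counts.update(set(path_patterns(graph, source, target, max_len)))
    return counts


# Hypothetical triples; in the paper, edges come from ontologies and text.
triples = [
    ("drug_1", "inhibits", "gene_1"),
    ("gene_1", "associated_with", "disease_1"),
    ("drug_2", "inhibits", "gene_2"),
    ("gene_2", "associated_with", "disease_2"),
    ("drug_2", "co_mentioned_with", "disease_2"),
]
graph = build_graph(triples)
# Pairs assumed to hold a known relation (e.g. "may treat") in a reference source.
positives = [("drug_1", "disease_1"), ("drug_2", "disease_2")]
print(pattern_counts(graph, positives).most_common())
```

In a distant-supervision setting, such counts (or per-pair pattern indicators) would serve as features for a classifier trained on the labelled pairs.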
Affiliation(s)
- Dirk Weissenborn
- DFKI Projektbüro Berlin, Alt-Moabit 91c, Berlin, 10559 Germany ; Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Michael Schroeder
- Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- George Tsatsaronis
- Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany
43
Yan E, Zhu Y. Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods. J Informetr 2015. [DOI: 10.1016/j.joi.2015.04.003]
44
Rüping S. Big Data in Medizin und Gesundheitswesen [Big data in medicine and healthcare]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2015; 58:794-798. [DOI: 10.1007/s00103-015-2181-y]
45
Cameron D, Kavuluru R, Rindflesch TC, Sheth AP, Thirunarayan K, Bodenreider O. Context-driven automatic subgraph creation for literature-based discovery. J Biomed Inform 2015; 54:141-57. [PMID: 25661592 PMCID: PMC4888806 DOI: 10.1016/j.jbi.2015.01.014]
Abstract
BACKGROUND Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include the use of: (1) domain expertise and structured background knowledge to manually filter and explore the literature, (2) distributional statistics and graph-theoretic measures to rank interesting connections, and (3) heuristics to help eliminate spurious connections. However, manual approaches to LBD are not scalable and purely distributional approaches may not be sufficient to obtain insights into the meaning of poorly understood associations. While several graph-based approaches have the potential to elucidate associations, their effectiveness has not been fully demonstrated. A considerable degree of a priori knowledge, heuristics, and manual filtering is still required. OBJECTIVES In this paper, we implement and evaluate a context-driven, automatic subgraph creation method that captures multifaceted complex associations between biomedical concepts to facilitate LBD. Given a pair of concepts, our method automatically generates a ranked list of subgraphs, which provide informative and potentially unknown associations between such concepts. METHODS To generate subgraphs, the set of all MEDLINE articles that contain either of the two specified concepts (A, C) is first collected. Then binary relationships or assertions automatically extracted from the MEDLINE articles, called semantic predications, are used to create a labeled directed predications graph. In this predications graph, a path is represented as a sequence of semantic predications. The hierarchical agglomerative clustering (HAC) algorithm is then applied to cluster paths that are bounded by the two concepts (A, C). HAC relies on implicit semantics captured through Medical Subject Heading (MeSH) descriptors, and explicit semantics from the MeSH hierarchy, for clustering.
Paths that exceed a threshold of semantic relatedness are clustered into subgraphs based on their shared context. Finally, the automatically generated clusters are provided as a ranked list of subgraphs. RESULTS The subgraphs generated using this approach facilitated the rediscovery of 8 out of 9 existing scientific discoveries. In particular, they directly (or indirectly) led to the recovery of several intermediates (or B-concepts) between A- and C-terms, while also providing insights into the meaning of the associations. Such meaning is derived from predicates between the concepts, as well as the provenance of the semantic predications in MEDLINE. Additionally, by generating subgraphs on different thematic dimensions (such as Cellular Activity, Pharmaceutical Treatment and Tissue Function), the approach may enable a broader understanding of the nature of complex associations between concepts. Finally, in a statistical evaluation to determine the interestingness of the subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE on average. CONCLUSION These results suggest that leveraging the implicit and explicit semantics provided by manually assigned MeSH descriptors is an effective representation for capturing the underlying context of complex associations, along multiple thematic dimensions in LBD situations.
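The clustering step described above (hierarchical agglomerative clustering of A-to-C paths, stopping at a relatedness threshold) can be sketched as follows. This is a minimal sketch, assuming each path has already been reduced to a set of MeSH descriptors; as a stand-in for the paper's semantic relatedness, which also uses the MeSH hierarchy, it uses Jaccard similarity of the descriptor sets with average linkage. The descriptor sets and threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two descriptor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(c1, c2, items):
    """Mean pairwise Jaccard similarity between two clusters of item indices."""
    sims = [jaccard(items[i], items[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def hac(items, threshold):
    """Average-linkage agglomerative clustering with a similarity cut-off.

    Starts from singleton clusters and repeatedly merges the most similar
    pair until no pair reaches `threshold`. Returns lists of item indices.
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        scored = (
            ((i, j), average_link(clusters[i], clusters[j], items))
            for i, j in combinations(range(len(clusters)), 2)
        )
        (x, y), best = max(scored, key=lambda entry: entry[1])
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because combinations() yields x < y
    return clusters


# Hypothetical paths, each reduced to the set of descriptors attached to
# the articles it was extracted from.
paths = [
    {"d1", "d2"},
    {"d1", "d2", "d3"},
    {"d4", "d5"},
    {"d4", "d5", "d6"},
]
print(hac(paths, threshold=0.5))  # [[0, 1], [2, 3]]
```

Each resulting cluster corresponds to one candidate subgraph; ranking the subgraphs, as the paper does, would require an additional scoring step not shown here.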
Affiliation(s)
- Delroy Cameron
- Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), Wright State University, Dayton, OH 45435, USA.
- Ramakanth Kavuluru
- Division of Biomedical Informatics, University of Kentucky, Lexington, KY 40506, USA
- Amit P Sheth
- Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), Wright State University, Dayton, OH 45435, USA
- Krishnaprasad Thirunarayan
- Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis), Wright State University, Dayton, OH 45435, USA
46
Bellera CL, Balcazar DE, Vanrell MC, Casassa AF, Palestro PH, Gavernet L, Labriola CA, Gálvez J, Bruno-Blanch LE, Romano PS, Carrillo C, Talevi A. Computer-guided drug repurposing: Identification of trypanocidal activity of clofazimine, benidipine and saquinavir. Eur J Med Chem 2015; 93:338-48. [DOI: 10.1016/j.ejmech.2015.01.065]
47
48
Luo J, Liang S. Prioritization of potential candidate disease genes by topological similarity of protein–protein interaction network and phenotype data. J Biomed Inform 2015; 53:229-36. [DOI: 10.1016/j.jbi.2014.11.004]
49
Application of text mining in the biomedical domain. Methods 2015; 74:97-106. [PMID: 25641519 DOI: 10.1016/j.ymeth.2015.01.015]
Abstract
In recent years, the amount of experimental data produced in biomedical research and the number of papers published in this field have grown rapidly. In order to keep up to date with developments in their field of interest and to interpret the outcome of experiments in light of all available literature, researchers increasingly turn to automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and can now be used to address a variety of research questions, ranging from de novo drug target discovery to enhanced biological interpretation of the results of high-throughput experiments. In this paper, we introduce the most important techniques used for text mining and give an overview of the text mining tools currently in use and the types of problems to which they are typically applied.
50
Xie B, Ding Q, Wu D. Text Mining on Big and Complex Biomedical Literature. Big Data Analytics in Bioinformatics and Healthcare 2015. [DOI: 10.4018/978-1-4666-6611-5.ch002]
Abstract
Driven by rapidly advancing techniques and increasing interest in biology and medicine, about 2,000 to 4,000 references are added daily to MEDLINE, the US national biomedical bibliographic database. Even for a specific research topic, extracting useful and comprehensive information from the huge pool of literature data is challenging. Text mining techniques have become extremely useful for dealing with this abundant biomedical information, and they have been applied to various areas of biomedical research. Instead of providing a brief overview of all text mining techniques and every major biomedical text mining application, this chapter explores the microRNA profiling area and related text mining tools in depth. As an illustrative example, one rule-based text mining system developed by the authors is discussed in detail. This chapter also discusses the challenges and potential research areas in biomedical text mining.
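To make "rule-based" concrete for the microRNA area, the sketch below recognizes miRNA mentions with a single regular expression. The abstract does not describe the authors' rules, so this pattern is a hypothetical illustration covering a few common naming forms (optional three-letter species prefix such as "hsa-", "miR"/"mir"/"let", a number, an optional letter suffix, and an optional "-5p"/"-3p" arm); it is not exhaustive and does not handle spelled-out forms such as "microRNA-21".

```python
import re

# Hypothetical rule: optional 3-letter species prefix (e.g. "hsa-"), then
# "miR"/"mir"/"let", a number, an optional letter suffix, an optional
# "-<digit>" paralog suffix and an optional arm ("-5p"/"-3p").
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|mir|let)-\d+[a-z]?(?:-\d)?(?:-[35]p)?\b"
)


def find_mirnas(text):
    """Return miRNA-like mentions in order of appearance."""
    return MIRNA_PATTERN.findall(text)


sentence = "Serum hsa-miR-21-5p and miR-155 were elevated, whereas let-7a was reduced."
print(find_mirnas(sentence))  # ['hsa-miR-21-5p', 'miR-155', 'let-7a']
```

Rule-based recognizers like this are transparent and easy to adjust, but every new naming convention requires a new rule, which is one reason dictionary- and model-based recognizers are often combined with them.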