51
Li J, Le TD, Liu L, Liu J, Jin Z, Sun B, Ma S. From Observational Studies to Causal Rule Mining. ACM Trans Intell Syst Technol 2016. [DOI: 10.1145/2746410] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
Abstract
Randomised controlled trials (RCTs) are the most effective approach to causal discovery, but in many circumstances it is impossible to conduct RCTs. Therefore, observational studies based on passively observed data are widely accepted as an alternative to RCTs. However, in observational studies, prior knowledge is required to generate the hypotheses about the cause-effect relationships to be tested, and hence they can only be applied to problems with available domain knowledge and a handful of variables. In practice, many datasets are of high dimensionality, which leaves observational studies out of the opportunities for causal discovery from such a wealth of data sources. In another direction, many efficient data mining methods have been developed to identify associations among variables in large datasets. The problem is that causal relationships imply associations, but the reverse is not always true. However, we can see the synergy between the two paradigms here. Specifically, association rule mining can be used to deal with the high-dimensionality problem, whereas observational studies can be utilised to eliminate noncausal associations. In this article, we propose the concept of causal rules (CRs) and develop an algorithm for mining CRs in large datasets. We use the idea of retrospective cohort studies to detect CRs based on the results of association rule mining. Experiments with both synthetic and real-world datasets have demonstrated the effectiveness and efficiency of CR mining. In comparison with the commonly used causal discovery methods, the proposed approach generally is faster and has better or competitive performance in finding correct or sensible causes. It is also capable of finding a cause consisting of multiple variables—a feature that other causal discovery methods do not possess.
Affiliation(s)
- Jiuyong Li
  - University of South Australia, Mawson Lakes, SA, Australia
- Thuc Duy Le
  - University of South Australia, Mawson Lakes, SA, Australia
- Lin Liu
  - University of South Australia, Mawson Lakes, SA, Australia
- Jixue Liu
  - University of South Australia, Mawson Lakes, SA, Australia
- Zhou Jin
  - University of Science and Technology of China, Hefei, China
- Bingyu Sun
  - Chinese Academy of Sciences, Hefei, China
- Saisai Ma
  - University of South Australia, Mawson Lakes, SA, Australia
52
Altman R, Lim S, Steen RG, Dasa V. Hyaluronic Acid Injections Are Associated with Delay of Total Knee Replacement Surgery in Patients with Knee Osteoarthritis: Evidence from a Large U.S. Health Claims Database. PLoS One 2015; 10:e0145776. [PMID: 26694145 PMCID: PMC4687851 DOI: 10.1371/journal.pone.0145776] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Received: 05/12/2015] [Accepted: 12/08/2015] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The growing prevalence of osteoarthritis (OA) and the medical costs associated with total knee replacement (TKR) surgery for end-stage OA motivate a search for agents that can delay OA progression. We test a hypothesis that hyaluronic acid (HA) injection is associated with delay of TKR in a dose-dependent manner. METHODS AND FINDINGS We retrospectively evaluated records in an administrative claims database of ~79 million patients, to identify all patients with knee OA who received TKR during a 6-year period. Only patients with continuous plan enrollment from diagnosis until TKR were included, so that complete medical records were available. OA diagnosis was the index event and we evaluated time-to-TKR as a function of the number of HA injections. The database included 182,022 patients with knee OA who had TKR; 50,349 (27.7%) of these patients were classified as HA Users, receiving ≥1 courses of HA prior to TKR, while 131,673 patients (72.3%) were HA Non-users prior to TKR, receiving no HA. Cox proportional hazards modelling shows that TKR risk decreases as a function of the number of HA injection courses, if patient age, gender, and disease comorbidity are used as background covariates. Multiple HA injections are therefore associated with delay of TKR (all, P < 0.0001). Half of HA Non-users had a TKR by 114 days post-diagnosis of knee OA, whereas half of HA Users had a TKR by 484 days post-diagnosis (χ2 = 19,769; p < 0.0001). Patients who received no HA had a mean time-to-TKR of 0.7 years; with one course of HA, the mean time to TKR was 1.4 years (χ2 = 13,725; p < 0.0001); patients who received ≥5 courses delayed TKR by 3.6 years (χ2 = 19,935; p < 0.0001). CONCLUSIONS HA injection in patients with knee OA is associated with a dose-dependent increase in time-to-TKR.
Affiliation(s)
- Roy Altman
  - Department of Rheumatology, University of California Los Angeles, Los Angeles, California, United States of America
- Sooyeol Lim
  - North American Business Unit, Seikagaku Corporation, Tokyo, Japan
- R. Grant Steen
  - Department of Medical Affairs, Bioventus LLC, Durham, NC, United States of America
- Vinod Dasa
  - Department of Orthopaedics, Louisiana State University Health Sciences Center, New Orleans, Louisiana, United States of America
53
Rahman SA, Huang Y, Claassen J, Heintzman N, Kleinberg S. Combining Fourier and lagged k-nearest neighbor imputation for biomedical time series data. J Biomed Inform 2015; 58:198-207. [PMID: 26477633 DOI: 10.1016/j.jbi.2015.10.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 03/11/2015] [Revised: 09/29/2015] [Accepted: 10/05/2015] [Indexed: 01/23/2023]
Abstract
Most clinical and biomedical data contain missing values. A patient's record may be split across multiple institutions, devices may fail, and sensors may not be worn at all times. While these missing values are often ignored, this can lead to bias and error when the data are mined. Further, the data are not simply missing at random. Instead the measurement of a variable such as blood glucose may depend on its prior values as well as that of other variables. These dependencies exist across time as well, but current methods have yet to incorporate these temporal relationships as well as multiple types of missingness. To address this, we propose an imputation method (FLk-NN) that incorporates time lagged correlations both within and across variables by combining two imputation methods, based on an extension to k-NN and the Fourier transform. This enables imputation of missing values even when all data at a time point is missing and when there are different types of missingness both within and across variables. In comparison to other approaches on three biological datasets (simulated and actual Type 1 diabetes datasets, and multi-modality neurological ICU monitoring) the proposed method has the highest imputation accuracy. This was true for up to half the data being missing and when consecutive missing values are a significant fraction of the overall time series length.
Affiliation(s)
- Shah Atiqur Rahman
  - Department of Computer Science, Stevens Institute of Technology, NJ, United States.
- Yuxiao Huang
  - Department of Computer Science, Stevens Institute of Technology, NJ, United States.
- Jan Claassen
  - Division of Critical Care Neurology, Department of Neurology, Columbia University, College of Physicians and Surgeons, New York, NY, United States.
- Samantha Kleinberg
  - Department of Computer Science, Stevens Institute of Technology, NJ, United States.
54
Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL. Inferring causal impact using Bayesian structural time-series models. Ann Appl Stat 2015. [DOI: 10.1214/14-aoas788] [Citation(s) in RCA: 328] [Impact Index Per Article: 32.8] [Indexed: 11/19/2022]
55
Dammann O, Gray P, Gressens P, Wolkenhauer O, Leviton A. Systems Epidemiology: What's in a Name? Online J Public Health Inform 2014; 6:e198. [PMID: 25598870 PMCID: PMC4292535 DOI: 10.5210/ojphi.v6i3.5571] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022] Open
Abstract
Systems biology is an interdisciplinary effort to integrate molecular, cellular, tissue, organ, and organism levels of function into computational models that facilitate the identification of general principles. Systems medicine adds a disease focus. Systems epidemiology adds yet another level consisting of antecedents that might contribute to the disease process in populations. In etiologic and prevention research, systems-type thinking about multiple levels of causation will allow epidemiologists to identify contributors to disease at multiple levels as well as their interactions. In public health, systems epidemiology will contribute to the improvement of syndromic surveillance methods. We encourage the creation of computational simulation models that integrate information about disease etiology, pathogenetic data, and the expertise of investigators from different disciplines.
Affiliation(s)
- O. Dammann
  - Dept of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
  - Perinatal Epidemiology Unit, Dept. of Gynecology and Obstetrics, Hannover Medical School, Hannover, Germany
- P. Gray
  - Dept of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
- P. Gressens
  - Inserm, U676, Paris, France
  - Department of Perinatal Imaging and Health, Department of Division of Imaging Sciences and Biomedical Engineering, King’s College London, King’s Health Partners, St. Thomas’ Hospital, London, United Kingdom
- O. Wolkenhauer
  - Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
  - Stellenbosch Institute for Advanced Study (STIAS), Stellenbosch, South Africa
- A. Leviton
  - Neuroepidemiology Unit, Children’s Hospital, Boston, MA
56
Mihăilă C, Ananiadou S. Semi-supervised learning of causal relations in biomedical scientific discourse. Biomed Eng Online 2014; 13 Suppl 2:S1. [PMID: 25559746 PMCID: PMC4304242 DOI: 10.1186/1475-925x-13-s2-s1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022] Open
Abstract
Background The increasing number of daily published articles in the biomedical domain has become too large for humans to handle on their own. As a result, bio-text mining technologies have been developed to reduce their workload by automatically analysing the text and extracting important knowledge. Specific bio-entities, bio-events between these and facts can now be recognised with sufficient accuracy and are widely used by biomedical researchers. However, understanding how the extracted facts are connected in text is an extremely difficult task, which cannot be easily tackled by machinery. Results In this article, we describe our method to recognise causal triggers and their arguments in biomedical scientific discourse. We introduce new features and show that a self-learning approach improves the performance obtained by supervised machine learners to 83.47% for causal triggers. Furthermore, the spans of causal arguments can be recognised to a slightly higher level than by using supervised or rule-based methods that have been employed before. Conclusion Exploiting the large amount of unlabelled data that is already available can help improve the performance of recognising causal discourse relations in the biomedical domain. This improvement will further benefit the development of multiple tasks, such as hypothesis generation for experimental laboratories, contradiction detection, and the creation of causal networks.
57
Korsunsky I, McGovern K, LaGatta T, Olde Loohuis L, Grosso-Applewhite T, Griffeth N, Mishra B. Systems biology of cancer: a challenging expedition for clinical and quantitative biologists. Front Bioeng Biotechnol 2014; 2:27. [PMID: 25191654 PMCID: PMC4137540 DOI: 10.3389/fbioe.2014.00027] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/27/2014] [Accepted: 07/18/2014] [Indexed: 11/25/2022] Open
Abstract
A systems-biology approach to complex disease (such as cancer) is now complementing traditional experience-based approaches, which have typically been invasive and expensive. The rapid progress in biomedical knowledge is enabling the targeting of disease with therapies that are precise, proactive, preventive, and personalized. In this paper, we summarize and classify models of systems biology and model checking tools, which have been used to great success in computational biology and related fields. We demonstrate how these models and tools have been used to study some of the twelve biochemical pathways implicated in but not unique to pancreatic cancer, and conclude that the resulting mechanistic models will need to be further enhanced by various abstraction techniques to interpret phenomenological models of cancer progression.
Affiliation(s)
- Ilya Korsunsky
  - Department of Computer Science, Courant Institute, New York University, New York, NY, USA
- Kathleen McGovern
  - Department of Mathematics and Statistics, Hunter College, City University of New York, New York, NY, USA
- Tom LaGatta
  - Department of Mathematics, Courant Institute, New York University, New York, NY, USA
- Loes Olde Loohuis
  - Department of Computer Science, The Graduate Center, City University of New York, New York, NY, USA
- Terri Grosso-Applewhite
  - Department of Computer Science, The Graduate Center, City University of New York, New York, NY, USA
- Nancy Griffeth
  - Department of Mathematics and Computer Science, Lehman College, City University of New York, New York, NY, USA
- Bud Mishra
  - Department of Computer Science, Courant Institute, New York University, New York, NY, USA
  - Department of Mathematics, Courant Institute, New York University, New York, NY, USA
58
Shang N, Xu H, Rindflesch TC, Cohen T. Identifying plausible adverse drug reactions using knowledge extracted from the literature. J Biomed Inform 2014; 52:293-310. [PMID: 25046831 DOI: 10.1016/j.jbi.2014.07.011] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 02/24/2014] [Revised: 06/06/2014] [Accepted: 07/10/2014] [Indexed: 01/08/2023]
Abstract
Pharmacovigilance involves continually monitoring drug safety after drugs are put to market. To aid this process, algorithms for the identification of strongly correlated drug/adverse drug reaction (ADR) pairs from data sources such as adverse event reporting systems or Electronic Health Records have been developed. These methods are generally statistical in nature, and do not draw upon the large volumes of knowledge embedded in the biomedical literature. In this paper, we investigate the ability of scalable Literature Based Discovery (LBD) methods to identify side effects of pharmaceutical agents. The advantage of LBD methods is that they can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting human review to validate the signal, which is an essential component of pharmacovigilance. To do so, we draw upon vast repositories of knowledge that has been extracted from the biomedical literature by two Natural Language Processing tools, MetaMap and SemRep. We evaluate two LBD methods that scale comfortably to the volume of knowledge available in these repositories. Specifically, we evaluate Reflective Random Indexing (RRI), a model based on concept-level co-occurrence, and Predication-based Semantic Indexing (PSI), a model that encodes the nature of the relationship between concepts to support reasoning analogically about drug-effect relationships. An evaluation set was constructed from the Side Effect Resource 2 (SIDER2), which contains known drug/ADR relations, and models were evaluated for their ability to "rediscover" these relations. In this paper, we demonstrate that both RRI and PSI can recover known drug-adverse event associations. However, PSI performed better overall, and has the additional advantage of being able to recover the literature underlying the reasoning pathways it used to make its predictions.
Affiliation(s)
- Ning Shang
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States.
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
- Trevor Cohen
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
59
Singleton KW, Bui AAT, Hsu W. Transfer and transport: incorporating causal methods for improving predictive models. J Am Med Inform Assoc 2014; 21:e374-5. [PMID: 25008007 DOI: 10.1136/amiajnl-2014-002968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Affiliation(s)
- Kyle W Singleton
  - Department of Bioengineering, University of California, Los Angeles, USA
  - Department of Radiological Sciences, Medical Imaging Informatics, University of California, Los Angeles, USA
- Alex A T Bui
  - Department of Bioengineering, University of California, Los Angeles, USA
  - Department of Radiological Sciences, Medical Imaging Informatics, University of California, Los Angeles, USA
- William Hsu
  - Department of Bioengineering, University of California, Los Angeles, USA
  - Department of Radiological Sciences, Medical Imaging Informatics, University of California, Los Angeles, USA
60
Uhlig BL, Engstrøm M, Ødegård SS, Hagen KK, Sand T. Headache and insomnia in population-based epidemiological studies. Cephalalgia 2014; 34:745-51. [PMID: 24973418 DOI: 10.1177/0333102414540058] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 11/15/2022]
Abstract
BACKGROUND Several epidemiological studies on the association between primary headaches and insomnia have been published in recent years. Both disorders are frequent, and our purpose was to review results from population-based studies exploring this association. METHODS We performed a literature search in PubMed for "insomnia" (or sleep disturbance) and "headache" (or migraine) linked with "epidemiology." Two hundred and eight records were identified. Three longitudinal and 10 cross-sectional studies met our inclusion criteria: population-based design with at least 200 participants including a numerical estimate of the association between headache and insomnia. RESULTS AND CONCLUSIONS In nearly all studies, primary headaches, including migraine and tension-type headache, were significantly related to insomnia symptoms with OR estimates ranging from 1.4 to 1.7. The odds were even greater, from 2.0 to 2.6, for frequent, comorbid or severe headache. Recent large longitudinal studies from Norway found a bidirectional, possibly causal, association between headache and insomnia. However, not all studies used standardized diagnostic criteria for either headache or insomnia. Further research should use well defined and validated diagnostic criteria both for insomnia and headache types in order to improve the comparability between studies, investigate causality and clarify the relevance of the findings for clinical practice.
Affiliation(s)
- B L Uhlig
  - Department of Neuroscience, Norwegian University of Science and Technology, Norway
- M Engstrøm
  - Department of Neuroscience, Norwegian University of Science and Technology, Norway
  - Department of Neurology and Clinical Neurophysiology, St. Olavs Hospital, Norway
- S S Ødegård
  - Department of Neuroscience, Norwegian University of Science and Technology, Norway
- K K Hagen
  - Department of Neuroscience, Norwegian University of Science and Technology, Norway
  - Department of Neurology and Clinical Neurophysiology, St. Olavs Hospital, Norway
  - Norwegian National Headache Centre, St. Olavs Hospital, Norway
- T Sand
  - Department of Neuroscience, Norwegian University of Science and Technology, Norway
  - Department of Neurology and Clinical Neurophysiology, St. Olavs Hospital, Norway
61
Mihăilă C, Ananiadou S. Recognising discourse causality triggers in the biomedical domain. J Bioinform Comput Biol 2013; 11:1343008. [PMID: 24372037 DOI: 10.1142/s0219720013430087] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with and compare various parameter settings for three algorithms, i.e. Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). We also evaluate the impact of lexical, syntactic, and semantic features on each of the algorithms, showing that semantics improves the performance in all cases. We test our comprehensive feature set on two corpora containing gold standard annotations of causal relations, and demonstrate the need for more gold standard data. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
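As a reading aid for the F-score figure quoted in this abstract, the following is a minimal sketch of how an F-score is computed from true-positive, false-positive, and false-negative counts. The function name `f_score` and the example counts are hypothetical illustrations, not values or code from the cited paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw counts (beta=1 gives the usual F1)."""
    # Precision: fraction of predicted triggers that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of gold-standard triggers that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    print(round(f_score(tp=6, fp=2, fn=6), 3))
```

With `beta=1.0` precision and recall are weighted equally, which is the convention usually meant when a single "F-score" is reported.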
Affiliation(s)
- Claudiu Mihăilă
  - The National Centre for Text Mining, School of Computer Science, The University of Manchester, 131 Princess Street, Manchester M1 7DN, United Kingdom
62
Abstract
Immune thrombocytopenia can have several causes including the use of certain drugs. Thrombocytopenia has been documented as a rare adverse effect of some nonsteroidal antiinflammatory drugs (NSAIDs) including diclofenac, naproxen, and ibuprofen. However, only one previously documented case of meloxicam-associated thrombocytopenia has been reported in the literature. We describe an 84-year-old woman who developed a case of immune-mediated thrombocytopenia that was attributed to meloxicam therapy. The patient's platelet count decreased from a baseline of 267 × 10³/mm³ to 2 × 10³/mm³ 1 week after she received her first lifetime dose of meloxicam. She also experienced black stools and bruising that coincided with the meloxicam administration. The almost immediate onset of thrombocytopenia and symptoms after initiation of meloxicam, as well as the marked reduction in her platelet count, suggest an idiosyncratic reaction. According to the Hill criteria for assessing causality of adverse drug events, it is plausible that this reaction was due to meloxicam. Health care providers should be aware of the possibility of thrombocytopenia secondary to NSAID therapy including meloxicam. Immune thrombocytopenia can be life threatening if it is not identified and treated promptly. A thorough medication history is particularly important when patients present with unusual symptoms, with a focus on those drugs that have been recently initiated. Although thrombocytopenia is a rare adverse effect of NSAID therapy, it should be considered a potential cause in patients receiving these drugs who have signs and symptoms consistent with this blood dyscrasia.
Affiliation(s)
- Melissa M Ranieri
  - Department of Pharmacy Practice, Temple University School of Pharmacy, Philadelphia, Pennsylvania
63
Melas IN, Kretsos K, Alexopoulos LG. Leveraging systems biology approaches in clinical pharmacology. Biopharm Drug Dispos 2013; 34:477-88. [PMID: 23983165 PMCID: PMC4034589 DOI: 10.1002/bdd.1859] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/30/2013] [Accepted: 08/12/2013] [Indexed: 01/15/2023]
Abstract
Computational modeling has been adopted in all aspects of drug research and development, from the early phases of target identification and drug discovery to the late-stage clinical trials. The different questions addressed during each stage of drug R&D have led to the emergence of different modeling methodologies. In the research phase, systems biology couples experimental data with elaborate computational modeling techniques to capture lifecycle and effector cellular functions (e.g. metabolism, signaling, transcription regulation, protein synthesis and interaction) and integrates them in quantitative models. These models are subsequently used in various ways, i.e. to identify new targets, generate testable hypotheses, gain insights on the drug's mode of action (MOA), translate preclinical findings, and assess the potential of clinical drug efficacy and toxicity. In the development phase, pharmacokinetic/pharmacodynamic (PK/PD) modeling is the established way to determine safe and efficacious doses for testing in increasingly large cohorts of subjects that are more pertinent to the target indication. First, the relationship between drug input and its concentration in plasma is established. Second, the relationship between this concentration and desired or undesired PD responses is ascertained. Recognizing that the interface of systems biology with PK/PD will facilitate drug development, systems pharmacology came into existence, combining methods from PK/PD modeling and systems engineering explicitly to account for the implicated mechanisms of the target system in the study of drug–target interactions. Herein, a number of popular systems biology methodologies are discussed, which could be leveraged within a systems pharmacology framework to address major issues in drug development.
Affiliation(s)
- Ioannis N Melas
  - National Technical University of Athens, Athens, Greece; Protatonce Ltd, Athens, Greece
64
Boland MR, Hripcsak G, Albers DJ, Wei Y, Wilcox AB, Wei J, Li J, Lin S, Breene M, Myers R, Zimmerman J, Papapanou PN, Weng C. Discovering medical conditions associated with periodontitis using linked electronic health records. J Clin Periodontol 2013; 40:474-82. [PMID: 23495669 DOI: 10.1111/jcpe.12086] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Accepted: 01/20/2013] [Indexed: 12/12/2022]
Abstract
AIM To use linked electronic medical and dental records to discover associations between periodontitis and medical conditions independent of a priori hypotheses. MATERIALS AND METHODS This case-control study included 2475 patients who underwent dental treatment at the College of Dental Medicine at Columbia University and medical treatment at NewYork-Presbyterian Hospital. Our cases are patients who received periodontal treatment and our controls are patients who received dental maintenance but no periodontal treatment. Chi-square analysis was performed for medical treatment codes and logistic regression was used to adjust for confounders. RESULTS Our method replicated several important periodontitis associations in a largely Hispanic population, including diabetes mellitus type I (OR = 1.6, 95% CI 1.30-1.99, p < 0.001) and type II (OR = 1.4, 95% CI 1.22-1.67, p < 0.001), hypertension (OR = 1.2, 95% CI 1.10-1.37, p < 0.001), hypercholesterolaemia (OR = 1.2, 95% CI 1.07-1.38, p = 0.004), hyperlipidaemia (OR = 1.2, 95% CI 1.06-1.43, p = 0.008) and conditions pertaining to pregnancy and childbirth (OR = 2.9, 95% CI: 1.32-7.21, p = 0.014). We also found a previously unreported association with benign prostatic hyperplasia (OR = 1.5, 95% CI 1.05-2.10, p = 0.026) after adjusting for age, gender, ethnicity, hypertension, diabetes, obesity, lipid and circulatory system conditions, alcohol and tobacco abuse. CONCLUSIONS This study contributes a high-throughput method for associating periodontitis with systemic diseases using linked electronic records.
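To make the odds ratios and 95% confidence intervals reported above easier to interpret, here is a minimal sketch of an unadjusted odds ratio with a Woolf (log-based) interval computed from a 2x2 case-control table. This is not the study's method (the abstract describes logistic regression with confounder adjustment); the function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio and Woolf confidence interval for a 2x2 table.

    a: cases with the condition      b: cases without the condition
    c: controls with the condition   d: controls without the condition
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval whose lower bound exceeds 1 corresponds to an association that is significant at roughly the 5% level, which is how the intervals in the abstract can be read.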
Affiliation(s)
- Mary Regina Boland
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
65
Lindsay DeVane C. What Evidence is Required for Drug Exposure to be Causally Associated with Adverse Events? The Case for Case Reports Published in Pharmacotherapy. Pharmacotherapy 2013; 33:115-7. [DOI: 10.1002/phar.1249] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
66
Mihăilă C, Ohta T, Pyysalo S, Ananiadou S. BioCause: Annotating and analysing causality in the biomedical domain. BMC Bioinformatics 2013; 14:2. [PMID: 23323613 PMCID: PMC3621543 DOI: 10.1186/1471-2105-14-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 10/04/2012] [Accepted: 12/29/2012] [Indexed: 11/24/2022] Open
Abstract
Background Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focusses on specific events and named entities, we aim to build a comprehensive resource, covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology or systems biology, and, thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is, hence, crucial for developing and evaluating biomedical text mining. Results We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This schema has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents have been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These increase significantly using a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged for the training of automatic causality detection systems. Conclusion Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new hypotheses for experimental work.
Affiliation(s)
- Claudiu Mihăilă
- The National Centre for Text Mining, School of Computer Science, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK.
67
Hoffman S, Podgurski A. The use and misuse of biomedical data: is bigger really better? AMERICAN JOURNAL OF LAW & MEDICINE 2013; 39:497-538. [PMID: 24494442 DOI: 10.1177/009885881303900401] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 06/03/2023]
Abstract
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers. In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
Affiliation(s)
- Sharona Hoffman
- Law-Medicine Center, Case Western Reserve University School of Law, USA
68
Wu JL, Yu LC, Chang PC. Detecting causality from online psychiatric texts using inter-sentential language patterns. BMC Med Inform Decis Mak 2012; 12:72. [PMID: 22809317 PMCID: PMC3441867 DOI: 10.1186/1472-6947-12-72] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/14/2012] [Accepted: 07/18/2012] [Indexed: 12/03/2022] Open
Abstract
Background: Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors’ problems, thus increasing the effectiveness of online psychiatric services.
Methods: Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, with one coming from the cause text span and the other from the effect text span. Analysis of the relationship between these words can be used to capture individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: “I broke up with my boyfriend. Life is now meaningless to me”. The major limitation of word pairs is that individual words in sentences usually cannot reflect the exact meaning of the cause and effect events, and thus may produce semantically incomplete word pairs, as the previous examples show. Therefore, this study proposes the use of inter-sentential language patterns such as <<broke up, boyfriend>, <life, meaningless>> to detect causality between sentences. The inter-sentential language patterns can capture associations among multiple words within and between sentences, and can thus provide more precise information than word pairs. To acquire inter-sentential language patterns, we develop a text mining framework by extending the classical association rule mining algorithm such that it can discover frequently co-occurring patterns across the sentence boundary.
Results: Performance was evaluated on a corpus of texts collected from PsychPark (http://www.psychpark.org), a virtual psychiatric clinic maintained by a group of volunteer professionals from the Taiwan Association of Mental Health Informatics. Experimental results show that the use of inter-sentential language patterns outperformed the use of word pairs proposed in previous studies.
Conclusions: This study demonstrates the acquisition of inter-sentential language patterns for causality detection from online psychiatric texts. Such semantically more complete and precise features can improve causality detection performance.
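The contrast between word pairs and inter-sentential language patterns in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' framework: the toy sentence pairs, the assumption that each sentence is already reduced to content words, the fixed pattern size of two words per sentence, and the helper names (`word_pairs`, `inter_sentential_patterns`, `frequent`) are all hypothetical, and plain support counting stands in for the extended association rule mining algorithm.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy data: (cause sentence, effect sentence) pairs, each already
# reduced to content words or phrases. Tokenisation is assumed, not shown.
SENTENCE_PAIRS = [
    (["broke up", "boyfriend"], ["life", "meaningless"]),
    (["broke up", "boyfriend"], ["life", "meaningless", "cry"]),
    (["lost", "job"], ["life", "meaningless"]),
]


def word_pairs(cause, effect):
    """Every (cause word, effect word) pair for one sentence pair."""
    return list(product(cause, effect))


def inter_sentential_patterns(cause, effect, size=2):
    """Patterns joining `size` cause words with `size` effect words."""
    return [
        (c, e)
        for c in combinations(sorted(cause), size)
        for e in combinations(sorted(effect), size)
    ]


def frequent(items_per_pair, min_support):
    """Keep items that occur in at least `min_support` sentence pairs."""
    counts = Counter(item for items in items_per_pair for item in set(items))
    return {item: n for item, n in counts.items() if n >= min_support}


pair_counts = frequent(
    [word_pairs(c, e) for c, e in SENTENCE_PAIRS], min_support=2
)
pattern_counts = frequent(
    [inter_sentential_patterns(c, e) for c, e in SENTENCE_PAIRS], min_support=2
)

print(pair_counts)
print(pattern_counts)
```

On this toy input, several isolated word pairs such as (broke up, life) reach the support threshold, whereas only one multi-word pattern survives, the one joining "broke up" and "boyfriend" with "life" and "meaningless". That pattern carries the complete cause and effect events, which is the advantage the abstract attributes to inter-sentential patterns.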
Affiliation(s)
- Jheng-Long Wu
- College of Informatics, Department of Information Management, Yuan Ze University, Chung-Li, Taiwan, Republic of China
69
Hahn U, Cohen KB, Garten Y, Shah NH. Mining the pharmacogenomics literature--a survey of the state of the art. Brief Bioinform 2012; 13:460-94. [PMID: 22833496 PMCID: PMC3404399 DOI: 10.1093/bib/bbs018] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 11/18/2011] [Accepted: 03/23/2012] [Indexed: 01/05/2023] Open
Abstract
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Affiliation(s)
- Udo Hahn
- Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.