1
Malec SA, Taneja SB, Albert SM, Shaaban CE, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023; 142:104368. [PMID: 37086959 PMCID: PMC10355339 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
BACKGROUND Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ content-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables combining roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data. METHODS We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined role variables in-depth. RESULTS Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles. CONCLUSION Our findings suggest combining machine reading and KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
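The abstract above describes translating the epidemiological definitions of confounder, collider, and mediator into graph queries. The following is a minimal sketch of that idea on a toy directed edge set, not the authors' implementation: the function `classify_roles`, the node names, and every edge are hypothetical placeholders, and a real KG query would also have to handle terminologies, indirect paths, and evidence weighting.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect), read as "cause -> effect"


def classify_roles(edges: Set[Edge], exposure: str, outcome: str) -> Dict[str, Set[str]]:
    """Label every other node with the causal role(s) its direct edges imply."""
    candidates = {node for edge in edges for node in edge} - {exposure, outcome}
    roles: Dict[str, Set[str]] = {}
    for z in sorted(candidates):
        found: Set[str] = set()
        if (z, exposure) in edges and (z, outcome) in edges:
            found.add("confounder")  # common cause of exposure and outcome
        if (exposure, z) in edges and (z, outcome) in edges:
            found.add("mediator")  # on a path exposure -> z -> outcome
        if (exposure, z) in edges and (outcome, z) in edges:
            found.add("collider")  # common effect of exposure and outcome
        if found:
            roles[z] = found
    return roles


# Hypothetical toy assertions for illustration only, not findings from the study.
toy_edges: Set[Edge] = {
    ("a", "exposure"), ("a", "outcome"),    # a: confounder pattern
    ("exposure", "m"), ("m", "outcome"),    # m: mediator pattern
    ("exposure", "c"), ("outcome", "c"),    # c: collider pattern
    ("d", "exposure"), ("d", "outcome"), ("exposure", "d"),  # d: combined roles
}

roles = classify_roles(toy_edges, "exposure", "outcome")
exclusive_confounders = sorted(z for z, r in roles.items() if r == {"confounder"})
combined_role = sorted(z for z, r in roles.items() if len(r) > 1)
```

Returning a set of roles per variable, rather than a single label, mirrors the abstract's distinction between variables that behave exclusively as confounders and those that play combined roles.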
Affiliation(s)
- Scott A Malec
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Steven M Albert
  - Department of Behavioral and Community Health Sciences, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- C Elizabeth Shaaban
  - Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Helmet T Karim
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Arthur S Levine
  - Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; The Brain Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Paul Munro
  - School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Richard D Boyce
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA

2
Addila AE, Azale T, Gete YK, Yitayal M. The effects of maternal alcohol consumption during pregnancy on adverse fetal outcomes among pregnant women attending antenatal care at public health facilities in Gondar town, Northwest Ethiopia: a prospective cohort study. Subst Abuse Treat Prev Policy 2021; 16:64. [PMID: 34446055 PMCID: PMC8390259 DOI: 10.1186/s13011-021-00401-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Accepted: 08/05/2021] [Indexed: 12/16/2022]
Abstract
BACKGROUND The teratogenic effect of fetal alcohol exposure may lead to actual and potential problems immediately after birth, in infancy, or even later in life, including mental impairment. This study aimed to investigate the effects of maternal alcohol consumption during pregnancy on adverse fetal outcomes at Gondar town public health facilities, Northwest Ethiopia. METHODS A facility-based prospective cohort study was performed among 1778 pregnant women who were booked for antenatal care in selected public health facilities from 29 October 2019 to 7 May 2020 in Gondar town. We used a two-stage random sampling technique to recruit and include participants in the cohort. Data were collected using the standardized, pre-tested Alcohol Use Disorders Identification Test - Consumption (AUDIT-C) questionnaire. Multivariable analysis was performed to examine the association between reported prenatal alcohol exposure (non-hazardous and hazardous) and the adverse birth outcomes of interest using log-binomial regression modeling. The burden of outcomes was reported using the adjusted risk ratio (ARR) and population-attributable risk (PAR). RESULTS A total of 1686 pregnant women were included in the analysis, which revealed that the incidences of low birth weight, preterm birth, and stillbirth were 12.63% (95% CI: 11.12, 14.31), 6.05% (95% CI: 5.00, 7.29) and 4.27% (95% CI: 3.4, 5.35), respectively. Non-hazardous and hazardous alcohol consumption during pregnancy were significantly associated with low birth weight (ARR = 1.50; 95% CI: 1.31, 1.98 and ARR = 2.34; 95% CI: 1.66, 3.30, respectively). Hazardous alcohol consumption during pregnancy was also significantly associated with preterm birth (ARR = 2.06; 95% CI: 1.21, 3.52). The adjusted PAR of low birth weight related to non-hazardous and hazardous alcohol drinking during pregnancy was 11.72% and 8.44%, respectively. The adjusted PAR of hazardous alcohol consumption was 6.80% for preterm birth. CONCLUSIONS Our findings suggest that the risk of adverse birth outcomes, particularly preterm delivery and low birth weight, increases with increasing levels of alcohol intake. These results show that the prevention of maternal alcohol use during pregnancy has the potential to reduce low birth weight and preterm birth. Hence, screening women for alcohol use during antenatal care visits and providing advice with rigorous follow-up of women who used alcohol may save the fetus from the potential risks of adverse birth outcomes.
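The abstract reports burden as a population-attributable risk but does not say which estimator was used. As a rough illustration only, the sketch below applies Levin's classic formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the exposure prevalence in the population. The function name `levin_par` and the prevalence of 0.25 are assumptions for illustration; the abstract gives no exposure prevalence, so the result is not expected to reproduce the published PAR values.

```python
def levin_par(exposure_prevalence: float, risk_ratio: float) -> float:
    """Population-attributable risk (as a fraction) by Levin's formula."""
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must lie in [0, 1]")
    if risk_ratio <= 0.0:
        raise ValueError("risk_ratio must be positive")
    excess = exposure_prevalence * (risk_ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical prevalence (0.25) combined with the ARR of 1.50 quoted in the abstract.
example_par = levin_par(exposure_prevalence=0.25, risk_ratio=1.50)
```

Levin's formula is usually stated for an unadjusted risk ratio; with adjusted estimates, other estimators that use the exposure prevalence among cases are often preferred, which is one more reason to treat this as a sketch.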
Affiliation(s)
- Alemu Earsido Addila
  - Department of Public Health, College of Medicine and Health Sciences, Wachemo University, Hossana, Ethiopia
  - Department of Epidemiology and Biostatistics, College of Medicine and Health Sciences, Institute of Public Health, University of Gondar, Gondar, Ethiopia
- Telake Azale
  - Department of Health Education and Behavioral Sciences, College of Medicine and Health Sciences, Institute of Public Health, University of Gondar, Gondar, Ethiopia
- Yigzaw Kebede Gete
  - Department of Epidemiology and Biostatistics, College of Medicine and Health Sciences, Institute of Public Health, University of Gondar, Gondar, Ethiopia
- Mezgebu Yitayal
  - Department of Health Systems and Policy, College of Medicine and Health Sciences, Institute of Public Health, University of Gondar, Gondar, Ethiopia

3
Blum MR, Tan YJ, Ioannidis JPA. Use of E-values for addressing confounding in observational studies - an empirical assessment of the literature. Int J Epidemiol 2021; 49:1482-1494. [PMID: 31930286 DOI: 10.1093/ije/dyz261] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 11/25/2019] [Accepted: 12/06/2019] [Indexed: 11/13/2022]
Abstract
BACKGROUND E-values are a recently introduced approach to evaluate confounding in observational studies. We aimed to empirically assess the current use of E-values in published literature. METHODS We conducted a systematic literature search for all publications, published up till the end of 2018, which cited at least one of two inceptive E-value papers and presented E-values for original data. For these case publications we identified control publications, matched by journal and issue, where the authors had not calculated E-values. RESULTS In total, 87 papers presented 516 E-values. Of the 87 papers, 14 concluded that residual confounding likely threatens at least some of the main conclusions. Seven of these 14 named potential uncontrolled confounders. 19 of 87 papers related E-value magnitudes to expected strengths of field-specific confounders. The median E-value was 1.88, 1.82, and 2.02 for the 43, 348, and 125 E-values where confounding was felt likely to affect the results, unlikely to affect the results, or not commented upon, respectively. The 69 case-control publication pairs dealt with effect sizes of similar magnitude. Of 69 control publications, 52 did not comment on unmeasured confounding and 44/69 case publications concluded that confounding was unlikely to affect study conclusions. CONCLUSIONS Few papers using E-values conclude that confounding threatens their results, and their E-values overlap in magnitude with those of papers acknowledging susceptibility to confounding. Facile automation in calculating E-values may compound the already poor handling of confounding. E-values should not be a substitute for careful consideration of potential sources of unmeasured confounding. If used, they should be interpreted in the context of expected confounding in specific fields.
Affiliation(s)
- Manuel R Blum
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Department of General Internal Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Yuan Jin Tan
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- John P A Ioannidis
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA; Stanford Prevention Research Center, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA; Department of Statistics, Stanford University School of Humanities and Science, Stanford, CA, USA

4
Trinquart L, Galea S. TWO AUTHORS REPLY. Am J Epidemiol 2019; 188:1-2. [PMID: 31150042 DOI: 10.1093/aje/kwz129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022]
Affiliation(s)
- Ludovic Trinquart
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA
- Sandro Galea
  - Department of Epidemiology, School of Public Health, Boston University, Boston, MA

5
Ioannidis JPA. Unreformed nutritional epidemiology: a lamp post in the dark forest. Eur J Epidemiol 2019; 34:327-331. [DOI: 10.1007/s10654-019-00487-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]

6
Ioannidis JPA, Tan YJ, Blum MR. Limitations and Misinterpretations of E-Values for Sensitivity Analyses of Observational Studies. Ann Intern Med 2019; 170:108-111. [PMID: 30597486 DOI: 10.7326/m18-2159] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Indexed: 11/22/2022]
Abstract
The E-value was recently introduced on the basis of earlier work as "the minimum strength of association…that an unmeasured confounder would need to have with both the treatment and the outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates." E-values have been proposed for wide application in observational studies evaluating causality. However, they have limitations and are prone to misinterpretation. E-values have a monotonic, almost linear relationship with effect estimates and thus offer no additional information beyond what effect estimates can convey. Whereas effect estimates are based on real data, E-values may make unrealistic assumptions. No general rule can exist about what is a "small enough" E-value, and users of the biomedical literature are not familiar with how to interpret a range of E-values. Problems arise for any measure dependent on effect estimates and their CIs, for example, bias due to selective reporting and dependence on choice of exposure contrast and level of confidence. The automation of E-values may give an excuse not to think seriously about confounding. Moreover, biases other than confounding may still undermine results. Instead of misused or misinterpreted E-values, the authors recommend judicious use of existing methods for sensitivity analyses with careful assumptions; systematic assessments of whether and how known confounders have been handled, along with consideration of their prevalence and magnitude; thorough discussion of the potential for unknown confounders considering the study design and field of application; and explicit caution in making causal claims from observational studies.
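The claim that E-values are a monotonic, almost linear function of the effect estimate can be checked numerically. The sketch below assumes the commonly used closed form for a risk ratio, E = RR + sqrt(RR x (RR - 1)), with estimates below 1 inverted first; that formula is not stated in the abstract, and the function name `e_value` and the grid of risk ratios are illustrative choices.

```python
import math


def e_value(risk_ratio: float) -> float:
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0.0:
        raise ValueError("risk_ratio must be positive")
    # Protective estimates (RR < 1) are inverted before applying the formula.
    rr = risk_ratio if risk_ratio >= 1.0 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Illustrative grid of risk ratios; the E-value rises steadily with the estimate.
estimates = [1.0, 1.25, 1.5, 2.0, 3.0, 5.0]
e_values = [e_value(rr) for rr in estimates]
for rr, ev in zip(estimates, e_values):
    print(f"RR = {rr:.2f} -> E-value = {ev:.2f}")
```

Because the E-value under this formula depends only on the estimate itself, it carries no information about how strong confounding is likely to be in a given field, which is the point the abstract makes.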
Affiliation(s)
- John P A Ioannidis
  - Stanford Prevention Research Center, Stanford University School of Medicine, Stanford, California (J.P.I.)
- Yuan Jin Tan
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California (Y.J.T.)
- Manuel R Blum
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California; and Bern University Hospital, University of Bern, Bern, Switzerland (M.R.B.)

7
Koutkias V, Jaulent MC. A Multiagent System for Integrated Detection of Pharmacovigilance Signals. J Med Syst 2015; 40:37. [DOI: 10.1007/s10916-015-0378-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/31/2015] [Accepted: 10/09/2015] [Indexed: 12/23/2022]
Affiliation(s)
- Vassilis Koutkias
  - INSERM, U1142, LIMICS, 75006, Paris, France; Sorbonne Universités, UPMC University Paris 06, UMR_S 1142, LIMICS, 75006, Paris, France; Université Paris 13, Sorbonne Paris Cité, LIMICS, UMR_S 1142, 93430, Villetaneuse, France
- Marie-Christine Jaulent
  - INSERM, U1142, LIMICS, 75006, Paris, France; Sorbonne Universités, UPMC University Paris 06, UMR_S 1142, LIMICS, 75006, Paris, France; Université Paris 13, Sorbonne Paris Cité, LIMICS, UMR_S 1142, 93430, Villetaneuse, France

8
Voss EA, Ma Q, Ryan PB. The impact of standardizing the definition of visits on the consistency of multi-database observational health research. BMC Med Res Methodol 2015; 15:13. [PMID: 25887092 PMCID: PMC4369827 DOI: 10.1186/s12874-015-0001-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/08/2014] [Accepted: 01/26/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Use of administrative claims from multiple sources for research purposes is challenged by the lack of consistency in the structure of the underlying data and definition of data across claims data providers. This paper evaluates the impact of applying a standardized revenue code-based logic for defining inpatient encounters across two different claims databases. METHODS We selected members who had complete enrollment in 2012 from the Truven MarketScan Commercial Claims and Encounters (CCAE) and the Optum Clinformatics (Optum) databases. The overall prevalence of inpatient conditions in the raw data was compared to that in the common data model (CDM) with the standardized visit definition applied. RESULTS In CCAE, 87.18% of claims from 2012 that were classified as part of inpatient visits in the raw data were also classified as part of inpatient visits after the data were standardized to CDM, and this overlap was consistent from 2006 to 2011. In contrast, Optum had 83.18% concordance in classification of 2012 claims from inpatient encounters before and after standardization, but the consistency varied over time. The re-classification of inpatient encounters substantially impacted the observed prevalence of medical conditions occurring in the inpatient setting and the consistency in prevalence estimates between the databases. On average, before standardization, each condition in Optum was 12% more prevalent than that same condition in CCAE; after standardization, the prevalence of conditions had a mean difference of only 1% between databases. Amongst 7,039 conditions reviewed, the difference in the prevalence of 67% of conditions in these two databases was reduced after standardization. CONCLUSIONS In an effort to improve consistency in research results across databases, one should review sources of database heterogeneity, such as the way data holders process raw claims data. Our study showed that applying the Observational Medical Outcomes Partnership (OMOP) CDM with a standardized approach for defining inpatient visits during the extract, transfer, and load process can decrease the heterogeneity observed in disease prevalence estimates across two different claims data sources.
Affiliation(s)
- Erica A Voss
  - Janssen Research & Development, 920 Route 202, Raritan, NJ, 08869, USA
- Qianli Ma
  - Janssen Research & Development, 920 Route 202, Raritan, NJ, 08869, USA
- Patrick B Ryan
  - Janssen Research & Development, 920 Route 202, Raritan, NJ, 08869, USA

9
Koutkias VG, Jaulent MC. Computational approaches for pharmacovigilance signal detection: toward integrated and semantically-enriched frameworks. Drug Saf 2015; 38:219-32. [PMID: 25749722 PMCID: PMC4374117 DOI: 10.1007/s40264-015-0278-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 02/04/2023]
Abstract
Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Diverse data sources are considered within the 'search space' of pharmacovigilance scientists, and respective data analysis methods are employed, all with their qualities and shortcomings, towards more timely and accurate signal detection. Recent systematic comparative studies highlighted not only event-based and data-source-based differential performance across methods but also their complementarity. These findings reinforce the arguments for exploiting all possible information sources for drug safety and the parallel use of multiple signal detection methods. Combinatorial signal detection has been pursued in few studies up to now, employing a rather limited number of methods and data sources but illustrating well-promising outcomes. However, the large-scale realization of this approach requires systematic frameworks to address the challenges of the concurrent analysis setting. In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format enabling data sharing and systematic comparisons; and (d) assessing/supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access and use of different data sources and computational methods in an integrated fashion, bringing a new perspective for large-scale, knowledge-intensive signal detection.
Affiliation(s)
- Vassilis G Koutkias
  - INSERM, U1142, LIMICS, Campus des Cordeliers, 15 rue de l'École de Médecine, 75006, Paris, France

10
Authors' reply to Hennessy and Leonard's comment on "Desideratum for evidence-based epidemiology". Drug Saf 2014; 38:105-7. [PMID: 25511912 DOI: 10.1007/s40264-014-0254-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/27/2022]

11
Empirical performance of a new user cohort method: lessons for developing a risk identification and analysis system. Drug Saf 2014; 36 Suppl 1:S59-72. [PMID: 24166224 DOI: 10.1007/s40264-013-0099-6] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 10/26/2022]
Abstract
BACKGROUND Observational healthcare data offer the potential to enable identification of risks of medical products, but appropriate methodology has not yet been defined. The new user cohort method, which compares the post-exposure rate among the target drug to a referent comparator group, is the prevailing approach for many pharmacoepidemiology evaluations and has been proposed as a promising approach for risk identification but its performance in this context has not been fully assessed. OBJECTIVES To evaluate the performance of the new user cohort method as a tool for risk identification in observational healthcare data. RESEARCH DESIGN The method was applied to 399 drug-outcome scenarios (165 positive controls and 234 negative controls across 4 health outcomes of interest) in 5 real observational databases (4 administrative claims and 1 electronic health record) and in 6 simulated datasets with no effect and injected relative risks of 1.25, 1.5, 2, 4, and 10, respectively. MEASURES Method performance was evaluated through Area Under ROC Curve (AUC), bias, and coverage probability. RESULTS The new user cohort method achieved modest predictive accuracy across the outcomes and databases under study, with the top-performing analysis near AUC >0.70 in most scenarios. The performance of the method was particularly sensitive to the choice of comparator population. For almost all drug-outcome pairs there was a large difference, either positive or negative, between the true effect size and the estimate produced by the method, although this error was near zero on average. Simulation studies showed that in the majority of cases, the true effect estimate was not within the 95% confidence interval produced by the method. CONCLUSION The new user cohort method can contribute useful information toward a risk identification system, but should not be considered definitive evidence given the degree of error observed within the effect estimates. Careful consideration of the comparator selection and appropriate calibration of the effect estimates is required in order to properly interpret study findings.
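Two of the performance measures named in the abstract, AUC over positive and negative controls and coverage probability, can be sketched in a few lines. This is a minimal illustration rather than the study's evaluation code: the functions `auc` and `coverage` and all numbers below are hypothetical. AUC is computed here as the fraction of positive/negative control pairs in which the positive control receives the larger estimate, with ties counted as half.

```python
from typing import Sequence, Tuple


def auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Probability that a positive control outranks a negative control."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def coverage(intervals: Sequence[Tuple[float, float]], true_value: float) -> float:
    """Fraction of confidence intervals that contain the true effect size."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


# Hypothetical effect estimates for positive and negative control pairs.
positive_controls = [1.8, 1.1, 2.4, 0.9]
negative_controls = [1.0, 1.3, 0.7, 0.8]
example_auc = auc(positive_controls, negative_controls)

# Hypothetical 95% intervals from a simulation with a true relative risk of 1.0.
simulated_intervals = [(0.8, 1.2), (1.1, 1.5), (0.5, 0.9)]
example_coverage = coverage(simulated_intervals, true_value=1.0)
```

A well-calibrated method would show coverage near 0.95 for 95% intervals; the abstract reports that in the majority of simulated cases the true effect fell outside the interval.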

12
Overhage JM, Ryan PB, Schuemie MJ, Stang PE. Desideratum for Evidence Based Epidemiology. Drug Saf 2013; 36 Suppl 1:S5-14. [DOI: 10.1007/s40264-013-0102-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]