1
Kucukakcali Z, Akbulut S, Colak C. Prediction of genomic biomarkers for endometriosis using the transcriptomic dataset. World J Clin Cases 2025; 13:104556. [DOI: 10.12998/wjcc.v13.i20.104556] [Received: 12/23/2024] [Revised: 03/03/2025] [Accepted: 03/13/2025] [Indexed: 04/09/2025] Open
Abstract
BACKGROUND Endometriosis is a clinical condition characterized by the presence of endometrial glands outside the uterine cavity. While its incidence remains mostly uncertain, endometriosis impacts around 180 million women worldwide. Despite the presentation of several epidemiological and clinical explanations, the precise mechanism underlying the disease remains ambiguous. In recent years, researchers have examined the hereditary dimension of the disease. Genetic research has aimed to discover the gene or genes responsible for the disease through association or linkage studies involving candidate genes or DNA mapping techniques.
AIM To identify genetic biomarkers linked to endometriosis by the application of machine learning (ML) approaches.
METHODS This case-control study used an open-access transcriptomic data set comprising endometriosis and control groups. We included data from 22 controls and 16 endometriosis patients for this purpose. We used AdaBoost, XGBoost, Stochastic Gradient Boosting, and Bagged Classification and Regression Trees (CART) for classification using five-fold cross-validation. We evaluated the performance of the models using accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score.
RESULTS Bagged CART gave the best classification metrics. The metrics obtained from this model are 85.7%, 85.7%, 100%, 75%, 75%, 100% and 85.7% for accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score, respectively. Based on the variable importance of modeling, we can use the genes CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2 and NKG7 and other transcripts with inaccessible gene names as potential biomarkers for endometriosis.
CONCLUSION This study determined possible genomic biomarkers for endometriosis using transcriptomic data from patients with/without endometriosis. The applied ML model successfully classified endometriosis and created a highly accurate diagnostic prediction model. Future genomic studies could explain the underlying pathology of endometriosis, and a non-invasive diagnostic method could replace the invasive ones.
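The seven performance measures named in this abstract all derive from a single 2×2 confusion matrix. The sketch below shows the standard definitions. The function name and the counts are illustrative assumptions and are not taken from the study's data; with these hypothetical counts the balanced accuracy comes out at 87.5%, so they should not be read as the paper's actual confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for a small held-out fold (an assumption, not study data).
metrics = classification_metrics(tp=3, fp=1, tn=3, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With so few test samples, each misclassification shifts these percentages by a large step, which is worth keeping in mind when comparing models.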
Affiliation(s)
- Zeynep Kucukakcali
  - Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya 44280, Türkiye
- Sami Akbulut
  - Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya 44280, Türkiye
  - Surgery and Liver Transplant Institute, Inonu University Faculty of Medicine, Malatya 44280, Türkiye
- Cemil Colak
  - Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya 44280, Türkiye

2
Cohn ER, Zubizarreta JR. IJMPR Didactic Paper: Weighting for Causal Inference in Mental Health Research. Int J Methods Psychiatr Res 2025; 34:e70018. [PMID: 40166979 PMCID: PMC11959416 DOI: 10.1002/mpr.70018] [Received: 10/01/2024] [Revised: 02/10/2025] [Accepted: 02/24/2025] [Indexed: 04/02/2025] Open
Abstract
OBJECTIVE Inverse probability weighting is a fundamental and general methodology for estimating the causal effects of exposures and interventions, but standard approaches to constructing such weights are often suboptimal. METHODS In this paper, we describe a recent approach for constructing such weights that directly balances covariates while optimizing the stability of the resulting weighting estimator. RESULTS To illustrate the use of this approach in mental health research, we present an exploratory study of the effects of exposure to violence on the risk of suicide attempt. CONCLUSIONS The direct balancing approach to weighting should be given strong consideration in empirical research due to its robustness and transparency in building weighting estimators.
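For orientation, the standard inverse probability weighting estimator that this abstract takes as its starting point can be sketched as follows. This is the conventional normalized (Hajek-style) form with propensity scores supplied as input, not the direct balancing approach the paper describes; the function name and toy data are assumptions for illustration.

```python
def ipw_effect(treated, outcome, propensity):
    """Normalized IPW estimate of the average treatment effect.

    treated: 0/1 indicators; outcome: observed outcomes;
    propensity: estimated P(treated = 1 | covariates), strictly in (0, 1).
    """
    w1 = [t / e for t, e in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0


# Toy data (hypothetical). With a constant propensity of 0.5 the estimate
# reduces to the plain difference in group means.
treated = [1, 1, 0, 0]
outcome = [3.0, 5.0, 1.0, 2.0]
propensity = [0.5, 0.5, 0.5, 0.5]
effect = ipw_effect(treated, outcome, propensity)
print(effect)
```

Propensity scores close to 0 or 1 produce very large weights in this form, which is the instability that balancing-based weights are meant to reduce.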
Grants
- Champalimaud Foundation, Gulbenkian Foundation, Foundation for Science and Technology, FCT
- the Lebanese Ministry of Public Health, the WHO, Lebanon
- INPRFMDIES 4280 the National Institute of Psychiatry Ramon de la Fuente
- CONACyT-G30544-H the National Council on Science and Technology
- the Ministry of Public Health
- King Saud University
- the Ministry of Health and the National Center for Public Health Protection
- the Portuguese Ministry of Health
- the Ministry of Health and European Economic Area Grants
- the National Institute of Drug Abuse, NIDA
- 2002-17270/13-5 the Argentinian Ministry of Health, Ministerio de Salud de la Nación
- SAF 2000-158-CE Ministerio de Ciencia y Tecnología, Spain
- R01 DA016558 NIDA NIH HHS
- Eli Lilly Romania SRL
- SANCO 2004123 the European Commission
- the Polish Ministry of Health
- the Ministry of Health, Saudi Arabia
- the South African Department of Health and the University of Michigan
- RO1-MH61905 NIMH NIH HHS
- IDRAAC, Lebanon
- King Abdulaziz City for Science and Technology, KACST
- CIBER CB06/02/0046 Instituto de Salud Carlos III
- H14-TOKUBETSU-026 the Japan Ministry of Health, Labor and Welfare
- R13-MH066849 John D. and Catherine T. MacArthur Foundation
- Fondo de Investigación Sanitaria
- the Regional Health Authorities of Murcia, Servicio Murciano de Salud and Consejería de Sanidad y Política Social
- the National Institute of Health of the Ministry of Health of Peru
- H16-KOKORO-013 the Japan Ministry of Health, Labor and Welfare
- QLG5-1999-01042 the European Commission
- Pan American Health Organization
- 044708 Robert Wood Johnson Foundation
- Fundación para la Formación e Investigación Sanitarias, FFIS of Murcia
- R03 TW006481 FIC NIH HHS
- Algorithm, AstraZeneca, Benta, Bella Pharma, Lundbeck, Novartis, OmniPharma, Pfizer, Phenicia, Servier, UPO
- R01-MH069864 Pfizer Foundation
- U01-MH60220 NIMH NIH HHS
- ME-2022C1-25648 Patient-Centered Outcomes Research Institute
- the Ministry of Social Protection
- R01 MH069864 NIMH NIH HHS
- the Secretary of Health of Medellín
- the WHO, Nigeria
- 03/00204-3 the State of São Paulo Research Foundation Thematic Project
- Saudi Basic Industries Corporation, SABIC
- Ortho-McNeil Pharmaceutical
- the World Health Organization, Geneva
- R01 MH061905 NIMH NIH HHS
- the Health & Social Care Research & Development Division of the Public Health Agency
- H25-SEISHIN-IPPAN-006 the Japan Ministry of Health, Labor and Welfare
- the John W. Alden Trust
- U01 MH060220 NIMH NIH HHS
- 2014 SGR 748 Generalitat de Catalunya
- the Center for Excellence on Research in Mental Health, CES University
- FIS 00/0028 Instituto de Salud Carlos III, Spain
- Bristol-Myers Squibb
- R01 MH070884 NIMH NIH HHS
- the Federal Ministry of Health, Abuja, Nigeria
- EAHC 20081308 the European Commission
- GlaxoSmithKline
- H13-SHOGAI-023 the Japan Ministry of Health, Labor and Welfare
- Eli Lilly and Company
- R01 MH059575 NIMH NIH HHS
- U13 MH066849 NIMH NIH HHS
- RETICS RD06/0011 REM-TAP Instituto de Salud Carlos III
- R01-MH059575 NIMH NIH HHS
- King Faisal Specialist Hospital and Research Center, and the Ministry of Economy and Planning, General Authority for Statistics
- the Substance Abuse and Mental Health Services Administration, SAMHSA
- 2017 SGR 452 Generalitat de Catalunya
- the Piedmont Region, Italy
- R03-TW006481 FIC NIH HHS
- R13 MH066849 NIMH NIH HHS
- Ortho‐McNeil Pharmaceutical
- Robert Wood Johnson Foundation
- National Institute of Mental Health
- Eli Lilly and Company
- Pan American Health Organization
- Fogarty International Center
- U.S. Public Health Service
- Pfizer Foundation
- John D. and Catherine T. MacArthur Foundation
- Patient‐Centered Outcomes Research Institute
Affiliation(s)
- José R. Zubizarreta
  - Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Statistics, Harvard University, Cambridge, Massachusetts, USA

3
Kumar RG, Pomeroy ML, Ornstein KA, Juengst SB, Wagner AK, Reckrey JM, Lercher K, Dreer LE, Evans E, de Souza NL, Dams-O'Connor K. Home, but Homebound After Traumatic Brain Injury: Risk Factors and Associations With Nursing Home Entry and Death. Arch Phys Med Rehabil 2025; 106:517-526. [PMID: 39374687 PMCID: PMC11968243 DOI: 10.1016/j.apmr.2024.09.012] [Received: 05/06/2024] [Revised: 09/12/2024] [Accepted: 09/16/2024] [Indexed: 10/09/2024]
Abstract
OBJECTIVE To examine risk factors associated with homeboundness 1 year after traumatic brain injury (TBI) and to explore associations between homebound status and risk of future mortality and nursing home entry. DESIGN Secondary analysis of a longitudinal prospective cohort study. SETTING TBI Model Systems centers. PARTICIPANTS Community-dwelling TBI Model Systems participants (n=6595) who sustained moderate-to-severe TBI between 2006 and 2016, and resided in a private residence 1-year postinjury. INTERVENTIONS Not applicable. MAIN OUTCOME MEASURES Homebound status (leaving home ≤1-2d per week), 5-year mortality, and 2- or 5-year nursing home entry. RESULTS In our sample, 14.2% of individuals were homebound 1-year postinjury, including 2% who never left home. Older age, having less than a bachelor's degree, Medicaid insurance, living in the Northeast or Midwest, dependence on others or special services for transportation, unemployment or retirement, and needing assistance for locomotion, bladder management, and social interactions at 1-year postinjury were associated with being homebound. After adjustment for potential confounders and an inverse probability weight for nonrandom attrition bias, being homebound was associated with a 1.69-times (95% confidence interval, 1.35-2.11) greater risk of 5-year mortality, and showed a nonsignificant but trending association with nursing home entry by 5 years postinjury (RR=1.90; 95% confidence interval, 0.94-3.87). Associations between homeboundness and mortality were consistent by age subgroup (±65y). CONCLUSIONS The negative long-term health outcomes among persons with TBI who rarely leave home warrant re-evaluating whether home discharge is unequivocally positive. The identified risk factors for homebound status, and its associated negative long-term outcomes, should be considered when preparing patients and their families for discharge from acute and postacute rehabilitation care settings. Addressing modifiable risk factors for homeboundness, such as accessible public transportation options and home care to address mobility, could be targets for individual referrals and policy intervention.
Affiliation(s)
- Raj G Kumar
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, New York
- Mary Louise Pomeroy
  - Center for Equity in Aging, School of Nursing, Johns Hopkins University, Baltimore, Maryland
- Katherine A Ornstein
  - Center for Equity in Aging, School of Nursing, Johns Hopkins University, Baltimore, Maryland
- Shannon B Juengst
  - Brain Injury Research Center, TIRR Memorial Hermann, Houston, Texas; Department of Physical Medicine and Rehabilitation, University of Texas Health Science Center at Houston, Houston, Texas
- Amy K Wagner
  - Departments of Physical Medicine & Rehabilitation and Neuroscience, Safar Center for Resuscitation Research, Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, Pennsylvania
- Jennifer M Reckrey
  - Department of Geriatrics and Palliative Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
- Kirk Lercher
  - Department of Physical Medicine and Rehabilitation, New Jersey Medical School, Kessler Institute for Rehabilitation, Rutgers University, West Orange, New Jersey
- Laura E Dreer
  - Departments of Ophthalmology & Visual Sciences & Physical Medicine and Rehabilitation, University of Alabama at Birmingham, Birmingham, Alabama
- Emily Evans
  - Department of Physical Therapy, Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, Massachusetts
- Nicola L de Souza
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, New York
- Kristen Dams-O'Connor
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, New York

4
Svensson M, Bendahl PO, Alkner S, Hansson E, Rydén L, Dihge L. Development and validation of prediction models for sentinel lymph node status indicating postmastectomy radiotherapy in breast cancer: population-based study. BJS Open 2025; 9:zraf047. [PMID: 40197824 PMCID: PMC11977109 DOI: 10.1093/bjsopen/zraf047] [Received: 10/27/2024] [Revised: 01/19/2025] [Accepted: 03/05/2025] [Indexed: 04/10/2025] Open
Abstract
BACKGROUND Postmastectomy radiotherapy (PMRT) impairs the outcome of immediate breast reconstruction in patients with breast cancer, and the sentinel lymph node (SLN) status is crucial in evaluating the need for PMRT. The aim of this study was to develop and validate models to stratify the risk of clinically significant SLN macrometastases (macro-SLNMs) before surgery. METHODS Women diagnosed with clinically node-negative (cN0) T1-2 breast cancer were identified within the Swedish National Quality Register for Breast Cancer (2014-2017). Prediction models and corresponding nomograms based on patient and tumour characteristics accessible before surgery were developed using adaptive least absolute shrinkage and selection operator logistic regression. The prediction of at least one and more than two macro-SLNMs adheres to the current guidelines on use of PMRT and reflects the exclusion criteria in ongoing trials aiming to de-escalate locoregional radiotherapy in patients with one or two macro-SLNMs. Predictive performance was evaluated using area under the receiver operating characteristic curve (AUC) and calibration plots. RESULTS Overall, 18 185 women were grouped into development (13 656) and validation (4529) cohorts. The well calibrated models predicting at least one and more than two macro-SLNMs had AUCs of 0.708 and 0.740, respectively, upon validation. By using the prediction model for at least one macro-SLNM, the risk could be updated from the pretest population prevalence of 13.2% to the post-test range of 1.6-74.6%. CONCLUSION Models based on routine patient and tumour characteristics could be used for prediction of SLN status that would indicate the need for PMRT and assist decision-making on immediate breast reconstruction for patients with cN0 breast cancer.
Affiliation(s)
- Miriam Svensson
  - Department of Clinical Sciences, Division of Surgery, Lund University, Lund, Sweden
- Pär-Ola Bendahl
  - Department of Clinical Sciences, Division of Oncology and Pathology, Lund University, Lund, Sweden
- Sara Alkner
  - Department of Haematology, Oncology and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Emma Hansson
  - Department of Plastic Surgery, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
  - Department of Plastic Surgery, Sahlgrenska University Hospital, Region Västra Götaland, Gothenburg, Sweden
- Lisa Rydén
  - Department of Clinical Sciences, Division of Surgery, Lund University, Lund, Sweden
  - Department of Surgery, Skåne University Hospital, Malmö, Sweden
- Looket Dihge
  - Department of Clinical Sciences, Division of Surgery, Lund University, Lund, Sweden
  - Department of Plastic and Reconstructive Surgery, Skåne University Hospital, Malmö, Sweden

5
Shang Y, Chiu YH, Kong L. Robust propensity score estimation via loss function calibration. Stat Methods Med Res 2025; 34:457-472. [PMID: 39943776 PMCID: PMC11951360 DOI: 10.1177/09622802241308709] [Indexed: 03/04/2025]
Abstract
Propensity score estimation is often used as a preliminary step to estimate the average treatment effect with observational data. Nevertheless, misspecification of propensity score models undermines the validity of effect estimates in subsequent analyses. Prediction-based machine learning algorithms are increasingly used to estimate propensity scores to allow for more complex relationships between covariates. However, these approaches may not necessarily achieve covariate balance. We propose a calibration-based method to better incorporate covariate balance properties in a general modeling framework. Specifically, we calibrate the loss function by adding a covariate imbalance penalty to standard parametric (e.g. logistic regression) or machine learning models (e.g. neural networks). Our approach may mitigate the impact of model misspecification by explicitly taking into account the covariate balance in the propensity score estimation process. The empirical results show that the proposed method is robust to propensity score model misspecification. The integration of loss function calibration improves the balance of covariates and reduces the root-mean-square error of causal effect estimates. When the propensity score model is misspecified, the neural-network-based model yields the best estimator with less bias and smaller variance as compared to other methods considered.
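One way to picture a loss function with a covariate imbalance penalty is a logistic log-loss plus a term that grows with the weighted difference in covariate means between treatment groups. The sketch below is a guess at the general shape only: the squared mean-difference penalty, the inverse probability weights, the function name, and the tuning parameter `lam` are all assumptions, and the abstract does not specify the authors' actual penalty.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def balance_penalized_loss(beta, X, t, lam):
    """Logistic log-loss plus lam times a covariate imbalance penalty.

    X: rows of covariates (first column may be an intercept of 1.0);
    t: 0/1 treatment indicators; beta: logistic coefficients.
    The penalty form is an illustrative assumption.
    """
    scores = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
    n = len(t)
    log_loss = -sum(
        ti * math.log(e) + (1 - ti) * math.log(1 - e) for ti, e in zip(t, scores)
    ) / n
    # Inverse probability weights implied by the current propensity scores.
    w = [ti / e + (1 - ti) / (1 - e) for ti, e in zip(t, scores)]
    sw1 = sum(wi for wi, ti in zip(w, t) if ti == 1)
    sw0 = sum(wi for wi, ti in zip(w, t) if ti == 0)
    penalty = 0.0
    for j in range(len(X[0])):
        m1 = sum(wi * row[j] for wi, row, ti in zip(w, X, t) if ti == 1) / sw1
        m0 = sum(wi * row[j] for wi, row, ti in zip(w, X, t) if ti == 0) / sw0
        penalty += (m1 - m0) ** 2  # squared weighted mean difference
    return log_loss + lam * penalty


# Toy data (hypothetical): intercept plus one covariate.
X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.3], [1.0, 0.9]]
t = [0, 1, 0, 1]
beta = [0.0, 0.0]
base = balance_penalized_loss(beta, X, t, lam=0.0)
penalized = balance_penalized_loss(beta, X, t, lam=1.0)
print(base, penalized)
```

Minimizing such a loss over `beta` trades predictive fit against balance; setting `lam` to zero recovers ordinary logistic regression.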
Affiliation(s)
- Yimeng Shang
  - Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
- Yu-Han Chiu
  - Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
- Lan Kong
  - Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA

6
Du J, Yu Y, Zhang M, Wu Z, Ryan AM, Mukherjee B. Outcome adaptive propensity score methods for handling censoring and high-dimensionality: Application to insurance claims. Stat Methods Med Res 2025:9622802241306856. [PMID: 40013476 DOI: 10.1177/09622802241306856] [Indexed: 02/28/2025]
Abstract
Propensity scores are commonly used to reduce the confounding bias in non-randomized observational studies for estimating the average treatment effect. An important assumption underlying this approach is that all confounders that are associated with both the treatment and the outcome of interest are measured and included in the propensity score model. In the absence of strong prior knowledge about potential confounders, researchers may agnostically want to adjust for a high-dimensional set of pre-treatment variables. As such, a variable selection procedure is needed for propensity score estimation. In addition, studies show that including variables related to treatment only in the propensity score model may inflate the variance of the treatment effect estimators, while including variables that are predictive of only the outcome can improve efficiency. In this article, we propose to incorporate the outcome-covariate relationship in the propensity score model by including the predicted binary outcome probability as a covariate. Our approach can be easily adapted to an ensemble of variable selection methods, including regularization methods and modern machine-learning tools based on classification and regression trees. We evaluate our method to estimate the treatment effects on a binary outcome, which is possibly censored, across multiple treatment groups. Simulation studies indicate that incorporating outcome probability for estimating the propensity scores can improve statistical efficiency and protect against model misspecification. The proposed methods are applied to a cohort of advanced-stage prostate cancer patients identified from a private insurance claims database for comparing the adverse effects of four commonly used drugs for treating castration-resistant prostate cancer.
Affiliation(s)
- Jiacong Du
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Youfei Yu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Min Zhang
  - Vanke School of Public Health, Tsinghua University, Beijing, China
- Zhenke Wu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Andrew M Ryan
  - Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
- Bhramar Mukherjee
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA

7
Gao Q, Wang J, Fang R, Sun H, Wang T. A doubly robust estimator for continuous treatments in high dimensions. BMC Med Res Methodol 2025; 25:35. [PMID: 39948447 PMCID: PMC11823051 DOI: 10.1186/s12874-025-02488-3] [Received: 02/02/2024] [Accepted: 02/03/2025] [Indexed: 02/17/2025] Open
Abstract
BACKGROUND Generalized propensity score (GPS) methods have become popular for estimating causal relationships between a continuous treatment and an outcome in observational studies with rich covariate information. The presence of rich covariates enhances the plausibility of the unconfoundedness assumption. Nonetheless, it is also crucial to ensure the correct specification of both marginal and conditional treatment distributions, beyond the assumption of unconfoundedness. METHOD We address limitations in existing GPS methods by extending balance-based approaches to high dimensions and introducing the Generalized Outcome-Adaptive LASSO and Doubly Robust Estimate (GOALDeR). This novel approach integrates a balance-based method that is robust to the misspecification of distributions required for GPS methods, a doubly robust estimator that is robust to the misspecification of models, and a variable selection technique for causal inference that ensures an unbiased and statistically efficient estimation. RESULTS Simulation studies showed that GOALDeR was able to generate nearly unbiased estimates when either the GPS model or the outcome model was correctly specified. Notably, GOALDeR demonstrated greater precision and accuracy compared to existing methods and was only slightly affected by the covariate correlation structure and ratio of sample size to covariate dimension. Real data analysis revealed no statistically significant dose-response relationship between epigenetic age acceleration and Alzheimer's disease. CONCLUSION In this study, we proposed GOALDeR as an advanced GPS method for causal inference in high dimensions, and empirically demonstrated that GOALDeR is doubly robust, with improved accuracy and precision compared to existing methods. The R package is available at https://github.com/QianGao-SXMU/GOALDeR.
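The double-robustness property mentioned here can be illustrated in the simpler binary-treatment setting with the augmented inverse probability weighting (AIPW) estimator. This is not GOALDeR, which targets continuous treatments and adds balancing and variable selection; the function name, inputs, and toy numbers below are assumptions for illustration only.

```python
def aipw_ate(t, y, e, mu1, mu0):
    """AIPW estimate of the average treatment effect for a binary treatment.

    t: 0/1 treatment; y: outcomes; e: propensity scores in (0, 1);
    mu1, mu0: outcome-model predictions under treatment and control.
    """
    total = 0.0
    for ti, yi, ei, m1, m0 in zip(t, y, e, mu1, mu0):
        # Outcome-model contrast plus weighted residual corrections.
        total += m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
    return total / len(y)


# Toy data (hypothetical): group means are 4.0 (treated) and 1.5 (control).
t = [1, 0, 1, 0]
y = [3.0, 1.0, 5.0, 2.0]
e = [0.5, 0.5, 0.5, 0.5]
mu1 = [4.0, 4.0, 4.0, 4.0]
mu0 = [1.5, 1.5, 1.5, 1.5]
ate = aipw_ate(t, y, e, mu1, mu0)
print(ate)
```

The residual correction terms average to zero when the outcome model is right, and they repair a wrong outcome model when the propensity scores are right, which is the sense in which the estimator is doubly robust.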
Affiliation(s)
- Qian Gao
  - Department of Health Statistics, School of Public Health, MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No.56 Xinjian South Road, Taiyuan, 030001, China
- Jiale Wang
  - Department of Health Statistics, School of Public Health, MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No.56 Xinjian South Road, Taiyuan, 030001, China
- Ruiling Fang
  - Department of Health Statistics, School of Public Health, MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No.56 Xinjian South Road, Taiyuan, 030001, China
- Hongwei Sun
  - Department of Health Statistics, School of Public Health, Binzhou Medical University, Yantai, China
- Tong Wang
  - Department of Health Statistics, School of Public Health, MOE Key Laboratory of Coal Environmental Pathogenicity and Prevention, Shanxi Medical University, No.56 Xinjian South Road, Taiyuan, 030001, China

8
Deng Y, Yang N, Wang J, Tu T. Understanding the role of hormones in pediatric growth: Insights from a double-debiased machine learning approach. Steroids 2025; 214:109552. [PMID: 39653159 DOI: 10.1016/j.steroids.2024.109552] [Received: 09/21/2024] [Revised: 12/03/2024] [Accepted: 12/06/2024] [Indexed: 12/13/2024]
Abstract
This study investigates the causal relationships between hormone levels and growth and development of children, focusing specifically on height disparities in cases of dwarfism. In addition to a double-debiased machine learning approach, the study integrates three alternative causal inference methods: partialing-out lasso linear regression, cross-fit partialing-out lasso linear regression, and post-double selection LASSO. These machine learning techniques are pivotal in identifying causal effects within observational data. The findings reveal a positive correlation between luteinizing hormone (LH) levels and adolescent height, while follicle-stimulating hormone (FSH) and the LH/FSH ratio show inverse correlations. The study underscores the significant role of hormone levels, particularly LH, in determining height, offering valuable insights that could guide future interventions or treatments for children and adolescents with dwarfism.
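The partialing-out idea behind these methods can be sketched with ordinary least squares standing in for the lasso: residualize both the outcome and the exposure on the controls, then regress one residual on the other. The version below uses a single control variable and no cross-fitting, so it is only a simplified illustration; the function names and the toy data are assumptions.

```python
def residualize(v, x):
    """Residuals of v after a simple linear regression on x (with intercept)."""
    n = len(v)
    mx, mv = sum(x) / n, sum(v) / n
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def partialing_out(y, d, x):
    """Effect of d on y after removing the linear influence of x from both."""
    ry = residualize(y, x)
    rd = residualize(d, x)
    return sum(a * b for a, b in zip(ry, rd)) / sum(b * b for b in rd)


# Toy data (hypothetical): y = 2*d + 3*x exactly, so the effect of d is 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0]
y = [2 * di + 3 * xi for di, xi in zip(d, x)]
theta = partialing_out(y, d, x)
print(theta)
```

In the lasso-based versions, the two residualizing regressions are replaced by penalized fits over many candidate controls, and the cross-fit variant estimates those fits on held-out folds.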
Affiliation(s)
- Ying Deng
  - National "111" Center for Cellular Regulation and Molecular Pharmaceutics, Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, No.28, Nanli Road, Hongshan District, Wuhan, Hubei Province 430068, China
- Ning Yang
  - National "111" Center for Cellular Regulation and Molecular Pharmaceutics, Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, No.28, Nanli Road, Hongshan District, Wuhan, Hubei Province 430068, China
- Jun Wang
  - National "111" Center for Cellular Regulation and Molecular Pharmaceutics, Key Laboratory of Fermentation Engineering (Ministry of Education), Hubei University of Technology, No.28, Nanli Road, Hongshan District, Wuhan, Hubei Province 430068, China
- Taotao Tu
  - College of Economics and Management, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, Hubei Province 430070, China

9
Wang J, Zhang Z, Wang Y. Utilizing Feature Selection Techniques for AI-Driven Tumor Subtype Classification: Enhancing Precision in Cancer Diagnostics. Biomolecules 2025; 15:81. [PMID: 39858475 PMCID: PMC11763904 DOI: 10.3390/biom15010081] [Received: 12/09/2024] [Revised: 01/02/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Cancer's heterogeneity presents significant challenges in accurate diagnosis and effective treatment, including the complexity of identifying tumor subtypes and their diverse biological behaviors. This review examines how feature selection techniques address these challenges by improving the interpretability and performance of machine learning (ML) models in high-dimensional datasets. Feature selection methods, such as filter, wrapper, and embedded techniques, play a critical role in enhancing the precision of cancer diagnostics by identifying relevant biomarkers. The integration of multi-omics data and ML algorithms facilitates a more comprehensive understanding of tumor heterogeneity, advancing both diagnostics and personalized therapies. However, challenges such as ensuring data quality, mitigating overfitting, and addressing scalability remain critical limitations of these methods. Artificial intelligence (AI)-powered feature selection offers promising solutions to these issues by automating and refining the feature extraction process. This review highlights the transformative potential of these approaches while emphasizing future directions, including the incorporation of deep learning (DL) models and integrative multi-omics strategies for more robust and reproducible findings.
Affiliation(s)
- Jihan Wang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Zhengxiang Zhang
  - Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China

10
Jaman A, Wang G, Ertefaie A, Bally M, Lévesque R, Platt RW, Schnitzer ME. Penalized G-estimation for effect modifier selection in a structural nested mean model for repeated outcomes. Biometrics 2025; 81:ujae165. [PMID: 39814568 DOI: 10.1093/biomtc/ujae165] [Received: 02/27/2024] [Revised: 11/18/2024] [Accepted: 12/20/2024] [Indexed: 01/18/2025]
Abstract
Effect modification occurs when the impact of the treatment on an outcome varies based on the levels of other covariates known as effect modifiers. Modeling these effect differences is important for etiological goals and for purposes of optimizing treatment. Structural nested mean models (SNMMs) are useful causal models for estimating the potentially heterogeneous effect of a time-varying exposure on the mean of an outcome in the presence of time-varying confounding. A data-adaptive selection approach is necessary if the effect modifiers are unknown a priori and need to be identified. Although variable selection techniques are available for estimating the conditional average treatment effects using marginal structural models or for developing optimal dynamic treatment regimens, all of these methods consider a single end-of-follow-up outcome. In the context of an SNMM for repeated outcomes, we propose a doubly robust penalized G-estimator for the causal effect of a time-varying exposure with a simultaneous selection of effect modifiers and prove the oracle property of our estimator. We conduct a simulation study for the evaluation of its performance in finite samples and verification of its double-robustness property. Our work is motivated by the study of hemodiafiltration for treating patients with end-stage renal disease at the Centre Hospitalier de l'Université de Montréal. We apply the proposed method to investigate the effect heterogeneity of dialysis facility on the repeated session-specific hemodiafiltration outcomes.
Affiliation(s)
- Ajmery Jaman
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
| | - Guanbo Wang
- CAUSALab, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, United States
| | - Ashkan Ertefaie
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
| | - Michèle Bally
- Department of Pharmacy, Centre Hospital of University of Montreal, Montreal, QC H2X 0C1, Canada
- Faculty of Pharmacy, University of Montreal, Montreal, QC H3C 3J7, Canada
| | - Renée Lévesque
- Department of Medicine, University of Montreal, Montreal, QC H3T 1J4, Canada
| | - Robert W Platt
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
| | - Mireille E Schnitzer
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
- Faculty of Pharmacy, University of Montreal, Montreal, QC H3C 3J7, Canada
| |
|
11
|
Inoue K, Sakamaki K, Komukai S, Ito Y, Goto A, Shinozaki T. Methodological Tutorial Series for Epidemiological Studies: Confounder Selection and Sensitivity Analyses to Unmeasured Confounding From Epidemiological and Statistical Perspectives. J Epidemiol 2025; 35:3-10. [PMID: 38972732 PMCID: PMC11637813 DOI: 10.2188/jea.je20240082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 05/23/2024] [Indexed: 07/09/2024] Open
Abstract
In observational studies, identifying and adjusting for a sufficient set of confounders is crucial for accurately estimating the causal effect of the exposure on the outcome. Even in studies with large sample sizes, which typically benefit from small variances in estimates, there is a risk of producing estimates that are precisely inaccurate if the study suffers from systematic errors or biases, including confounding bias. To date, several approaches have been developed for selecting confounders. In this article, we first summarize the epidemiological and statistical approaches to identifying a sufficient set of confounders. Particularly, we introduce the modified disjunctive cause criterion as one of the most useful approaches, which involves controlling for any pre-exposure covariate that affects the exposure, outcome, or both. It then excludes instrumental variables but includes proxies for the shared common cause of exposure and outcome. Statistical confounder selection is also useful when dealing with a large number of covariates, even in studies with small sample sizes. After introducing several approaches, we discuss some pitfalls and considerations in confounder selection, such as the adjustment for instrumental variables, intermediate variables, and baseline outcome variables. Lastly, as it is often difficult to comprehensively measure key confounders, we introduce two statistics, E-value and robustness value, for assessing sensitivity to unmeasured confounders. Illustrated examples are provided using the National Health and Nutritional Examination Survey Epidemiologic Follow-up Study. Integrating these principles and approaches will enhance our understanding of confounder selection and facilitate better reporting and interpretation of future epidemiological studies.
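As a rough illustration of the E-value mentioned in this abstract, the sketch below uses the commonly cited closed form for a risk ratio point estimate, E = RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. The function name `e_value` is a hypothetical choice for this example and is not taken from the article; the robustness value is not covered here.

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate only).

    Assumes the standard closed form RR + sqrt(RR * (RR - 1)),
    applied after inverting risk ratios below 1.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective associations (RR < 1) are inverted before applying the formula.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


print(round(e_value(2.0), 2))  # 3.41
```

A risk ratio of 2.0 gives an E-value of about 3.41: under this formula, an unmeasured confounder would need to be associated with both exposure and outcome by a risk ratio of roughly 3.41 each to fully explain away the observed association.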
Affiliation(s)
- Kosuke Inoue
- Department of Social Epidemiology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Hakubi Center for Advanced Research, Kyoto University, Kyoto, Japan
| | - Kentaro Sakamaki
- Center for Data Science, Yokohama City University, Yokohama, Japan
| | - Sho Komukai
- Division of Biomedical Statistics, Department of Integrated Medicine, Graduate School of Medicine, Osaka University, Osaka, Japan
| | - Yuri Ito
- Department of Medical Statistics, Research & Development Center, Osaka Medical and Pharmaceutical University, Osaka, Japan
| | - Atsushi Goto
- Department of Public Health, School of Medicine, Yokohama City University, Yokohama, Japan
| | - Tomohiro Shinozaki
- Department of Information and Computer Technology, Faculty of Engineering, Tokyo University of Science, Tokyo, Japan
| |
|
12
|
Shoji T, Tsuchida J, Yadohisa H. Quantile outcome adaptive lasso: Covariate selection for inverse probability weighting estimator of quantile treatment effects. Stat Methods Med Res 2025; 34:69-84. [PMID: 39668609 PMCID: PMC11800702 DOI: 10.1177/09622802241299410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
When using the propensity score method to estimate the treatment effects, it is important to select the covariates to be included in the propensity score model. The inclusion of covariates unrelated to the outcome in the propensity score model leads to bias and large variance in the estimator of treatment effects. Many data-driven covariate selection methods have been proposed for selecting covariates related to outcomes. However, most of them assume an average treatment effect estimation and may not be designed to estimate quantile treatment effects (QTEs), which are the effects of treatment on the quantiles of the outcome distribution. In QTE estimation, we consider two types of relation with the outcome: relation to its expected value and relation to its quantile. To achieve this, we propose a data-driven covariate selection method for propensity score models that allows for the selection of covariates related to the expected value and quantile of the outcome for QTE estimation. Assuming the quantile regression model as an outcome regression model, covariate selection was performed using a regularization method with the partial regression coefficients of the quantile regression model as weights. The proposed method was applied to artificial data and a dataset of mothers and children born in King County, Washington, to compare the performance of existing methods and QTE estimators. As a result, the proposed method performs well in the presence of covariates related to both the expected value and quantile of the outcome.
Affiliation(s)
| | - Jun Tsuchida
- Department of Data Science, Kyoto Women’s University, Kyoto, Japan
| | - Hiroshi Yadohisa
- Department of Culture and Information Science, Doshisha University, Kyotanabe-shi, Kyoto, Japan
| |
|
13
|
Duong B, Senadeera M, Nguyen T, Nichols M, Backholer K, Allender S, Nguyen T. Utilising causal inference methods to estimate effects and strategise interventions in observational health data. PLoS One 2024; 19:e0314761. [PMID: 39775396 PMCID: PMC11684594 DOI: 10.1371/journal.pone.0314761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 11/16/2024] [Indexed: 01/11/2025] Open
Abstract
Randomised controlled trials (RCTs) are the gold standard for evaluating health interventions but often face ethical and practical challenges. When RCTs are not feasible, large observational data sets emerge as a pivotal resource, though these data sets may be subject to bias and unmeasured confounding. Traditional statistical (or non-causal) learning methods, while useful, face limitations in fully uncovering causal effects, i.e., determining if an intervention truly has a direct impact on the outcome. This gap is bridged by the latest advancements in causal inference methods, building upon machine learning-based approaches to investigate not only population-level effects but also the heterogeneous effects of interventions across population subgroups. We demonstrate a causality approach that utilises causal trees and forests, enhanced by weighting mechanisms to adjust for confounding covariates. This method does more than just predict the overall effect of an intervention on the whole population; it also gives a clear picture of how it works differently in various subgroups. Finally, this method excels in strategising and optimising interventions, by suggesting precise and explainable approaches to targeting the intervention, to maximise overall population health outcomes. These capabilities are crucial for health researchers, offering new insights into existing data and assisting in the decision-making process for future interventions. Using observational data from the 2017-18 Australian National Health Survey, our study demonstrates the power of causal trees in estimating the impact of exercise on BMI levels, understanding how this impact varies across subgroups, and assessing the effectiveness of various intervention targeting strategies for enhanced health benefits.
Affiliation(s)
- Bao Duong
- Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
| | - Manisha Senadeera
- Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
| | - Toan Nguyen
- Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
| | - Melanie Nichols
- Global Centre for Preventive Health and Nutrition (GLOBE), Faculty of Health, Deakin University, Geelong, Australia
| | - Kathryn Backholer
- Global Centre for Preventive Health and Nutrition (GLOBE), Faculty of Health, Deakin University, Geelong, Australia
| | - Steven Allender
- Global Centre for Preventive Health and Nutrition (GLOBE), Faculty of Health, Deakin University, Geelong, Australia
| | - Thin Nguyen
- Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia
| |
|
14
|
Iyassu AS, Fenta HM, Dessie ZG, Zewotir TT. Identification of confounders and estimating the causal effect of place of birth on age-specific childhood vaccination. BMC Med Inform Decis Mak 2024; 24:406. [PMID: 39731133 DOI: 10.1186/s12911-024-02827-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Accepted: 12/16/2024] [Indexed: 12/29/2024] Open
Abstract
BACKGROUND In causal analyses, some third factor may distort the relationship between the exposure and the outcome variables under study, giving spurious results. In this case, the treatment and control groups, which do and do not receive the exposure, differ from one another in some other essential variables, called confounders. METHOD Place of birth was used as the exposure variable and age-specific childhood vaccination status as the outcome variable. Three confounder selection approaches were proposed: all pre-treatment covariates, outcome-cause covariates, and common-cause covariates. Multiple logistic regression was used to estimate the propensity score for confounder adjustment by inverse probability treatment weighting (IPTW). The proportional odds model was used to estimate the causal effect of place of birth on age-specific childhood vaccination. To validate the result obtained from observed data, we used a plasmode simulation of resampling 1000 samples from actual data 500 times. RESULT Outcome-cause and common-cause confounder identification techniques gave comparable results in terms of treatment effect in the plasmode data. However, outcome-cause confounder identification, which covers both common causes and predictors of the outcome, gave relatively better treatment effect results. The treatment effect result in the IPTW confounder adjustment method was better than that of the regression adjustment method. The effect of place of birth on log odds of cumulative probability of age-specific childhood vaccination was 0.36 with odds ratio of 1.43 for higher level vaccination status. CONCLUSION It is essential to use plasmode simulation data to validate the reproducibility of the proposed methods on the observed data. It is important to use outcome-cause covariates to adjust for their confounding effect on the outcome. Using inverse probability treatment weighting gives unbiased treatment effect results as compared to the regression method of confounder adjustment. Institutional delivery increases the likelihood of childhood vaccination at the recommended schedule.
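The weighting step described in this abstract can be sketched in a few lines. The example below is a simplified illustration, not the authors' analysis: it assumes the propensity scores have already been estimated (the study used multiple logistic regression), and it contrasts weighted outcome means rather than fitting a proportional odds model. The function name `iptw_ate` and the toy data are hypothetical.

```python
def iptw_ate(treatment, outcome, propensity):
    """Normalized (Hajek-style) IPTW contrast of weighted outcome means.

    treatment  : iterable of 0/1 exposure indicators
    outcome    : iterable of numeric outcomes
    propensity : iterable of estimated P(treatment = 1 | covariates)
    """
    num1 = den1 = num0 = den0 = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if t == 1:
            w = 1.0 / e            # exposed units weighted by 1 / e
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)    # unexposed units weighted by 1 / (1 - e)
            num0 += w * y
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("both exposure groups must be non-empty")
    return num1 / den1 - num0 / den0


# Toy data with constant propensity 0.5, so the estimate reduces to a
# plain difference in group means: 4.0 - 1.5.
print(iptw_ate([1, 1, 0, 0], [3, 5, 1, 2], [0.5, 0.5, 0.5, 0.5]))  # 2.5
```

Normalizing by the sum of weights within each group keeps the estimate stable when a few weights are large, which is one reason this form is often preferred over the unnormalized version.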
Affiliation(s)
- Ashagrie Sharew Iyassu
- College of Science, Bahir Dar University, Bahir Dar, Ethiopia.
- Debremarkos University, Debre Markos, Ethiopia.
| | - Haile Mekonnen Fenta
- College of Science, Bahir Dar University, Bahir Dar, Ethiopia
- Center for Environmental and Respiratory Health Research, Population Health, University of Oulu, Oulu, Finland
- Biocenter, University of Oulu, Oulu, Finland
| | - Zelalem G Dessie
- College of Science, Bahir Dar University, Bahir Dar, Ethiopia
- School of Mathematics, Statistics & Computer Science, University of KwaZulu Natal, Durban, South Africa
| | - Temesgen T Zewotir
- School of Mathematics, Statistics & Computer Science, University of KwaZulu Natal, Durban, South Africa
| |
|
15
|
Wyss R, van der Laan M, Gruber S, Shi X, Lee H, Dutcher SK, Nelson JC, Toh S, Russo M, Wang SV, Desai RJ, Lin KJ. Targeted learning with an undersmoothed LASSO propensity score model for large-scale covariate adjustment in health-care database studies. Am J Epidemiol 2024; 193:1632-1640. [PMID: 38517025 PMCID: PMC11538566 DOI: 10.1093/aje/kwae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/13/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024] Open
Abstract
Least absolute shrinkage and selection operator (LASSO) regression is widely used for large-scale propensity score (PS) estimation in health-care database studies. In these settings, previous work has shown that undersmoothing (overfitting) LASSO PS models can improve confounding control, but it can also cause problems of nonoverlap in covariate distributions. It remains unclear how to select the degree of undersmoothing when fitting large-scale LASSO PS models to improve confounding control while avoiding issues that can result from reduced covariate overlap. Here, we used simulations to evaluate the performance of using collaborative-controlled targeted learning to data-adaptively select the degree of undersmoothing when fitting large-scale PS models within both singly and doubly robust frameworks to reduce bias in causal estimators. Simulations showed that collaborative learning can data-adaptively select the degree of undersmoothing to reduce bias in estimated treatment effects. Results further showed that when fitting undersmoothed LASSO PS models, the use of cross-fitting was important for avoiding nonoverlap in covariate distributions and reducing bias in causal estimates.
Affiliation(s)
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
| | - Mark van der Laan
- Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA 94720, United States
| | - Susan Gruber
- Putnam Data Sciences, LLC, Cambridge, MA 02139, United States
| | - Xu Shi
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, United States
| | - Hana Lee
- Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
| | - Sarah K Dutcher
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD 20903, United States
| | - Jennifer C Nelson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
| | - Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
| | - Massimiliano Russo
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
| | - Shirley V Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
| | - Rishi J Desai
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
| | - Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
| |
|
16
|
Kabata D, Stuart EA, Shintani A. Prognostic score-based model averaging approach for propensity score estimation. BMC Med Res Methodol 2024; 24:228. [PMID: 39363252 PMCID: PMC11448247 DOI: 10.1186/s12874-024-02350-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 09/23/2024] [Indexed: 10/05/2024] Open
Abstract
BACKGROUND Propensity scores (PS) are typically evaluated using balance metrics that focus on covariate balance, often without considering their predictive power for the outcome. This approach may not always result in optimal bias reduction in the treatment effect estimate. To address this issue, evaluating covariate balance through prognostic scores, which account for the relationship between covariates and the outcome, has been proposed. Similarly, using a typical model averaging approach for PS estimation that minimizes prediction error for treatment status and covariate imbalance does not necessarily optimize PS-based confounding adjustment. As an alternative approach, using the averaged PS model that minimizes inter-group differences in the prognostic score may further reduce bias in the treatment effect estimate. Moreover, since the prognostic score is also an estimated quantity, model averaging in the prognostic scores can help identify a better prognostic score model. Utilizing the model-averaged prognostic scores as the balance metric for constructing the averaged PS model can contribute to further decreasing bias in treatment effect estimates. This paper demonstrates the effectiveness of the PS model averaging approach based on prognostic score balance and proposes a method that uses the model-averaged prognostic score as a balance metric, evaluating its performance through simulations and empirical analysis. METHODS We conduct a series of simulations alongside an analysis of empirical observational data to compare the performances of weighted treatment effect estimates using the proposed and existing approaches. In our examination, we separately provide four candidate estimates for the PS and prognostic score models using traditional regression and machine learning methods. The model averaging of PS based on these candidate estimators is performed to either maximize the prediction accuracy of the treatment or to minimize intergroup differences in covariate distributions or prognostic scores. We also utilize not only the prognostic scores from each candidate model but also an averaged score that best predicted the outcome, for the balance assessment. RESULTS The simulation and empirical data analysis reveal that our proposed model-averaging approaches for PS estimation consistently yield lower bias and less variability in treatment effect estimates across various scenarios compared to existing methods. Specifically, using the optimally averaged prognostic scores as a balance metric significantly improves the robustness of the weighted treatment effect estimates. DISCUSSION The prognostic score-based model averaging approach for estimating PS can outperform existing model averaging methods. In particular, the estimator using the model averaging prognostic score as a balance metric can produce more robust estimates. Since our results are obtained under relatively simple conditions, applying them to real data analysis requires adjustments to obtain accurate estimates according to the complexity and dimensionality of the data. CONCLUSIONS Using the prognostic score as the balance metric for the PS model averaging enhances the performance of the treatment effect estimator, which can be recommended for a wide variety of situations. When applying the proposed method to real-world data, it is important to use it in conjunction with techniques that mitigate issues arising from the complexity and high dimensionality of the data.
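One way to picture "inter-group differences in the prognostic score" is as a weighted standardized mean difference. The sketch below is an assumption-laden illustration rather than the metric defined in the paper: the function name `prognostic_smd` is hypothetical, the prognostic scores and weights are taken as given, and the denominator pools unweighted sample variances, which is only one of several conventions in use.

```python
import math
import statistics


def prognostic_smd(score, treatment, weights):
    """Weighted standardized mean difference of a prognostic score.

    Numerator: difference in weighted means (treated minus control).
    Denominator: pooled SD from unweighted sample variances (an assumption;
    each group therefore needs at least two units).
    """
    treated = [(s, w) for s, t, w in zip(score, treatment, weights) if t == 1]
    control = [(s, w) for s, t, w in zip(score, treatment, weights) if t == 0]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each group needs at least two units")

    def wmean(pairs):
        return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)

    var1 = statistics.variance([s for s, _ in treated])
    var0 = statistics.variance([s for s, _ in control])
    pooled_sd = math.sqrt((var1 + var0) / 2.0)
    return (wmean(treated) - wmean(control)) / pooled_sd


# Unit weights: means 1.5 vs 3.5, pooled SD sqrt(0.5), so SMD is about -2.83.
print(round(prognostic_smd([1, 2, 3, 4], [1, 1, 0, 0], [1, 1, 1, 1]), 2))  # -2.83
```

In a model averaging setting, a value like this could be computed for the weights implied by each candidate combination of PS models, with combinations yielding values closer to zero regarded as better balanced on the prognostic score.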
Affiliation(s)
- Daijiro Kabata
- Center for Mathematical and Data Science, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, Hyogo, 657-8501, Japan.
- Department of Medical Statistics, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan.
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
| | - Elizabeth A Stuart
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Ayumi Shintani
- Department of Medical Statistics, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
| |
|
17
|
Chen LP, Yi GY. AteMeVs: An R package for the estimation of the average treatment effect with measurement error and variable selection for confounders. PLoS One 2024; 19:e0296951. [PMID: 39348368 PMCID: PMC11441663 DOI: 10.1371/journal.pone.0296951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 12/20/2023] [Indexed: 10/02/2024] Open
Abstract
In causal inference, the estimation of the average treatment effect is often of interest. For example, in cancer research, an interesting question is to assess the effects of the chemotherapy treatment on cancer, with the information of gene expressions taken into account. Two crucial challenges in this analysis involve addressing measurement error in gene expressions and handling noninformative gene expressions. While analytical methods have been developed to address those challenges, no user-friendly computational software packages seem to be available to implement those methods. To close this gap, we develop an R package, called AteMeVs, to estimate the average treatment effect using the inverse-probability-weighting estimation method to handle data with both measurement error and spurious variables. This developed package accommodates the method proposed by Yi and Chen (2023) as a special case, and further extends its application to a broader scope. The usage of the developed R package is illustrated by applying it to analyze a cancer dataset with information of gene expressions.
Affiliation(s)
- Li-Pang Chen
- Department of Statistics, National Chengchi University, Taipei, Taiwan, ROC
| | - Grace Y. Yi
- Department of Statistical and Actuarial Sciences, Department of Computer Science, University of Western Ontario, London, Canada
| |
|
18
|
Debertin J, Jurado Vélez JA, Corlin L, Hidalgo B, Murray EJ. Synthesizing Subject-matter Expertise for Variable Selection in Causal Effect Estimation: A Case Study. Epidemiology 2024; 35:642-653. [PMID: 38860706 PMCID: PMC11309331 DOI: 10.1097/ede.0000000000001758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 05/27/2024] [Indexed: 06/12/2024]
Abstract
BACKGROUND Causal graphs are an important tool for covariate selection but there is limited applied research on how best to create them. Here, we used data from the Coronary Drug Project trial to assess a range of approaches to directed acyclic graph (DAG) creation. We focused on the effect of adherence on mortality in the placebo arm, since the true causal effect is believed with a high degree of certainty. METHODS We created DAGs for the effect of placebo adherence on mortality using different approaches for identifying variables and links to include or exclude. For each DAG, we identified minimal adjustment sets of covariates for estimating our causal effect of interest and applied these to analyses of the Coronary Drug Project data. RESULTS When we used only baseline covariate values to estimate the cumulative effect of placebo adherence on mortality, all adjustment sets performed similarly. The specific choice of covariates had minimal effect on these (biased) point estimates, but including nonconfounding prognostic factors resulted in smaller variance estimates. When we additionally adjusted for time-varying covariates of adherence using inverse probability weighting, covariates identified from the DAG created by focusing on prognostic factors performed best. CONCLUSION Theoretical advice on covariate selection suggests that including prognostic factors that are not exposure predictors can reduce variance without increasing bias. In contrast, for exposure predictors that are not prognostic factors, inclusion may result in less bias control. Our results empirically confirm this advice. We recommend that hand-creating DAGs begin with the identification of all potential outcome prognostic factors.
Affiliation(s)
- Julia Debertin
- Department of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
- Mayo Clinic Alix School of Medicine, Mayo Clinic College of Medicine and Science, Rochester, MN
| | | | - Laura Corlin
- Department of Public Health and Community Medicine, Tufts University School of Medicine, Boston, MA
- Department of Civil and Environmental Engineering, Tufts University School of Engineering, Medford, MA
| | - Bertha Hidalgo
- Department of Epidemiology, University of Alabama at Birmingham Ryals School of Public Health, Birmingham, AL
| | - Eleanor J. Murray
- Department of Epidemiology, Boston University School of Public Health, Boston, MA
| |
|
19
|
Simon TG, Singer DE, Zhang Y, Mastrorilli JM, Cervone A, DiCesare E, Lin KJ. Comparative Effectiveness and Safety of Apixaban, Rivaroxaban, and Warfarin in Patients With Cirrhosis and Atrial Fibrillation: A Nationwide Cohort Study. Ann Intern Med 2024; 177:1028-1038. [PMID: 38976880 PMCID: PMC11671173 DOI: 10.7326/m23-3067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND Apixaban, rivaroxaban, and warfarin have shown benefit for preventing major ischemic events, albeit with increased bleeding risk, among patients in the general population with atrial fibrillation (AF). However, data are scarce in patients with cirrhosis and AF. OBJECTIVE To compare the effectiveness and safety of apixaban versus rivaroxaban and versus warfarin in patients with cirrhosis and AF. DESIGN Population-based cohort study. SETTING Two U.S. claims data sets (Medicare and Optum's de-identified Clinformatics Data Mart Database [2013 to 2022]). PARTICIPANTS 1:1 propensity score (PS)-matched patients with cirrhosis and nonvalvular AF initiating use of apixaban, rivaroxaban, or warfarin. MEASUREMENTS Primary outcomes included ischemic stroke or systemic embolism and major hemorrhage (intracranial hemorrhage or major gastrointestinal bleeding). Database-specific and pooled PS-matched rate differences (RDs) per 1000 person-years (PY) and Cox proportional hazard ratios (HRs) with 95% CIs were estimated, controlling for 104 preexposure covariates. RESULTS Rivaroxaban initiators had significantly higher rates of major hemorrhagic events than apixaban initiators (RD, 33.1 per 1000 PY [95% CI, 12.9 to 53.2 per 1000 PY]; HR, 1.47 [CI, 1.11 to 1.94]) but no significant differences in rates of ischemic events or death. Consistently higher rates of major hemorrhage were found with rivaroxaban across subgroup and sensitivity analyses. Warfarin initiators also had significantly higher rates of major hemorrhage than apixaban initiators (RD, 26.1 per 1000 PY [CI, 6.8 to 45.3 per 1000 PY]; HR, 1.38 [CI, 1.03 to 1.84]), particularly hemorrhagic stroke (RD, 9.7 per 1000 PY [CI, 2.2 to 17.2 per 1000 PY]; HR, 2.85 [CI, 1.24 to 6.59]). LIMITATION Nonrandomized treatment selection. 
CONCLUSION Among patients with cirrhosis and nonvalvular AF, initiators of rivaroxaban versus apixaban had significantly higher rates of major hemorrhage and similar rates of ischemic events and death. Initiation of warfarin versus apixaban also contributed to significantly higher rates of major hemorrhagic events, including hemorrhagic stroke. PRIMARY FUNDING SOURCE National Institutes of Health.
Affiliation(s)
- Tracey G. Simon
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology and Hepatology, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit (CTEU), Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Daniel E Singer
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | - Yichi Zhang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Julianna M. Mastrorilli
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Alexander Cervone
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Elyse DiCesare
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| |
|
20
|
Moodie EEM, Bian Z, Coulombe J, Lian Y, Yang AY, Shortreed SM. Variable selection in high dimensions for discrete-outcome individualized treatment rules: Reducing severity of depression symptoms. Biostatistics 2024; 25:633-647. [PMID: 37660312 DOI: 10.1093/biostatistics/kxad022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 07/14/2023] [Accepted: 08/03/2023] [Indexed: 09/05/2023] Open
Abstract
Despite growing interest in estimating individualized treatment rules, little attention has been given to the binary outcome setting. Estimation is challenging with nonlinear link functions, especially when variable selection is needed. We use a new computational approach to solve a recently proposed doubly robust regularized estimating equation to accomplish this difficult task in a case study of depression treatment. We demonstrate an application of this new approach in combination with a weighted and penalized estimating equation to this challenging binary outcome setting. We demonstrate the double robustness of the method and its effectiveness for variable selection. The work is motivated by and applied to an analysis of treatment for unipolar depression using a population of patients treated at Kaiser Permanente Washington.
Collapse
Affiliation(s)
- Erica E M Moodie
- McGill University, Department of Epidemiology & Biostatistics, 2001 McGill College Ave, Suite 1200, Montreal, QC Canada H3A 1G1
| | - Zeyu Bian
- McGill University, Department of Epidemiology & Biostatistics, 2001 McGill College Ave, Suite 1200, Montreal, QC Canada H3A 1G1
| | - Janie Coulombe
- Université de Montréal, Department of Mathematics & Statistics, Pavillon André-Aisenstadt, Montréal, QC Canada H3C 3J7
| | - Yi Lian
- McGill University, Department of Epidemiology & Biostatistics, 2001 McGill College Ave, Suite 1200, Montreal, QC Canada H3A 1G1
| | - Archer Y Yang
- McGill University, Department of Mathematics & Statistics, 805 Sherbrooke Street West Montreal, QC Canada H3A 0B9
| | - Susan M Shortreed
- Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave, Suite 1600, Seattle, WA 98101
- University of Washington, Department of Biostatistics, 1705 NE Pacific St, Seattle, WA 98195
| |
|
21
|
Cao Z, Cho Y, Li F. Transporting randomized trial results to estimate counterfactual survival functions in target populations. Pharm Stat 2024; 23:442-465. [PMID: 38233102 DOI: 10.1002/pst.2354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Revised: 08/27/2023] [Accepted: 11/30/2023] [Indexed: 01/19/2024]
Abstract
When the distributions of treatment effect modifiers differ between a randomized trial and an external target population, the sample average treatment effect in the trial may be substantially different from the target population average treatment effect, and accurate estimation of the latter requires adjusting for the differential distribution of effect modifiers. Despite the increasingly rich literature on transportability, little attention has been devoted to methods for transporting trial results to estimate counterfactual survival functions in target populations, when the primary outcome is time to event and subject to right censoring. In this article, we study inverse probability weighting and doubly robust estimators to estimate counterfactual survival functions and the target average survival treatment effect in the target population, and provide their respective approximate variance estimators. We focus on a common scenario where the target population information is observed only through a complex survey, and elucidate how the survey weights can be incorporated into each estimator we considered. Simulation studies are conducted to examine the finite-sample performances of the proposed estimators in terms of bias, efficiency and coverage, under both correct and incorrect model specifications. Finally, we apply the proposed method to assess transportability of the results in the Action to Control Cardiovascular Risk in Diabetes-Blood Pressure (ACCORD-BP) trial to all adults with diabetes in the United States.
Affiliation(s)
- Zhiqiang Cao
- Department of Mathematics, College of Big Data and Internet, Shenzhen Technology University, Shenzhen, People's Republic of China
| | - Youngjoo Cho
- Department of Applied Statistics, Konkuk University, Seoul, Republic of Korea
| | - Fan Li
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, USA
- Center for Methods in Implementation and Prevention Science, Yale University School of Public Health, New Haven, Connecticut, USA
| |
|
22
|
Yaeger JP, Fiscella KA, Ertefaie A, Alio AP. Persistent challenges in adjusting for race in analyses and a path forward. J Hosp Med 2024; 19:239-242. [PMID: 38017671 DOI: 10.1002/jhm.13246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 11/03/2023] [Accepted: 11/09/2023] [Indexed: 11/30/2023]
Affiliation(s)
- Jeffrey P Yaeger
- Department of Pediatrics, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York, USA
| | - Kevin A Fiscella
- Department of Family Medicine, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
| | - Ashkan Ertefaie
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York, USA
| | - Amina P Alio
- Department of Public Health Sciences, University of Rochester Medical Center, Rochester, New York, USA
| |
|
23
|
Does MB, Adams SR, Kline-Simon AH, Marino C, Charvat-Aguilar N, Weisner CM, Rubinstein AL, Ghadiali M, Cowan P, Young-Wolff KC, Campbell CI. A patient activation intervention in primary care for patients with chronic pain on long term opioid therapy: results from a randomized control trial. BMC Health Serv Res 2024; 24:112. [PMID: 38254073 PMCID: PMC10802020 DOI: 10.1186/s12913-024-10558-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024] Open
Abstract
BACKGROUND Given significant risks associated with long-term prescription opioid use, there is a need for non-pharmacological interventions for treating chronic pain. Activating patients to manage chronic pain has the potential to improve health outcomes. The ACTIVATE study was designed to evaluate the effectiveness of a 4-session patient activation intervention in primary care for patients on long-term opioid therapy. METHODS The two-arm, pragmatic, randomized trial was conducted in two primary care clinics in an integrated health system from June 2015-August 2018. Consenting participants were randomized to the intervention (n = 189) or usual care (n = 187). Participants completed online and interviewer-administered surveys at baseline and at 6- and 12-month follow-up. Prescription opioid use was extracted from the EHR. The primary outcome was patient activation assessed by the Patient Activation Measure (PAM). Secondary outcomes included mood, function, overall health, non-pharmacologic pain management strategies, and patient portal use. We conducted a repeated measure analysis and reported between-group differences at 12 months. RESULTS At 12 months, the intervention and usual care arms had similar PAM scores. However, compared to usual care at 12 months, the intervention arm demonstrated: less moderate/severe depression (odds ratio [OR] = 0.40, 95%CI 0.18-0.87); higher overall health (OR = 3.14, 95%CI 1.64-6.01); greater use of the patient portal's health/wellness resources (OR = 2.50, 95%CI 1.42-4.40) and lab/immunization history (OR = 2.70, 95%CI 1.29-5.65); and greater use of meditation (OR = 2.72; 95%CI 1.61-4.58) and exercise/physical therapy (OR = 2.24, 95%CI 1.29-3.88). At 12 months, the intervention arm had a higher physical health measure (mean difference 1.63; 95%CI: 0.27-2.98).
CONCLUSION This trial evaluated the effectiveness of a primary care intervention in improving patient activation and patient-reported outcomes among adults with chronic pain on long-term opioid therapy. Despite a lack of improvement in patient activation, a brief intervention in primary care can improve outcomes such as depression, overall health, non-pharmacologic pain management, and engagement with the health system. TRIAL REGISTRATION The study was registered on 10/27/14 on ClinicalTrials.gov (NCT02290223).
Affiliation(s)
- Monique B Does
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
| | - Sara R Adams
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
| | - Andrea H Kline-Simon
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
| | - Catherine Marino
- Physical Medicine and Rehabilitation, Kaiser Permanente Northern California, Santa Clara, CA, USA
| | - Nancy Charvat-Aguilar
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
| | - Constance M Weisner
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
- Department of Psychiatry, University of California, San Francisco, CA, USA
| | - Andrea L Rubinstein
- Department of Pain Medicine, The Permanente Medical Group, Santa Rosa, CA, USA
| | - Murtuza Ghadiali
- Addiction Medicine and Recovery Services, The Permanente Medical Group, San Francisco, CA, USA
| | - Penney Cowan
- American Chronic Pain Association, Rocklin, CA, USA
| | - Kelly C Young-Wolff
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
- Department of Psychiatry, University of California, San Francisco, CA, USA
| | - Cynthia I Campbell
- Division of Research, Kaiser Permanente Northern California, 2000 Broadway, Oakland, CA, 94612-2403, USA
- Department of Psychiatry, University of California, San Francisco, CA, USA
| |
|
24
|
Liu Y, Gao Q, Wei K, Huang C, Wang C, Yu Y, Qin G, Wang T. High-dimensional generalized median adaptive lasso with application to omics data. Brief Bioinform 2024; 25:bbae059. [PMID: 38436558 PMCID: PMC10939310 DOI: 10.1093/bib/bbae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 01/03/2024] [Indexed: 03/05/2024] Open
Abstract
Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation might be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve an accurate estimation of causal effect even when the outcome follows skewed distributions. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution. Besides, the proposed method exhibited obvious superiority over the existing methods when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by smaller root-mean-square error. We also utilized the GMAL method on a deoxyribonucleic acid methylation dataset from the Alzheimer's disease (AD) neuroimaging initiative database to investigate the association between cerebrospinal fluid tau protein levels and the severity of AD.
Affiliation(s)
- Yahang Liu
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
| | - Qian Gao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
| | - Kecheng Wei
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
| | - Chen Huang
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
| | - Ce Wang
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
| | - Yongfu Yu
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
- Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
| | - Guoyou Qin
- Department of Biostatistics, School of Public Health, Fudan University, Shanghai, China
- Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
- Key Laboratory of Public Health Safety of Ministry of Education, Key Laboratory for Health Technology Assessment, National Commission of Health, Fudan University, Shanghai, China
| | - Tong Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Key Laboratory of Coal Environmental Pathogenicity and Prevention (Shanxi Medical University), Ministry of Education, China
| |
|
25
|
Spicker D, Moodie EE, Shortreed SM. Differentially Private Outcome-Weighted Learning for Optimal Dynamic Treatment Regime Estimation. Stat (Int Stat Inst) 2024; 13:e641. [PMID: 39070170 PMCID: PMC11281278 DOI: 10.1002/sta4.641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 11/12/2023] [Indexed: 07/30/2024]
Abstract
Precision medicine is a framework for developing evidence-based medical recommendations that seeks to determine the optimal sequence of treatments tailored to all of the relevant patient-level characteristics which are observable. Because precision medicine relies on highly sensitive, patient-level data, ensuring the privacy of participants is of great importance. Dynamic treatment regimes (DTRs) provide one formalization of precision medicine in a longitudinal setting. Outcome-Weighted Learning (OWL) is a family of techniques for estimating optimal DTRs based on observational data. OWL techniques leverage support vector machine (SVM) classifiers in order to perform estimation. SVMs perform classification based on a set of influential points in the data known as support vectors. The classification rule produced by SVMs often requires direct access to the support vectors. Thus, releasing a treatment policy estimated with OWL requires the release of patient data for a subset of patients in the sample. As a result, the classification rules from SVMs constitute a severe privacy violation for those individuals whose data comprise the support vectors. This privacy violation is a major concern, particularly in light of the potentially highly sensitive medical data which are used in DTR estimation. Differential privacy has emerged as a mathematical framework for ensuring the privacy of individual-level data, with provable guarantees on the likelihood that individual characteristics can be determined by an adversary. We provide the first investigation of differential privacy in the context of DTRs and provide a differentially private OWL estimator, with theoretical results allowing us to quantify the cost of privacy in terms of the accuracy of the private estimators.
Affiliation(s)
- Dylan Spicker
- Department of Mathematics and Statistics, University of New Brunswick (Saint John), NB, Canada
| | - Erica E.M. Moodie
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, QC, Canada
| | - Susan M. Shortreed
- Kaiser Permanente Washington Health Research Institute, WA, USA
- Department of Biostatistics, University of Washington, WA, USA
| |
|
26
|
Kim C, Tec M, Zigler C. Bayesian nonparametric adjustment of confounding. Biometrics 2023; 79:3252-3265. [PMID: 36718599 PMCID: PMC11884736 DOI: 10.1111/biom.13833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 01/19/2023] [Indexed: 02/01/2023]
Abstract
Analysis of observational studies increasingly confronts the challenge of determining which of a possibly high-dimensional set of available covariates are required to satisfy the assumption of ignorable treatment assignment for estimation of causal effects. We propose a Bayesian nonparametric approach that simultaneously (1) prioritizes inclusion of adjustment variables in accordance with existing principles of confounder selection; (2) estimates causal effects in a manner that permits complex relationships among confounders, exposures, and outcomes; and (3) provides causal estimates that account for uncertainty in the nature of confounding. The proposal relies on specification of multiple Bayesian additive regression trees models, linked together with a common prior distribution that accrues posterior selection probability to covariates on the basis of association with both the exposure and the outcome of interest. A set of extensive simulation studies demonstrates that the proposed method performs well relative to similarly-motivated methodologies in a variety of scenarios. We deploy the method to investigate the causal effect of emissions from coal-fired power plants on ambient air pollution concentrations, where the prospect of confounding due to local and regional meteorological factors introduces uncertainty around the confounding role of a high-dimensional set of measured variables. Ultimately, we show that the proposed method produces more efficient and more consistent results across adjacent years than alternative methods, lending strength to the evidence of the causal relationship between SO2 emissions and ambient particulate pollution.
Affiliation(s)
- Chanmin Kim
- Department of Statistics, SungKyunKwan University, Seoul, South Korea
| | - Mauricio Tec
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Corwin Zigler
- Department of Statistics and Data Science, The University of Texas, Austin, Texas, USA
| |
|
27
|
Chen R, Chen G, Yu M. Entropy balancing for causal generalization with target sample summary information. Biometrics 2023; 79:3179-3190. [PMID: 36645231 DOI: 10.1111/biom.13825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2021] [Revised: 12/14/2022] [Accepted: 01/05/2023] [Indexed: 01/17/2023]
Abstract
In this paper, we focus on estimating the average treatment effect (ATE) of a target population when individual-level data from a source population and summary-level data (e.g., first or second moments of certain covariates) from the target population are available. In the presence of the heterogeneous treatment effect, the ATE of the target population can be different from that of the source population when distributions of treatment effect modifiers are dissimilar in these two populations, a phenomenon also known as covariate shift. Many methods have been developed to adjust for covariate shift, but most require individual covariates from a representative target sample. We develop a weighting approach based on the summary-level information from the target sample to adjust for possible covariate shift in effect modifiers. In particular, weights of the treated and control groups within a source sample are calibrated by the summary-level information of the target sample. Our approach also seeks additional covariate balance between the treated and control groups in the source sample. We study the asymptotic behavior of the corresponding weighted estimator for the target population ATE under a wide range of conditions. The theoretical implications are confirmed in simulation studies and a real-data application.
Affiliation(s)
- Rui Chen
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Menggang Yu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
| |
|
28
|
Choi BY, Brookhart MA. Effects of Adjusting for Instrumental Variables on the Bias and Precision of Propensity Score Weighted Estimators: Analysis Under Complete, Near, and No Positivity Violations. Clin Epidemiol 2023; 15:1055-1068. [PMID: 38025839 PMCID: PMC10644870 DOI: 10.2147/clep.s427933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Purpose To demonstrate that using an instrumental variable (IV) with monotonicity reduces the accuracy of propensity score (PS) weighted estimators for the average treatment effect (ATE). Methods Monotonicity in the relationship between a binary IV and a binary treatment variable is an important assumption to identify the ATE for compliers who would only take treatment when encouraged by the IV. We perform theoretical and numerical investigations to study the impact of using the IV that satisfies monotonicity on the PS of treatment in terms of the positivity assumption, which requires that the PS be strictly between 0 and 1, and the accuracy of PS weighted estimators. Two versions of monotonicity that result in one-sided or two-sided noncompliance are considered. Results The PS adjusting for the IV always violates the positivity assumption when noncompliance occurs in one direction (one-sided noncompliance) and is more extreme than without the IV under two-sided noncompliance. These results are valid if the probability of being encouraged to get treatment and the compliance score, the probability of being a complier, are strictly between 0 and 1. Conclusion Using a binary IV with monotonicity as a covariate for the PS model makes the estimated PSs unnecessarily extreme, reducing the accuracy of the PS weighted estimators.
Affiliation(s)
- Byeong Yeob Choi
- Department of Population Health Sciences, UT Health San Antonio, San Antonio, TX, USA
| | - M Alan Brookhart
- Department of Population Health Sciences, Duke University, Durham, NC, USA
| |
|
29
|
Rodriguez Duque D, Moodie EEM, Stephens DA. Bayesian inference for optimal dynamic treatment regimes in practice. Int J Biostat 2023; 19:309-331. [PMID: 37192544 DOI: 10.1515/ijb-2022-0073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 03/21/2023] [Indexed: 05/18/2023]
Abstract
In this work, we examine recently developed methods for Bayesian inference of optimal dynamic treatment regimes (DTRs). DTRs are a set of treatment decision rules aimed at tailoring patient care to patient-specific characteristics, thereby falling within the realm of precision medicine. In this field, researchers seek to tailor therapy with the intention of improving health outcomes; therefore, they are most interested in identifying optimal DTRs. Recent work has developed Bayesian methods for identifying optimal DTRs in a family indexed by ψ via Bayesian dynamic marginal structural models (MSMs) (Rodriguez Duque D, Stephens DA, Moodie EEM, Klein MB. Semiparametric Bayesian inference for dynamic treatment regimes via dynamic regime marginal structural models. Biostatistics; 2022. (In Press)); we review the proposed estimation procedure and illustrate its use via the new BayesDTR R package. Although methods in Rodriguez Duque D, Stephens DA, Moodie EEM, Klein MB. (Semiparametric Bayesian inference for dynamic treatment regimes via dynamic regime marginal structural models. Biostatistics; 2022. (In Press)) can estimate optimal DTRs well, they may lead to biased estimators when the model for the expected outcome if everyone in a population were to follow a given treatment strategy, known as a value function, is misspecified or when a grid search for the optimum is employed. We describe recent work that uses a Gaussian process (GP) prior on the value function as a means to robustly identify optimal DTRs (Rodriguez Duque D, Stephens DA, Moodie EEM. Estimation of optimal dynamic treatment regimes using Gaussian processes; 2022. Available from: https://doi.org/10.48550/arXiv.2105.12259). We demonstrate how a GP approach may be implemented with the BayesDTR package and contrast it with other value-search approaches to identifying optimal DTRs.
We use data from an HIV therapeutic trial in order to illustrate a standard analysis with these methods, using both the original observed trial data and an additional simulated component to showcase a longitudinal (two-stage DTR) analysis.
Affiliation(s)
| | - Erica E M Moodie
- Department of Epidemiology & Biostatistics, McGill University, Montréal, QC, Canada
| | - David A Stephens
- Department of Mathematics and Statistics, McGill University, Montréal, QC, Canada
| |
|
30
|
Donovan LM, Wai T, Spece LJ, Duan KI, Griffith MF, Leonhard A, Plumley R, Hayes SA, Picazo F, Crothers K, Kapur VK, Palen BN, Au DH, Feemster LC. Sleep Testing and Mortality in a Propensity-matched Cohort of Patients with Chronic Obstructive Pulmonary Disease. Ann Am Thorac Soc 2023; 20:1642-1653. [PMID: 37579136 DOI: 10.1513/annalsats.202303-275oc] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 08/10/2023] [Indexed: 08/16/2023] Open
Abstract
Rationale: Many advocate the application of propensity-matching methods to real-world data to answer key questions around obstructive sleep apnea (OSA) management. One such question is whether identifying undiagnosed OSA impacts mortality in high-risk populations, such as those with chronic obstructive pulmonary disease (COPD). Objectives: Assess the association of sleep testing with mortality among patients with COPD and a high likelihood of undiagnosed OSA. Methods: We identified patients with COPD and a high likelihood of undiagnosed OSA. We then distinguished those receiving sleep testing within 90 days of index COPD encounters. We calculated propensity scores for testing based on 37 variables and compared long-term mortality in matched groups. In sensitivity analyses, we compared mortality using inverse propensity weighting and instrumental variable methods. We also compared the incidence of nonfatal events including adverse outcomes (hospitalizations and COPD exacerbations) and routine services that are regularly indicated in COPD (influenza vaccination and pulmonary function testing). We compared the incidence of each nonfatal event as a composite outcome with death and separately compared the marginal probability of each nonfatal event independently, with death as a competing risk. Results: Among 135,958 patients, 1,957 (1.4%) received sleep testing. We propensity matched all patients with sleep testing to an equal number without testing, achieving excellent balance on observed confounders, with standardized differences < 0.10. We observed lower mortality risk among patients with sleep testing (incidence rate ratio, 0.88; 95% confidence interval [CI], 0.79-0.99) and similar results using inverse propensity weighting and instrumental variable methods. 
Contrary to mortality, we found that sleep testing was associated with a similar or greater risk for nonfatal adverse events, including inpatient COPD exacerbations (subhazard ratio, 1.29; 95% CI, 1.02-1.62) and routine services like influenza vaccination (subhazard ratio, 1.26; 95% CI, 1.17-1.36). Conclusions: Our disparate findings can be interpreted in multiple ways. Sleep testing may indeed cause both reduced mortality and greater incidence of nonfatal adverse outcomes and routine services. However, it is also possible that our findings stem from residual confounding by patients' likelihood of accessing care. Given the limitations of propensity-based analyses, we cannot confidently distinguish these two possibilities. This uncertainty highlights the limitations of using propensity-based analyses to guide patient care and policy decisions.
Affiliation(s)
- Lucas M Donovan
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | - Travis Wai
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
| | - Laura J Spece
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | - Kevin I Duan
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | - Matthew F Griffith
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Colorado, Aurora, Colorado
| | | | - Robert Plumley
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
| | | | | | - Kristina Crothers
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | | | - Brian N Palen
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | - David H Au
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| | - Laura C Feemster
- Seattle-Denver Center of Innovation for Veteran-centered and Value-driven Care, Veterans Affairs Puget Sound Health Care System, Seattle, Washington
- University of Washington, Seattle, Washington
| |
|
31
|
Wang C, Wei K, Huang C, Yu Y, Qin G. Multiply robust estimator for the difference in survival functions using pseudo-observations. BMC Med Res Methodol 2023; 23:247. [PMID: 37872495 PMCID: PMC10591363 DOI: 10.1186/s12874-023-02065-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 10/11/2023] [Indexed: 10/25/2023] Open
Abstract
BACKGROUND When estimating the causal effect on survival outcomes in observational studies, it is necessary to adjust for confounding factors due to unbalanced covariates between treatment and control groups. There is no study on a multiply robust method for estimating the difference in survival functions. In this study, we propose a multiply robust (MR) estimator, allowing multiple propensity score models and outcome regression models, to provide multiple protection. METHOD Based on the previous MR estimator (Han 2014) and pseudo-observation approach, we proposed a new MR estimator for estimating the difference in survival functions. The proposed MR estimator based on the pseudo-observation approach has several advantages. First, the proposed estimator has a small bias when any PS and OR models were correctly specified. Second, the proposed estimator considers the advantage of the pseudo-observation approach, which avoids the proportional hazards assumption. A Monte Carlo simulation study was performed to evaluate the performance of the proposed estimator. And the proposed estimator was used to estimate the effect of chemotherapy on triple-negative breast cancer (TNBC) in real data. RESULTS The simulation studies showed that the bias of the proposed estimator was small, and the coverage rate was close to 95% when any model for propensity score or outcome regression is correctly specified regardless of whether the proportional hazard assumption holds, finite sample size and censoring rate. And the simulation results also showed that even though the propensity score models are misspecified, the bias of the proposed estimator was still small when there is a correct model in candidate outcome regression models. And we applied the proposed estimator in real data, finding that chemotherapy could improve the prognosis of TNBC.
CONCLUSIONS The proposed estimator, allowing multiple propensity score and outcome regression models, provides multiple protection for estimating the difference in survival functions. The proposed estimator provided a new choice when researchers have a "difficult time" choosing only one model for their studies.
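For readers unfamiliar with the pseudo-observation approach named in this abstract, the following is a minimal Python sketch of jackknife pseudo-observations for the survival probability at a fixed time, built on a Kaplan-Meier estimate. It is not the authors' MR estimator; the function names `km_survival` and `pseudo_observations` and the toy data are hypothetical and used only for illustration.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from follow-up times and event flags (1 = event)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    s = 1.0
    # Multiply the conditional survival factors over distinct event times up to t.
    for u in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / at_risk
    return s

def pseudo_observations(time, event, t):
    """Jackknife pseudo-observations: n * S_hat(t) - (n - 1) * S_hat_without_i(t)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    full = km_survival(time, event, t)
    idx = np.arange(n)
    return np.array([
        n * full - (n - 1) * km_survival(time[idx != i], event[idx != i], t)
        for i in range(n)
    ])

# Toy data without censoring (illustrative values only).
toy_time = np.array([1.0, 2.0, 3.0, 4.0])
toy_event = np.array([1, 1, 1, 1])
toy_po = pseudo_observations(toy_time, toy_event, t=2.5)
```

Without censoring, each pseudo-observation reduces to the indicator that the subject survived beyond t. This is why pseudo-observations can stand in for a partly unobserved outcome in a regression model without a proportional hazards assumption.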
Affiliation(s)
- Ce Wang
  - Department of Biostatistics, Key Laboratory for Health Technology Assessment, National Commission of Health, Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, China
- Kecheng Wei
  - Department of Biostatistics, Key Laboratory for Health Technology Assessment, National Commission of Health, Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, China
- Chen Huang
  - Department of Biostatistics, Key Laboratory for Health Technology Assessment, National Commission of Health, Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, China
- Yongfu Yu
  - Department of Biostatistics, Key Laboratory for Health Technology Assessment, National Commission of Health, Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
- Guoyou Qin
  - Department of Biostatistics, Key Laboratory for Health Technology Assessment, National Commission of Health, Key Laboratory of Public Health Safety of Ministry of Education, School of Public Health, Fudan University, Shanghai, China
  - Shanghai Institute of Infectious Disease and Biosecurity, Shanghai, China
|
32
|
Yui K, Imataka G, Shiohama T. Lipid Peroxidation of the Docosahexaenoic Acid/Arachidonic Acid Ratio Relating to the Social Behaviors of Individuals with Autism Spectrum Disorder: The Relationship with Ferroptosis. Int J Mol Sci 2023; 24:14796. [PMID: 37834244 PMCID: PMC10572946 DOI: 10.3390/ijms241914796] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2023] [Revised: 09/11/2023] [Accepted: 09/19/2023] [Indexed: 10/15/2023] Open
Abstract
Polyunsaturated fatty acids (PUFAs) undergo lipid peroxidation and conversion into malondialdehyde (MDA). MDA reacts with acetaldehyde to form malondialdehyde-modified low-density lipoprotein (MDA-LDL). We studied unsettled issues in the association between MDA-LDL and the pathophysiology of autism spectrum disorder (ASD) in 18 individuals with ASD and eight age-matched controls. Social behaviors were assessed using the social responsiveness scale (SRS). To overcome the problem of using small samples, adaptive Lasso was used to enhance the interpretability accuracy, and a coefficient of variation was used for variable selection. Plasma MDA-LDL levels (91.00 ± 16.70 vs. 74.50 ± 18.88) and the docosahexaenoic acid (DHA)/arachidonic acid (ARA) ratio (0.57 ± 0.16 vs. 0.37 ± 0.07) were significantly higher, and superoxide dismutase levels were significantly lower, in the ASD group than in the control group. Total SRS scores in the ASD group were significantly higher than those in the control group. The unbeneficial DHA/ARA ratio induced ferroptosis via lipid peroxidation. Multiple linear regression analysis and adaptive Lasso revealed an association of the DHA/ARA ratio with total SRS scores and increased MDA-LDL levels in plasma, resulting in neuronal deficiencies. This unbeneficial DHA/ARA-ratio-induced ferroptosis contributes to autistic social behaviors and is available for therapy.
Affiliation(s)
- Kunio Yui
  - Department of Pediatrics, Graduate School of Medicine, Chiba University, Chiba 260-8677, Japan
  - Department of Pediatrics, Dokkyo Medical University, Mibu 321-0293, Japan
- George Imataka
  - Department of Pediatrics, Dokkyo Medical University, Mibu 321-0293, Japan
- Tadashi Shiohama
  - Department of Pediatrics, Graduate School of Medicine, Chiba University, Chiba 260-8677, Japan
|
33
|
Williamson BD, Wyss R, Stuart EA, Dang LE, Mertens AN, Neugebauer RS, Wilson A, Gruber S. An application of the Causal Roadmap in two safety monitoring case studies: Causal inference and outcome prediction using electronic health record data. J Clin Transl Sci 2023; 7:e208. [PMID: 37900347 PMCID: PMC10603358 DOI: 10.1017/cts.2023.632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 10/31/2023] Open
Abstract
Background Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to assess and ensure that analyses meet their intended goals. Methods The Causal Roadmap is an established framework that can guide and document analytic decisions through each step of the analytic pipeline, which will help investigators generate high-quality real-world evidence. Results In this paper, we illustrate the utility of the Causal Roadmap using two case studies previously led by workgroups sponsored by the Sentinel Initiative - a program for actively monitoring the safety of regulated medical products. Each case example focuses on different aspects of the analytic pipeline for drug safety monitoring. The first case study shows how the Causal Roadmap encourages transparency, reproducibility, and objective decision-making for causal analyses. The second case study highlights how this framework can guide analytic decisions beyond inference on causal parameters, improving outcome ascertainment in clinical phenotyping. Conclusion These examples provide a structured framework for implementing the Causal Roadmap in safety surveillance and guide transparent, reproducible, and objective analysis.
Affiliation(s)
- Brian D. Williamson
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Elizabeth A. Stuart
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lauren E. Dang
  - Department of Biostatistics, University of California, Berkeley, CA, USA
- Andrew N. Mertens
  - Department of Biostatistics, University of California, Berkeley, CA, USA
|
34
|
Yu Y, Zhang M, Mukherjee B. An inverse probability weighted regression method that accounts for right-censoring for causal inference with multiple treatments and a binary outcome. Stat Med 2023; 42:3699-3715. [PMID: 37392070 DOI: 10.1002/sim.9826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 05/29/2023] [Accepted: 06/01/2023] [Indexed: 07/02/2023]
Abstract
Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post-treatment outcome of interest is whether the event happens within a pre-specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score-based methods. An additional source of bias is right-censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression-based estimator that can simultaneously handle both confounding and right-censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.
Affiliation(s)
- Youfei Yu
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Min Zhang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Bhramar Mukherjee
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
|
35
|
Yang J, Schuemie MJ, Ji X, Suchard MA. Massive Parallelization of Massive Sample-size Survival Analysis. J Comput Graph Stat 2023; 33:289-302. [PMID: 38716090 PMCID: PMC11070748 DOI: 10.1080/10618600.2023.2213279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Accepted: 05/05/2023] [Indexed: 01/06/2025]
Abstract
Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, the increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses. Specifically, we develop and apply time- and memory-efficient single-pass parallel scan algorithms for Cox proportional hazards models and forward-backward parallel scan algorithms for Fine-Gray models for analysis with and without a competing risk using a cyclic coordinate descent optimization approach. We demonstrate that GPUs accelerate the computation of fitting these complex models in large databases by orders of magnitude as compared to traditional multi-core CPU parallelism. Our implementation enables efficient large-scale observational studies involving millions of patients and thousands of patient characteristics. The above implementation is available in the open-source R package Cyclops (Suchard et al., 2013).
Affiliation(s)
- Jianxiao Yang
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Martijn J Schuemie
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA, USA
  - Janssen Research and Development, Titusville, NJ, USA
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, Louisiana, USA
- Marc A Suchard
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - VA Informatics and Computing Infrastructure, US Department of Veterans Affairs, Salt Lake City, UT, USA
|
36
|
Bian Z, Moodie EEM, Shortreed SM, Bhatnagar S. Variable selection in regression-based estimation of dynamic treatment regimes. Biometrics 2023; 79:988-999. [PMID: 34837380 PMCID: PMC11350356 DOI: 10.1111/biom.13608] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/18/2021] [Accepted: 11/18/2021] [Indexed: 11/27/2022]
Abstract
Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that aim to recommend effective treatments for individual patients according to patient information history. DTRs can be estimated from models which include interactions between treatment and a (typically small) number of covariates which are often chosen a priori. However, with increasingly large and complex data being collected, it can be difficult to know which prognostic factors might be relevant in the treatment rule. Therefore, a more data-driven approach to select these covariates might improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property, that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. We show our method has both the double robustness property and the oracle property theoretically; and the newly proposed method compares favorably with other variable selection approaches in numerical studies. We further illustrate the proposed method on data from the Sequenced Treatment Alternatives to Relieve Depression study.
Affiliation(s)
- Zeyu Bian
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Erica E. M. Moodie
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
- Susan M. Shortreed
  - Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
  - Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Sahir Bhatnagar
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
  - Department of Diagnostic Radiology, McGill University, Montreal, Quebec, Canada
|
37
|
Lee D, Yang S, Dong L, Wang X, Zeng D, Cai J. Improving trial generalizability using observational studies. Biometrics 2023; 79:1213-1225. [PMID: 34862966 PMCID: PMC9166225 DOI: 10.1111/biom.13609] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 11/19/2020] [Revised: 11/06/2021] [Accepted: 11/22/2021] [Indexed: 11/29/2022]
Abstract
Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application to the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small-cell lung cancer patients after surgery.
Affiliation(s)
- Dasom Lee
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Shu Yang
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Lin Dong
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA
- Xiaofei Wang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jianwen Cai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
|
38
|
Gupta S. Model-Selection Inference for Causal Impact of Clusters and Collaboration on MSMEs in India. JOURNAL OF QUANTITATIVE ECONOMICS : JOURNAL OF THE INDIAN ECONOMETRIC SOCIETY 2023; 21:1-22. [PMID: 37360927 PMCID: PMC10157132 DOI: 10.1007/s40953-023-00349-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/14/2023] [Indexed: 06/28/2023]
Abstract
Do firms benefit more from agglomeration-based spillovers than from the technical know-how obtained through inter-firm collaboration? Quantifying the relative value of the industrial policy of cluster development vis-à-vis a firm's internal decision to collaborate can be valuable for policy-makers and entrepreneurs. I observe the universe of Indian MSMEs inside an industrial cluster (Treatment Group 1), those in collaboration for technical know-how (Treatment Group 2) and those outside clusters with no collaboration (Control Group). Conventional econometric methods to identify the treatment effects would suffer from selection bias and misspecification of the model. I use two data-driven, model-selection methods, developed by Belloni, A., Chernozhukov, V., and Hansen, C. (2013; Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2):608-650) and Chernozhukov, V., Hansen, C., and Spindler, M. (2015; Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review, 105(5):486-490), to estimate the causal impact of the treatments on the GVA of firms. The results suggest that the ATEs of cluster and collaboration are nearly equal at 30%. I conclude by offering policy implications.
Affiliation(s)
- Samarth Gupta
  - Amrut Mody School of Management, Ahmedabad University, Ahmedabad, India
|
39
|
Wyss R, Plasek JM, Zhou L, Bessette LG, Schneeweiss S, Rassen JA, Tsacogianis T, Lin KJ. Scalable Feature Engineering from Electronic Free Text Notes to Supplement Confounding Adjustment of Claims-Based Pharmacoepidemiologic Studies. Clin Pharmacol Ther 2023; 113:832-838. [PMID: 36528788 PMCID: PMC10913938 DOI: 10.1002/cpt.2826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Natural language processing (NLP) tools turn free-text notes (FTNs) from electronic health records (EHRs) into data features that can supplement confounding adjustment in pharmacoepidemiologic studies. However, current applications are difficult to scale. We used unsupervised NLP to generate high-dimensional feature spaces from FTNs to improve prediction of drug exposure and outcomes compared with claims-based analyses. We linked Medicare claims with EHR data to generate three cohort studies comparing different classes of medications on the risk of various clinical outcomes. We used "bag-of-words" to generate features for the top 20,000 most prevalent terms from FTNs. We compared machine learning (ML) prediction algorithms using different sets of candidate predictors: Set1 (39 researcher-specified variables), Set2 (Set1 + ML-selected claims codes), and Set3 (Set1 + ML-selected NLP-generated features), vs. Set4 (Set1 + 2 + 3). When modeling treatment choice, we observed a consistent pattern across the examples: ML models utilizing Set4 performed best followed by Set2, Set3, then Set1. When modeling the outcome risk, there was little to no improvement beyond models based on Set1. Supplementing claims data with NLP-generated features from free-text notes improved prediction of prescribing choices but yielded little or no improvement in clinical risk prediction. These findings have implications for strategies to improve confounding adjustment using EHR data in pharmacoepidemiologic studies.
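The "bag-of-words" step described in this abstract can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and ranks terms by the number of notes in which they appear; both choices, the function names `top_terms` and `bag_of_words`, and the toy notes are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter

def top_terms(notes, k):
    """Return the k terms that appear in the most notes (ties broken alphabetically)."""
    doc_freq = Counter()
    for note in notes:
        # Count each term once per note (document frequency, an assumed definition of prevalence).
        doc_freq.update(set(note.lower().split()))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def bag_of_words(notes, vocab):
    """Build one count vector per note over the given vocabulary."""
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for note in notes:
        row = [0] * len(vocab)
        for token in note.lower().split():
            j = index.get(token)
            if j is not None:
                row[j] += 1
        rows.append(row)
    return rows

# Toy free-text notes (invented for illustration).
toy_notes = ["chest pain noted", "no chest pain", "pain improved"]
toy_vocab = top_terms(toy_notes, k=2)
toy_features = bag_of_words(toy_notes, toy_vocab)
```

Each row of `toy_features` would then serve as candidate predictors for a downstream model. The study used the top 20,000 terms rather than the two used here.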
Affiliation(s)
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Joseph M. Plasek
  - Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Li Zhou
  - Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Lily G. Bessette
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Sebastian Schneeweiss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Theodore Tsacogianis
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Kueiyu Joshua Lin
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
  - Department of Medicine, Massachusetts General Hospital, Harvard Medical School
|
40
|
Baldé I, Yang YA, Lefebvre G. Reader reaction to "Outcome-adaptive lasso: Variable selection for causal inference" by Shortreed and Ertefaie (2017). Biometrics 2023; 79:514-520. [PMID: 35642320 DOI: 10.1111/biom.13683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2021] [Accepted: 08/26/2021] [Indexed: 11/29/2022]
Abstract
Shortreed and Ertefaie introduced a clever propensity score variable selection approach for estimating average causal effects, namely, the outcome adaptive lasso (OAL). OAL aims to select desirable covariates, confounders, and predictors of outcome, to build an unbiased and statistically efficient propensity score estimator. Due to its design, a potential limitation of OAL is how it handles the collinearity problem, which is often encountered in high-dimensional data. As seen in Shortreed and Ertefaie, OAL's performance degraded with increased correlation between covariates. In this note, we propose the generalized OAL (GOAL) that combines the strengths of the adaptively weighted L1 penalty and the elastic net to better handle the selection of correlated covariates. Two different versions of GOAL, which differ in their procedure (algorithm), are proposed. We compared OAL and GOAL in simulation scenarios that mimic those examined by Shortreed and Ertefaie. Although all approaches performed equivalently with independent covariates, we found that both GOAL versions were more performant than OAL in low and high dimensions with correlated covariates.
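To make the idea of combining an adaptively weighted L1 penalty with an elastic-net style ridge term concrete, here is a small Python sketch of such a penalty. The exact form used here (adaptive weights |beta_j|^(-gamma) from outcome-model coefficients, plus lam2 times the sum of squared propensity coefficients) is an assumption based on the abstract's description, and the names `adaptive_weights` and `goal_penalty` are hypothetical.

```python
import numpy as np

def adaptive_weights(beta_outcome, gamma=2.0):
    """Adaptive L1 weights: covariates weakly related to the outcome get large penalties."""
    beta_outcome = np.asarray(beta_outcome, dtype=float)
    return np.abs(beta_outcome) ** (-gamma)

def goal_penalty(alpha, beta_outcome, lam1, lam2, gamma=2.0):
    """Adaptively weighted L1 term plus a ridge term on propensity coefficients alpha.

    Assumed form: lam1 * sum_j w_j * |alpha_j| + lam2 * sum_j alpha_j ** 2.
    Setting lam2 = 0 leaves only the adaptively weighted L1 term.
    """
    alpha = np.asarray(alpha, dtype=float)
    w = adaptive_weights(beta_outcome, gamma)
    return lam1 * np.sum(w * np.abs(alpha)) + lam2 * np.sum(alpha ** 2)

# Toy coefficients (illustrative values only).
toy_alpha = [1.0, 1.0]
toy_beta = [2.0, 0.5]
l1_only = goal_penalty(toy_alpha, toy_beta, lam1=1.0, lam2=0.0, gamma=1.0)
with_ridge = goal_penalty(toy_alpha, toy_beta, lam1=1.0, lam2=1.0, gamma=1.0)
```

The ridge term is what typically lets correlated covariates enter or leave the model together, which is the collinearity issue the note sets out to address.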
Affiliation(s)
- Ismaila Baldé
  - Département de mathématiques, Université du Québec à Montréal, Montréal, Québec, Canada
- Yi Archer Yang
  - Department of Mathematics and Statistics, McGill University, Montréal, Québec, Canada
- Geneviève Lefebvre
  - Département de mathématiques, Université du Québec à Montréal, Montréal, Québec, Canada
|
41
|
Moosavi N, Häggström J, de Luna X. The Costs and Benefits of Uniformly Valid Causal Inference with High-Dimensional Nuisance Parameters. Stat Sci 2023. [DOI: 10.1214/21-sts843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Niloofar Moosavi
  - Niloofar Moosavi is Ph.D. Student, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
- Jenny Häggström
  - Jenny Häggström is Associate Professor, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
- Xavier de Luna
  - Xavier de Luna is Professor, Department of Statistics, USBE, Umeå University, 901 87, Umeå, Sweden
|
42
|
Yi GY, Chen LP. Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Stat Methods Med Res 2023; 32:691-711. [PMID: 36694932 PMCID: PMC10119903 DOI: 10.1177/09622802221146308] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/26/2023]
Abstract
In the framework of causal inference, the inverse probability weighting estimation method and its variants have been commonly employed to estimate the average treatment effect. Such methods, however, are challenged by the presence of irrelevant pre-treatment variables and measurement error. Ignoring these features and naively applying the usual inverse probability weighting estimation procedures may typically yield biased inference results. In this article, we develop an inference method for estimating the average treatment effect with those features taken into account. We establish theoretical properties for the resulting estimator and carry out numerical studies to assess the finite sample performance of the proposed estimator.
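As background for the inverse probability weighting estimation that this abstract builds on, the following Python sketch computes a normalized (Hajek-type) IPW estimate of the average treatment effect from given propensity scores. It does not implement the variable selection or measurement error corrections proposed in the article; the function name `ipw_ate` and the toy data are illustrative assumptions.

```python
import numpy as np

def ipw_ate(y, t, ps):
    """Normalized IPW estimate of the ATE given outcomes y, treatment flags t, propensity scores ps."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    ps = np.asarray(ps, dtype=float)
    w1 = t / ps                  # weights for treated units
    w0 = (1.0 - t) / (1.0 - ps)  # weights for control units
    mean_treated = np.sum(w1 * y) / np.sum(w1)
    mean_control = np.sum(w0 * y) / np.sum(w0)
    return mean_treated - mean_control

# Toy data with constant propensity 0.5 (illustrative values only).
toy_y = [1.0, 2.0, 3.0, 4.0]
toy_t = [1, 1, 0, 0]
toy_ps = [0.5, 0.5, 0.5, 0.5]
toy_ate = ipw_ate(toy_y, toy_t, toy_ps)
```

With a constant propensity score the estimate reduces to the simple difference in group means. In practice the propensity scores are estimated, and that is the stage where irrelevant covariates and mismeasured confounders can bias the result.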
Affiliation(s)
- Grace Y Yi
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada
  - Department of Computer Science, University of Western Ontario, London, Canada
- Li-Pang Chen
  - Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Canada
  - Department of Statistics, National Chengchi University, Taipei, Taiwan
|
43
|
Papadogeorgou G. Discussion on "Spatial+: a novel approach to spatial confounding" by Emiko Dupont, Simon N. Wood, and Nicole H. Augustin. Biometrics 2022; 78:1305-1308. [PMID: 35712896 DOI: 10.1111/biom.13655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Revised: 09/23/2021] [Accepted: 10/12/2021] [Indexed: 12/30/2022]
Abstract
I congratulate Dupont, Wood, and Augustin (DWA hereon) for providing an easy-to-implement method for estimation in the presence of spatial confounding, and for addressing some of the complicated aspects of the topic. I discuss conceptual and operational issues that are fundamental to inference in spatial settings: (i) the target quantity and its interpretability, (ii) the nonspatial aspect of covariates and their relative spatial scales, and (iii) the impact of spatial smoothing. While DWA provide some insights on these issues, I believe that the audience might benefit from a deeper discussion.
|
44
|
High-dimensional causal mediation analysis based on partial linear structural equation models. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
45
|
Remiro-Azócar A. Two-stage matching-adjusted indirect comparison. BMC Med Res Methodol 2022; 22:217. [PMID: 35941551 PMCID: PMC9358807 DOI: 10.1186/s12874-022-01692-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2022] [Accepted: 07/19/2022] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Anchored covariate-adjusted indirect comparisons inform reimbursement decisions where there are no head-to-head trials between the treatments of interest, there is a common comparator arm shared by the studies, and there are patient-level data limitations. Matching-adjusted indirect comparison (MAIC), based on propensity score weighting, is the most widely used covariate-adjusted indirect comparison method in health technology assessment. MAIC has poor precision and is inefficient when the effective sample size after weighting is small. METHODS A modular extension to MAIC, termed two-stage matching-adjusted indirect comparison (2SMAIC), is proposed. This uses two parametric models. One estimates the treatment assignment mechanism in the study with individual patient data (IPD), the other estimates the trial assignment mechanism. The first model produces inverse probability weights that are combined with the odds weights produced by the second model. The resulting weights seek to balance covariates between treatment arms and across studies. A simulation study provides proof-of-principle in an indirect comparison performed across two randomized trials. Nevertheless, 2SMAIC can be applied in situations where the IPD trial is observational, by including potential confounders in the treatment assignment model. The simulation study also explores the use of weight truncation in combination with MAIC for the first time. RESULTS Despite enforcing randomization and knowing the true treatment assignment mechanism in the IPD trial, 2SMAIC yields improved precision and efficiency with respect to MAIC in all scenarios, while maintaining similarly low levels of bias. The two-stage approach is effective when sample sizes in the IPD trial are low, as it controls for chance imbalances in prognostic baseline covariates between study arms. It is not as effective when overlap between the trials' target populations is poor and the extremity of the weights is high. 
In these scenarios, truncation leads to substantial precision and efficiency gains but induces considerable bias. The combination of a two-stage approach with truncation produces the highest precision and efficiency improvements. CONCLUSIONS Two-stage approaches to MAIC can increase precision and efficiency with respect to the standard approach by adjusting for empirical imbalances in prognostic covariates in the IPD trial. Further modules could be incorporated for additional variance reduction or to account for missingness and non-compliance in the IPD trial.
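The weight construction described in the METHODS part of this abstract can be sketched in Python as follows. The sketch assumes that the two sets of weights are combined by multiplication and that both the treatment propensity scores and the trial-assignment odds weights have already been estimated. These assumptions, the function names, and the toy numbers are illustrative and not taken from the article. The effective sample size formula is the usual (sum of weights)^2 / (sum of squared weights).

```python
import numpy as np

def two_stage_weights(t, ps, odds_w):
    """Combine inverse probability of treatment weights with trial-assignment odds weights.

    t      : treatment indicator in the IPD trial (1 = active, 0 = comparator)
    ps     : estimated probability of receiving the active treatment
    odds_w : odds weights from the trial assignment model (assumed precomputed)
    """
    t = np.asarray(t)
    ps = np.asarray(ps, dtype=float)
    odds_w = np.asarray(odds_w, dtype=float)
    iptw = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    return iptw * odds_w  # assumed combination rule: elementwise product

def effective_sample_size(w):
    """Effective sample size of a weighted sample."""
    w = np.asarray(w, dtype=float)
    return np.sum(w) ** 2 / np.sum(w ** 2)

# Toy inputs (illustrative values only).
toy_w = two_stage_weights(t=[1, 0], ps=[0.5, 0.5], odds_w=[1.0, 3.0])
toy_ess = effective_sample_size(toy_w)
```

A small effective sample size relative to the number of patients signals extreme weights, the situation in which the abstract reports that truncation trades precision gains for bias.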
Affiliation(s)
- Antonio Remiro-Azócar
  - Medical Affairs Statistics, Bayer plc, 400 South Oak Way, Reading, UK
  - Department of Statistical Science, University College London, 1-19 Torrington Place, London, UK
|
46
|
Gao Q, Zhang Y, Sun H, Wang T. Evaluation of propensity score methods for causal inference with high-dimensional covariates. Brief Bioinform 2022; 23:6603435. [PMID: 35667004 DOI: 10.1093/bib/bbac227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2021] [Revised: 05/11/2022] [Accepted: 05/17/2022] [Indexed: 11/12/2022] Open
Abstract
In recent work, researchers have paid considerable attention to the estimation of causal effects in observational studies with a large number of covariates, which makes the unconfoundedness assumption plausible. In this paper, we review propensity score (PS) methods developed in high-dimensional settings and broadly group them into model-based methods that extend models for prediction to causal inference and balance-based methods that combine covariate balancing constraints. We conducted systematic simulation experiments to evaluate these two types of methods, and studied whether the use of balancing constraints further improved estimation performance. Our comparison methods were post-double-selection (PDS), double-index PS (DiPS), outcome-adaptive LASSO (OAL), group LASSO and doubly robust estimation (GLiDeR), high-dimensional covariate balancing PS (hdCBPS), regularized calibrated estimators (RCAL) and approximate residual balancing method (balanceHD). For the four model-based methods, simulation studies showed that GLiDeR was the most stable approach, with high estimation accuracy and precision, followed by PDS, OAL and DiPS. For balance-based methods, hdCBPS performed similarly to GLiDeR in terms of accuracy, and outperformed balanceHD and RCAL. These findings imply that PS methods do not benefit appreciably from covariate balancing constraints in high-dimensional settings. In conclusion, we recommend the preferential use of GLiDeR and hdCBPS approaches for estimating causal effects in high-dimensional settings; however, further studies on the construction of valid confidence intervals are required.
Affiliation(s)
- Qian Gao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yu Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Hongwei Sun
- Department of Health Statistics, School of Public Health and Management, Binzhou Medical University, Yantai, China
- Tong Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
|
47
|
Moodie EEM, Coulombe J, Danieli C, Renoux C, Shortreed SM. Privacy-preserving estimation of an optimal individualized treatment rule: a case study in maximizing time to severe depression-related outcomes. Lifetime Data Anal 2022; 28:512-542. [PMID: 35499604 PMCID: PMC10805063 DOI: 10.1007/s10985-022-09554-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 03/08/2022] [Indexed: 06/14/2023]
Abstract
Estimating individualized treatment rules, particularly in the context of right-censored outcomes, is challenging because the treatment effect heterogeneity of interest is often small and thus difficult to detect. While this motivates the use of very large datasets such as those from multiple health systems or centres, data privacy may be of concern, with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection used in combination with dynamic weighted survival modelling (DWSurv) to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom's Clinical Practice Research Datalink.
Affiliation(s)
- Erica E M Moodie
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Janie Coulombe
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Coraline Danieli
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Christel Renoux
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC, Canada
- Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC, Canada
- Department of Neurology and Neurosurgery, McGill University, Montréal, QC, Canada
- Susan M Shortreed
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Biostatistics Department, University of Washington, Seattle, USA
|
48
|
Wyss R, Schneeweiss S, Lin KJ, Miller DP, Kalilani L, Franklin JM. Synthetic Negative Controls: Using Simulation to Screen Large-scale Propensity Score Analyses. Epidemiology 2022; 33:541-550. [PMID: 35439779 PMCID: PMC9156547 DOI: 10.1097/ede.0000000000001482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
The propensity score has become a standard tool to control for large numbers of variables in healthcare database studies. However, little has been written on the challenge of comparing large-scale propensity score analyses that use different methods for confounder selection and adjustment. In these settings, balance diagnostics are useful but do not inform researchers on which variables balance should be assessed or quantify the impact of residual covariate imbalance on bias. Here, we propose a framework to supplement balance diagnostics when comparing large-scale propensity score analyses. Instead of focusing on results from any single analysis, we suggest conducting and reporting results for many analytic choices and using both balance diagnostics and synthetically generated control studies to screen analyses that show signals of bias caused by measured confounding. To generate synthetic datasets, the framework does not require simulating the outcome-generating process. In healthcare database studies, outcome events are often rare, making it difficult to identify and model all predictors of the outcome to simulate a confounding structure closely resembling the given study. Therefore, the framework uses a model for treatment assignment to divide the comparator population into pseudo-treatment groups where covariate differences resemble those in the study cohort. The partially simulated datasets have a confounding structure approximating the study population under the null (synthetic negative control studies). The framework is used to screen analyses that likely violate partial exchangeability due to lack of control for measured confounding. We illustrate the framework using simulations and an empirical example.
Affiliation(s)
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Jessica M Franklin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
|
49
|
Wyss R, Yanover C, El-Hay T, Bennett D, Platt RW, Zullo AR, Sari G, Wen X, Ye Y, Yuan H, Gokhale M, Patorno E, Lin KJ. Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: an overview of the current literature. Pharmacoepidemiol Drug Saf 2022; 31:932-943. [PMID: 35729705 PMCID: PMC9541861 DOI: 10.1002/pds.5500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 06/01/2022] [Accepted: 06/05/2022] [Indexed: 11/10/2022]
Abstract
Controlling for large numbers of variables that collectively serve as 'proxies' for unmeasured factors can often improve confounding control in pharmacoepidemiologic studies utilizing administrative healthcare databases. There is a growing body of evidence showing that data-driven machine learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment. In this paper, we discuss the considerations underpinning three areas for data-driven high-dimensional proxy confounder adjustment: 1) feature generation, that is, transforming raw data into covariates (or features) to be used for proxy adjustment; 2) covariate prioritization, selection and adjustment; and 3) diagnostic assessment. We survey current approaches and recent advancements within each area, including the most widely used approach to proxy confounder adjustment in healthcare database studies (the high-dimensional propensity score or hdPS). We also discuss limitations of the hdPS and outline recent advancements that incorporate the principles of proxy adjustment with machine learning extensions to improve performance. We further discuss challenges and avenues of future development within each area. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).
Affiliation(s)
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Tal El-Hay
- KI Research Institute, Kfar Malal, Israel; IBM Research-Haifa Labs, Haifa, Israel
- Dimitri Bennett
- Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, MA, USA
- Andrew R Zullo
- Department of Health Services, Policy, and Practice, Brown University School of Public Health and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, RI, USA
- Grammati Sari
- Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK
- Xuerong Wen
- Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, RI, USA
- Yizhou Ye
- Global Epidemiology, AbbVie Inc., North Chicago, IL, USA
- Hongbo Yuan
- Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada
- Mugdha Gokhale
- Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, PA, USA
- Elisabetta Patorno
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
|
50
|
Yu D, Wang L, Kong D, Zhu H. Mapping the Genetic-Imaging-Clinical Pathway with Applications to Alzheimer’s Disease. J Am Stat Assoc 2022; 117:1656-1668. [PMID: 37009529 PMCID: PMC10062702 DOI: 10.1080/01621459.2022.2087658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Alzheimer's disease is a progressive form of dementia that results in problems with memory, thinking, and behavior. It often starts with abnormal aggregation and deposition of β amyloid and tau, followed by neuronal damage such as atrophy of the hippocampi, leading to Alzheimer's Disease (AD). The aim of this paper is to map the genetic-imaging-clinical pathway for AD in order to delineate the genetically-regulated brain changes that drive disease progression based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We develop a novel two-step approach to delineate the association between high-dimensional 2D hippocampal surface exposures and the Alzheimer's Disease Assessment Scale (ADAS) cognitive score, while taking into account the ultra-high dimensional clinical and genetic covariates at baseline. Analysis results suggest that the radial distance of each pixel of both hippocampi is negatively associated with the severity of behavioral deficits conditional on observed clinical and genetic covariates. These associations are stronger in Cornu Ammonis region 1 (CA1) and subiculum subregions compared to Cornu Ammonis region 2 (CA2) and Cornu Ammonis region 3 (CA3) subregions. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Affiliation(s)
- Dengdeng Yu
- Department of Mathematics, University of Texas at Arlington
- Linbo Wang
- Department of Statistical Sciences, University of Toronto
- Dehan Kong
- Department of Statistical Sciences, University of Toronto
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, for the Alzheimer’s Disease Neuroimaging Initiative
|