1
Chiaruttini MV, Lorenzoni G, Daverio M, Marchetto L, Izzo F, Chidini G, Picconi E, Nettuno C, Zanonato E, Sagredini R, Rossetti E, Mondardini MC, Cecchetti C, Vitale P, Alaimo N, Colosimo D, Sacco F, Genoni G, Perrotta D, Micalizzi C, Moggia S, Chisari G, Rulli I, Wolfler A, Amigoni A, Gregori D. Non-Invasive Ventilation Failure in Pediatric ICU: A Machine Learning Driven Prediction. Diagnostics (Basel) 2024; 14:2857. [PMID: 39767219 PMCID: PMC11675706 DOI: 10.3390/diagnostics14242857] [Received: 11/25/2024] [Revised: 12/16/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
Background/Objectives: Non-invasive ventilation (NIV) has emerged as a possible first-line treatment for avoiding invasive intubation in pediatric intensive care units (PICUs) because it reduces intubation-associated risks. However, timely identification of NIV failure is crucial to prevent adverse outcomes. This study aims to identify predictors of first-attempt NIV failure in PICU patients by testing various machine learning techniques and comparing their predictive abilities. Methods: Data were sourced from the TIPNet registry, which comprises patients admitted to 23 Italian PICUs. We selected patients admitted between January 2010 and January 2024 who received NIV as their initial respiratory support. To develop a predictive model for NIV failure, we compared several machine learning techniques: Generalized Linear Models, Random Forest (RF), Extreme Gradient Boosting, and Neural Networks. An ensemble approach was also implemented. Model performance was measured using sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), and predictive values, and model calibration was evaluated. Results: Of 43,794 records, 1861 admissions met the inclusion criteria, with 678 complete cases and 97 NIV failures. The RF model demonstrated the highest AUROC and a sensitivity of 0.83 (0.64, 0.94). Base excess, weight, age, systolic blood pressure, and fraction of inspired oxygen were identified as the most predictive features. A calibration check confirmed the model's reliability in predicting NIV failure probabilities. Conclusions: This study identified highly sensitive models for predicting NIV failure in PICU patients, with RF as a robust option.
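The performance measures listed in the Methods can be computed from a binary outcome vector (1 = NIV failure) and predicted failure probabilities. The sketch below is a minimal illustration and not the authors' pipeline. The 0.5 decision threshold, the function names and the toy data are all assumptions. AUROC is estimated as the probability that a randomly chosen failure receives a higher score than a randomly chosen non-failure, with ties counting one half. This is the area under the empirical ROC curve.

```python
def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def threshold_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV at a fixed (assumed) threshold."""
    tp = tn = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 0 and pred == 0:
            tn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:  # y == 1 and pred == 0
            fn += 1
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }


def auroc(y_true, scores):
    """Mann-Whitney estimate of AUROC: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = NIV failure, scores = hypothetical predicted failure probabilities
y = [1, 1, 0, 0, 0]
p = [0.9, 0.4, 0.6, 0.2, 0.1]
print(threshold_metrics(y, p))
print(auroc(y, p))
```

Sensitivity, specificity and the predictive values all change with the chosen threshold. AUROC summarizes ranking quality across all thresholds.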
Affiliation(s)
- Maria Vittoria Chiaruttini
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, and Vascular Sciences and Public Health, University of Padova, Via Loredan 18, 35131 Padova, Italy
- Giulia Lorenzoni
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, and Vascular Sciences and Public Health, University of Padova, Via Loredan 18, 35131 Padova, Italy
- Marco Daverio
  - Pediatric Intensive Care Unit, Department of Women’s and Children’s Health, University Hospital of Padova, Via Giustiniani 3, 35128 Padova, Italy
- Luca Marchetto
  - Pediatric Intensive Care Unit, Department of Women’s and Children’s Health, University Hospital of Padova, Via Giustiniani 3, 35128 Padova, Italy
- Francesca Izzo
  - Pediatric Intensive Care Unit, Buzzi Children’s Hospital, Via Lodovico Castelvetro 32, 20154 Milan, Italy
- Giovanna Chidini
  - Department of Anesthesia Resuscitation Emergency Care, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico di Milano, Via Francesco Sforza 35, 20122 Milan, Italy
- Enzo Picconi
  - Pediatric Intensive Care Unit, Pediatric Trauma Center, Fondazione IRCCS Policlinico Universitario “A. Gemelli”, Largo Agostino Gemelli 8, 00136 Rome, Italy
- Claudio Nettuno
  - Anaesthesia and Pediatric Resuscitation, AOU Alessandria, SS Antonio e Biagio e Cesare Arrigo Hospital, Spalto Marengo 43, 15121 Alessandria, Italy
- Elisa Zanonato
  - Pediatric Intensive Care Unit, San Bortolo Hospital, Viale Ferdinando Rodolfi 37, 36100 Vicenza, Italy
- Raffaella Sagredini
  - Anesthesia and Resuscitation Unit, IRCCS Burlo Garofolo, Via dell’Istria 65, 34137 Trieste, Italy
- Emanuele Rossetti
  - Anaesthesia, Emergency and Pediatric Intensive Care Unit, Bambino Gesù Children’s Hospital IRCCS, Piazza di Sant’Onofrio 4, 00165 Rome, Italy
- Corrado Cecchetti
  - Department of Emergency Acceptance, Bambino Gesù Children’s Hospital, Piazza di Sant’Onofrio 4, 00165 Rome, Italy
- Pasquale Vitale
  - Pediatric and Neonatal Intensive Care Unit, Children’s Hospital Regina Margherita, Piazza Polonia 94, 10126 Turin, Italy
- Nicola Alaimo
  - ARNAS G. di Cristina Hospital, 90127 Palermo, Italy
- Denise Colosimo
  - Pediatric Intensive Care Unit, Children’s Hospital Meyer, IRCCS, Viale Gaetano Pieraccini 24, 50139 Florence, Italy
- Francesco Sacco
  - Paediatric Intensive Care Unit, Azienda Ospedaliera Universitaria Integrata di Verona, Piazzale Aristide Stefani 1, 37126 Verona, Italy
- Giulia Genoni
  - Neonatal and Pediatric Intensive Care Unit, Maggiore della Carità University Hospital, L.go Bellini, 28100 Novara, Italy
- Daniela Perrotta
  - A.R.C.O. Palidoro, Bambino Gesù Children’s Hospital, Piazza di Sant’Onofrio 4, 00165 Rome, Italy
- Camilla Micalizzi
  - Pediatric and Neonatal Intensive Care Unit, IRCCS G Gaslini, Via Gerolamo Gaslini 5, 16147 Genoa, Italy
- Silvia Moggia
  - Pediatric Intensive Care Unit, AORN Santobono-Pausilipon, Via della Croce Rossa 8, 80122 Naples, Italy
- Giosuè Chisari
  - UOSD Pediatric Resuscitation, ARNAS Garibaldi PO Nesima, Piazza Santa Maria di Gesù 5, 95124 Catania, Italy
- Immacolata Rulli
  - UOC Neonatal Pathology and TIN, AOU G. Martino, Via Consolare Valeria 1, 98124 Messina, Italy
- Andrea Wolfler
  - Department of Emergency, Division of Anesthesia IRCCS G Gaslini, Via Gerolamo Gaslini 5, 16147 Genoa, Italy
- Angela Amigoni
  - Pediatric Intensive Care Unit, Department of Women’s and Children’s Health, University Hospital of Padova, Via Giustiniani 3, 35128 Padova, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, and Vascular Sciences and Public Health, University of Padova, Via Loredan 18, 35131 Padova, Italy
2
Wolock CJ, Williamson BD, Shortreed SM, Simon GE, Coleman KJ, Yeargans R, Ahmedani BK, Daida Y, Lynch FL, Rossom RC, Ziebell RA, Cruz M, Wellman RD, Coley RY. Importance of variables from different time frames for predicting self-harm using health system data. J Biomed Inform 2024; 160:104750. [PMID: 39557209 PMCID: PMC11891787 DOI: 10.1016/j.jbi.2024.104750] [Received: 07/18/2024] [Revised: 09/20/2024] [Accepted: 11/08/2024] [Indexed: 11/20/2024]
Abstract
OBJECTIVE Self-harm risk prediction models developed using health system data (electronic health records and insurance claims information) often use patient information from up to several years prior to the index visit when the prediction is made. Measurements from some time periods may not be available for all patients. Using the framework of algorithm-agnostic variable importance, we study the predictive potential of variables corresponding to different time horizons prior to the index visit and demonstrate the application of variable importance techniques in the biomedical informatics setting. MATERIALS AND METHODS We use variable importance to quantify the potential of recent (up to three months before the index visit) and distant (more than one year before the index visit) patient mental health information for predicting self-harm risk using data from seven health systems. We quantify importance as the decrease in predictiveness when the variable set of interest is excluded from the prediction task. We define predictiveness using discriminative metrics: area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. RESULTS Mental health predictors corresponding to the three months prior to the index visit show a strong signal of importance; in one setting, excluding these variables decreased AUC from 0.85 to 0.77. Predictors corresponding to more distant information were less important. DISCUSSION Predictors from the months immediately preceding the index visit are highly important. Implementation of self-harm prediction models may be challenging in settings where recent data are not completely available (e.g., due to lags in insurance claims processing) at the time a prediction is made. CONCLUSION Clinically derived variables from different time frames exhibit varying levels of importance for predicting self-harm. Variable importance analyses can inform whether and how to implement risk prediction models into clinical practice given real-world data limitations. These analyses can be applied more broadly in biomedical informatics research to provide insight into general clinical risk prediction tasks.
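The importance measure in Materials and Methods is a difference in predictiveness. It is the metric for a model that uses all variables minus the metric for a model that excludes the variable set of interest. In the reported setting this gives 0.85 - 0.77 = 0.08 in AUC. The sketch below is minimal and assumes you already hold held-out risk scores from both fitted models. The function names and toy data are hypothetical. It does not reproduce the paper's estimation and inference procedure.

```python
def auc(y_true, scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else (0.5 if p == n else 0.0)
    return total / (len(pos) * len(neg))


def auc_importance(y_true, full_scores, reduced_scores):
    """Importance = AUC(all variables) - AUC(variable set of interest excluded)."""
    return auc(y_true, full_scores) - auc(y_true, reduced_scores)


# Hypothetical held-out scores from two separately fitted models
y = [1, 1, 0, 0]
full = [0.9, 0.8, 0.3, 0.2]     # model including the recent predictors
reduced = [0.6, 0.4, 0.5, 0.3]  # model refit without them
print(auc_importance(y, full, reduced))
```

Refit the reduced model without the variable set. Do not simply zero out those inputs in the full model: the quantity of interest is how well the remaining variables can predict on their own.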
Affiliation(s)
- Charles J Wolock
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 423 Guardian Dr., Philadelphia, PA, 19104, USA
- Brian D Williamson
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
  - Department of Biostatistics, University of Washington, 3980 15th Ave. NE, Box 351617, Seattle, WA, 98195, USA
- Susan M Shortreed
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
  - Department of Biostatistics, University of Washington, 3980 15th Ave. NE, Box 351617, Seattle, WA, 98195, USA
- Gregory E Simon
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
  - Department of Health Systems Science, Bernard J. Tyson Kaiser Permanente School of Medicine, 98 S. Los Robles Ave., Pasadena, CA, 91101, USA
- Karen J Coleman
  - Department of Health Systems Science, Bernard J. Tyson Kaiser Permanente School of Medicine, 98 S. Los Robles Ave., Pasadena, CA, 91101, USA
  - Department of Research and Evaluation, Kaiser Permanente Southern California, 100 S. Los Robles Ave., Pasadena, CA, 91101, USA
- Rodney Yeargans
  - Department of Research and Evaluation, Kaiser Permanente Southern California, 100 S. Los Robles Ave., Pasadena, CA, 91101, USA
- Brian K Ahmedani
  - Center for Health Policy and Health Services Research, Henry Ford Health, One Ford Place - 3A, Detroit, MI, 48202, USA
- Yihe Daida
  - Center for Integrated Health Care Research, Kaiser Permanente Hawaii, 501 Alakawa St., Suite 201, Honolulu, HI, 96817, USA
- Frances L Lynch
  - Center for Health Research, Kaiser Permanente Northwest, 3800 N. Interstate Ave., Portland, OR, 97227, USA
- Rebecca C Rossom
  - HealthPartners Institute, 8170 33rd Ave. S., Bloomington, MN, 55425, USA
- Rebecca A Ziebell
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
- Maricela Cruz
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
  - Department of Biostatistics, University of Washington, 3980 15th Ave. NE, Box 351617, Seattle, WA, 98195, USA
- Robert D Wellman
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
- R Yates Coley
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1360, Seattle, WA, 98101, USA
  - Department of Biostatistics, University of Washington, 3980 15th Ave. NE, Box 351617, Seattle, WA, 98195, USA
3
Smith HL, Biggs PJ, French NP, Smith ANH, Marshall JC. Out of (the) bag-encoding categorical predictors impacts out-of-bag samples. PeerJ Comput Sci 2024; 10:e2445. [PMID: 39650463 PMCID: PMC11623134 DOI: 10.7717/peerj-cs.2445] [Received: 06/13/2024] [Accepted: 10/01/2024] [Indexed: 12/11/2024]
Abstract
Performance of random forest classification models is often assessed and interpreted using out-of-bag (OOB) samples. Observations that are OOB when a tree is trained can serve as a test set for that tree, and predictions for the OOB observations are used to calculate the OOB error and variable importance measures (VIMs). OOB errors are popular because they are fast to compute and, for large samples, are a good estimate of the true prediction error. In this study, we investigate how target-based vs. target-agnostic encoding of categorical predictor variables for random forests can bias performance measures based on OOB samples. We show that, when categorical variables are encoded using a target-based encoding method, and when the encoding takes place prior to bagging, the OOB sample can underestimate the true misclassification rate and overestimate variable importance. We recommend using a separate test data set when evaluating variable importance and/or predictive performance of tree-based methods that utilise a target-based encoding method.
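The leakage mechanism can be illustrated with a simple mean (target) encoding. Suppose the map from category to mean label is fitted on all rows before bootstrapping. Then each OOB row's encoded value already reflects its own label. Fitting the map on each tree's in-bag rows avoids this. The sketch below is minimal and rests on assumptions. The plain mean encoding, the fallback to the in-bag global mean for unseen levels, and all function names are hypothetical. They are not the specific encoders studied in the paper.

```python
import random
from collections import defaultdict


def fit_mean_encoding(categories, labels):
    """Map each category level to the mean label observed for it in the fitting rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, labels):
        sums[c] += t
        counts[c] += 1
    global_mean = sum(labels) / len(labels)
    return {c: sums[c] / counts[c] for c in counts}, global_mean


def apply_encoding(categories, mapping, default):
    """Encode levels; levels unseen during fitting fall back to `default`."""
    return [mapping.get(c, default) for c in categories]


def bootstrap_split(n, rng):
    """Draw a bootstrap sample of row indices and return (in-bag, out-of-bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def encode_per_tree(categories, labels, in_bag, oob):
    """Fit the encoding on in-bag rows only, then encode in-bag and OOB rows with it."""
    mapping, default = fit_mean_encoding(
        [categories[i] for i in in_bag], [labels[i] for i in in_bag]
    )
    in_enc = apply_encoding([categories[i] for i in in_bag], mapping, default)
    oob_enc = apply_encoding([categories[i] for i in oob], mapping, default)
    return in_enc, oob_enc


cats = ["a", "a", "b", "b"]
labels = [1, 0, 1, 1]
in_enc, oob_enc = encode_per_tree(cats, labels, in_bag=[0, 0, 2], oob=[1, 3])
leaky_map, _ = fit_mean_encoding(cats, labels)  # fitted before bagging: uses OOB labels
print(oob_enc, leaky_map["a"])  # OOB row 1 gets 1.0 per-tree vs 0.5 when leaky
```

In the leaky version, OOB row 1 (level "a", label 0) is encoded as 0.5. That value was computed partly from its own label, which is the information the OOB estimate is supposed to be blind to.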
Affiliation(s)
- Helen L. Smith
  - School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
- Patrick J. Biggs
  - School of Food Technology and Natural Sciences, Massey University, Palmerston North, New Zealand
  - NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Nigel P. French
  - NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Adam N. H. Smith
  - School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Jonathan C. Marshall
  - School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
4
Wolock CJ, Gilbert PB, Simon N, Carone M. Assessing variable importance in survival analysis using machine learning. Biometrika 2024; 112:asae061. [PMID: 40103753 PMCID: PMC11910984 DOI: 10.1093/biomet/asae061] [Received: 02/01/2024] [Indexed: 03/20/2025]
Abstract
Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioural factors, contribute to overall predictiveness. Time-to-event outcomes such as time to HIV acquisition are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference and enjoys double robustness. We assess the performance of our proposed procedure via numerical simulations and analyse data from the HVTN 702 vaccine trial to inform enrolment strategies for future HIV vaccine trials.
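A concordance index is one predictiveness measure that can be defined for right-censored outcomes. The sketch below computes Harrell's C on the observed data and takes an importance as the drop in C when a feature subset is removed. It is an illustration under assumptions only. Harrell's estimator is a swapped-in, naive choice, not the nonparametric efficient, doubly robust procedure proposed in the paper. It is also known to depend on the censoring distribution. The risk scores are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's C: among usable pairs, the share where the earlier failure has higher risk.

    A pair (i, j) is usable when subject i has an observed event (events[i] == 1)
    and times[i] < times[j]. Tied risk scores count one half.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


times = [2, 4, 6, 8]
events = [1, 0, 1, 0]                # 1 = event observed, 0 = right-censored
risk_full = [0.9, 0.5, 0.7, 0.1]     # hypothetical scores using all features
risk_reduced = [0.9, 0.5, 0.2, 0.4]  # hypothetical scores with a feature subset removed
importance = harrell_c(times, events, risk_full) - harrell_c(times, events, risk_reduced)
print(importance)
```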
Affiliation(s)
- C J Wolock
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, 432 Guardian Drive, Philadelphia, Pennsylvania 19104, USA
- P B Gilbert
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, 1100 Fairview Avenue North, PO Box 19024, Seattle, Washington 98109, USA
- N Simon
  - Department of Biostatistics, University of Washington, 3980 15th Avenue NE, Seattle, Washington 98195, USA
- M Carone
  - Department of Biostatistics, University of Washington, 3980 15th Avenue NE, Seattle, Washington 98195, USA
5
Williamson BD, Huang Y. Flexible variable selection in the presence of missing data. Int J Biostat 2024; 20:347-359. [PMID: 38348882 PMCID: PMC11323294 DOI: 10.1515/ijb-2023-0059] [Received: 06/02/2023] [Accepted: 11/21/2023] [Indexed: 05/22/2024]
Abstract
In many applications, it is of interest to identify a parsimonious set of features, or panel, from multiple candidates that achieves a desired level of performance in predicting a response. This task is often complicated in practice by missing data arising from the sampling design or other random mechanisms. Most recent work on variable selection in missing data contexts relies in some part on a finite-dimensional statistical model, e.g., a generalized or penalized linear model. In cases where this model is misspecified, the selected variables may not all be truly scientifically relevant and can result in panels with suboptimal classification performance. To address this limitation, we propose a nonparametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data. We outline strategies based on the proposed algorithm that achieve control of commonly used error rates. Through simulations, we show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance compared to several existing penalized regression approaches in cases where a generalized linear model is misspecified. Finally, we use the proposed method to develop biomarker panels for separating pancreatic cysts with differing malignancy potential in a setting where complicated missingness in the biomarkers arose due to limited specimen volumes.
Affiliation(s)
- Brian D. Williamson
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, USA
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
  - Department of Biostatistics, University of Washington, Seattle, USA
- Ying Huang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, USA
  - Department of Biostatistics, University of Washington, Seattle, USA
6
Kook L, Lundborg AR. Algorithm-agnostic significance testing in supervised learning with multimodal data. Brief Bioinform 2024; 25:bbae475. [PMID: 39323092 PMCID: PMC11424510 DOI: 10.1093/bib/bbae475] [Received: 05/30/2024] [Revised: 09/05/2024] [Accepted: 09/10/2024] [Indexed: 09/27/2024]
Abstract
MOTIVATION Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. Multimodal data often warrants the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests. RESULTS We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which may invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing. AVAILABILITY AND IMPLEMENTATION COMETs are implemented in the comets R package available on CRAN and the pycomets Python library available on GitHub. Source code for reproducing all results is available at https://github.com/LucasKook/comets. All data sets used in this work are openly available.
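One member of the COMETs family is the generalised covariance measure (GCM) test. The test regresses Y on Z and X on Z with any learner and multiplies the two residual vectors elementwise. It then compares the normalised mean of those products with a standard normal distribution. The sketch below is minimal and rests on assumptions: a single continuous X, Y and Z, with ordinary least squares as the learner. The method itself allows any sufficiently predictive algorithm. This is not the comets or pycomets API, and the function names are hypothetical.

```python
import math
import random


def ols_residuals(target, z):
    """Residuals from a least-squares line of `target` on a single covariate `z`."""
    n = len(z)
    mean_z = sum(z) / n
    mean_t = sum(target) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zt = sum((zi - mean_z) * (ti - mean_t) for zi, ti in zip(z, target))
    slope = s_zt / s_zz
    intercept = mean_t - slope * mean_z
    return [ti - (intercept + slope * zi) for ti, zi in zip(target, z)]


def gcm_test(y, x, z):
    """GCM statistic and two-sided normal p-value for H0: Y independent of X given Z."""
    r = [a * b for a, b in zip(ols_residuals(y, z), ols_residuals(x, z))]
    n = len(r)
    mean_r = sum(r) / n
    var_r = sum(ri * ri for ri in r) / n - mean_r ** 2
    stat = math.sqrt(n) * mean_r / math.sqrt(var_r)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]
y_dep = [2 * xi + rng.gauss(0, 1) for xi in x]  # depends on X beyond Z
y_null = [zi + rng.gauss(0, 1) for zi in z]     # depends on Z only
print(gcm_test(y_dep, x, z))
print(gcm_test(y_null, x, z))
```

With a flexible learner in place of OLS, only `ols_residuals` would change. The residual-product statistic and its normal reference distribution stay the same.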
Affiliation(s)
- Lucas Kook
  - Institute for Statistics and Mathematics, Vienna University of Economics and Business, Welthandelsplatz 1, AT-1020 Vienna, Austria
- Anton Rask Lundborg
  - Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark
7
Wolock CJ, Williamson BD, Shortreed SM, Simon GE, Coleman KJ, Yeargans R, Ahmedani BK, Daida Y, Lynch FL, Rossom RC, Ziebell RA, Cruz M, Wellman RD, Coley RY. Importance of variables from different time frames for predicting self-harm using health system data. medRxiv: The Preprint Server for Health Sciences 2024:2024.04.29.24306260. [PMID: 39371167 PMCID: PMC11451820 DOI: 10.1101/2024.04.29.24306260] [Indexed: 10/08/2024]
Abstract
Objective Self-harm risk prediction models developed using health system data (electronic health records and insurance claims information) often use patient information from up to several years prior to the index visit when the prediction is made. Measurements from some time periods may not be available for all patients. Using the framework of algorithm-agnostic variable importance, we study the predictive potential of variables corresponding to different time horizons prior to the index visit and demonstrate the application of variable importance techniques in the biomedical informatics setting. Materials and Methods We use variable importance to quantify the potential of recent (up to three months before the index visit) and distant (more than one year before the index visit) patient mental health information for predicting self-harm risk using data from seven health systems. We quantify importance as the decrease in predictiveness when the variable set of interest is excluded from the prediction task. We define predictiveness using discriminative metrics: area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value. Results Mental health predictors corresponding to the three months prior to the index visit show a strong signal of importance; in one setting, excluding these variables decreased AUC from 0.85 to 0.77. Predictors corresponding to more distant information were less important. Discussion Predictors from the months immediately preceding the index visit are highly important. Implementation of self-harm prediction models may be challenging in settings where recent data are not completely available (e.g., due to lags in insurance claims processing) at the time a prediction is made. Conclusion Clinically derived variables from different time frames exhibit varying levels of importance for predicting self-harm. Variable importance analyses can inform whether and how to implement risk prediction models into clinical practice given real-world data limitations. These analyses can be applied more broadly in biomedical informatics research to provide insight into general clinical risk prediction tasks.
Affiliation(s)
- Charles J. Wolock
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
- Brian D. Williamson
  - Kaiser Permanente Washington Health Research Institute
  - Department of Biostatistics, University of Washington
- Susan M. Shortreed
  - Kaiser Permanente Washington Health Research Institute
  - Department of Biostatistics, University of Washington
- Gregory E. Simon
  - Kaiser Permanente Washington Health Research Institute
  - Department of Health Systems Science, Bernard J. Tyson Kaiser Permanente School of Medicine
- Karen J. Coleman
  - Department of Health Systems Science, Bernard J. Tyson Kaiser Permanente School of Medicine
  - Department of Research and Evaluation, Kaiser Permanente Southern California
- Rodney Yeargans
  - Department of Research and Evaluation, Kaiser Permanente Southern California
- Brian K. Ahmedani
  - Center for Health Policy and Health Services Research, Henry Ford Health
- Yihe Daida
  - Center for Integrated Health Care Research, Kaiser Permanente Hawaii
- Maricela Cruz
  - Kaiser Permanente Washington Health Research Institute
  - Department of Biostatistics, University of Washington
- R. Yates Coley
  - Kaiser Permanente Washington Health Research Institute
  - Department of Biostatistics, University of Washington
8
Fisher LH, Kee JJ, Liu A, Espinosa CM, Randhawa AK, Ludwig J, Magaret CA, Robinson ST, Gilbert PB, Hyrien O, Kublin JG, Rouphael N, Falsey AR, Sobieszczyk ME, El Sahly HM, Grinsztejn B, Gray GE, Kotloff KL, Gay CL, Leav B, Hirsch I, Struyf F, Dunkle LM, Neuzil KM, Corey L, Huang Y, Goepfert PA, Walsh SR, Baden LR, Janes H. SARS-CoV-2 Viral Load in the Nasopharynx at Time of First Infection Among Unvaccinated Individuals: A Secondary Cross-Protocol Analysis of 4 Randomized Trials. JAMA Netw Open 2024; 7:e2412835. [PMID: 38780941 PMCID: PMC11117088 DOI: 10.1001/jamanetworkopen.2024.12835] [Received: 11/17/2023] [Accepted: 03/20/2024] [Indexed: 05/25/2024]
Abstract
Importance SARS-CoV-2 viral load (VL) in the nasopharynx is difficult to quantify and standardize across settings, but it may inform transmission potential and disease severity. Objective To characterize VL at COVID-19 diagnosis among previously uninfected and unvaccinated individuals by evaluating the association of demographic and clinical characteristics, viral variant, and trial with VL, as well as the ability of VL to predict severe disease. Design, Setting, and Participants This secondary cross-protocol analysis used individual-level data from placebo recipients from 4 harmonized, phase 3 COVID-19 vaccine efficacy trials sponsored by Moderna, AstraZeneca, Janssen, and Novavax. Participants were SARS-CoV-2 negative at baseline and acquired COVID-19 during the blinded phase of the trials. The setting included the US, Brazil, South Africa, Colombia, Argentina, Peru, Chile, and Mexico; start dates were July 27, 2020, to December 27, 2020; data cutoff dates were March 26, 2021, to July 30, 2021. Statistical analysis was performed from November 2022 to June 2023. Main Outcomes and Measures Linear regression was used to assess the association of demographic and clinical characteristics, viral variant, and trial with polymerase chain reaction-measured log10 VL in nasal and/or nasopharyngeal swabs taken at the time of COVID-19 diagnosis. Results Among 1667 participants studied (886 [53.1%] male; 995 [59.7%] enrolled in the US; mean [SD] age, 46.7 [14.7] years; 204 [12.2%] aged 65 years or older; 196 [11.8%] American Indian or Alaska Native, 150 [9%] Black or African American, 1112 [66.7%] White; 762 [45.7%] Hispanic or Latino), median (IQR) log10 VL at diagnosis was 6.18 (4.66-7.12) log10 copies/mL. Participant characteristics and viral variant explained only 5.9% of the variability in VL. 
The independent factor with the highest observed differences was trial: Janssen participants had 0.54 log10 copies/mL lower mean VL vs Moderna participants (95% CI, 0.20 to 0.87 log10 copies/mL lower). In the Janssen study, which captured the largest number of COVID-19 events and variants and used the most intensive post-COVID surveillance, neither VL at diagnosis nor VL averaged over days 1 to 28 post diagnosis was associated with COVID-19 severity. Conclusions and Relevance In this study of placebo recipients from 4 randomized phase 3 trials, high variability was observed in SARS-CoV-2 VL at the time of COVID-19 diagnosis, and only a fraction was explained by individual participant characteristics or viral variant. These results suggest challenges for future studies of interventions seeking to influence VL and elevate the importance of standardized methods for specimen collection and viral load quantitation.
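Two quantities in these Results can be made concrete. "Explained only 5.9% of the variability" can be read as the coefficient of determination (R²) of the linear regression. A difference of 0.54 on the log10 scale corresponds to a multiplicative factor of 10^0.54, roughly 3.5-fold. The sketch below is minimal and uses a single hypothetical predictor with toy data. The study's actual model included several covariates.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, fitted):
    """Share of the variance of y explained by the fitted values: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def log10_diff_to_fold(delta):
    """A difference of `delta` log10 copies/mL is a 10**delta-fold difference in copies/mL."""
    return 10.0 ** delta


x = [1, 2, 3, 4]       # hypothetical predictor
log_vl = [1, 3, 2, 4]  # hypothetical log10 viral loads
a, b = fit_line(x, log_vl)
fitted = [a + b * xi for xi in x]
print(round(r_squared(log_vl, fitted), 3))  # 0.64
print(round(log10_diff_to_fold(0.54), 2))   # 3.47
```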
Affiliation(s)
- Leigh H. Fisher
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Jia Jin Kee
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Albert Liu
  - Bridge HIV, San Francisco Department of Public Health, San Francisco, California
- April K. Randhawa
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- James Ludwig
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Craig A. Magaret
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Samuel T. Robinson
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Peter B. Gilbert
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Ollivier Hyrien
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- James G. Kublin
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Ann R. Falsey
  - Infectious Disease Division, University of Rochester, Rochester, New York
- Hana M. El Sahly
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas
- Beatriz Grinsztejn
  - Evandro Chagas National Institute of Infectious Diseases-Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Glenda E. Gray
  - South African Medical Research Council, Cape Town, South Africa
- Karen L. Kotloff
  - Center for Vaccine Development and Global Health, Department of Pediatrics, University of Maryland School of Medicine, Baltimore
- Cynthia L. Gay
  - University of North Carolina School of Medicine, Chapel Hill
- Ian Hirsch
  - Vaccines & Immune Therapies, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Frank Struyf
  - Janssen Research and Development, Beerse, Belgium
- Kathleen M. Neuzil
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, Maryland
- Lawrence Corey
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Yunda Huang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Paul A. Goepfert
  - University of Alabama at Birmingham Heersink School of Medicine, Birmingham
- Holly Janes
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, Washington
9
Magaret CA, Li L, deCamp AC, Rolland M, Juraska M, Williamson BD, Ludwig J, Molitor C, Benkeser D, Luedtke A, Simpkins B, Heng F, Sun Y, Carpp LN, Bai H, Dearlove BL, Giorgi EE, Jongeneelen M, Brandenburg B, McCallum M, Bowen JE, Veesler D, Sadoff J, Gray GE, Roels S, Vandebosch A, Stieh DJ, Le Gars M, Vingerhoets J, Grinsztejn B, Goepfert PA, de Sousa LP, Silva MST, Casapia M, Losso MH, Little SJ, Gaur A, Bekker LG, Garrett N, Truyers C, Van Dromme I, Swann E, Marovich MA, Follmann D, Neuzil KM, Corey L, Greninger AL, Roychoudhury P, Hyrien O, Gilbert PB. Quantifying how single dose Ad26.COV2.S vaccine efficacy depends on Spike sequence features. Nat Commun 2024; 15:2175. [PMID: 38467646 PMCID: PMC10928100 DOI: 10.1038/s41467-024-46536-w] [Received: 05/16/2023] [Accepted: 02/29/2024] [Indexed: 03/13/2024]
Abstract
In the ENSEMBLE randomized, placebo-controlled phase 3 trial (NCT04505722), estimated single-dose Ad26.COV2.S vaccine efficacy (VE) was 56% against moderate to severe-critical COVID-19. SARS-CoV-2 Spike sequences were determined from 484 vaccine and 1,067 placebo recipients who acquired COVID-19. In this set of prespecified analyses, we show that in Latin America, VE was significantly lower against Lambda vs. Reference and against Lambda vs. non-Lambda [family-wise error rate (FWER) p < 0.05]. VE differed by residue match vs. mismatch to the vaccine-insert at 16 amino acid positions (4 FWER p < 0.05; 12 q-value ≤ 0.20); significantly decreased with physicochemical-weighted Hamming distance to the vaccine-strain sequence for Spike, receptor-binding domain, N-terminal domain, and S1 (FWER p < 0.001); differed (FWER ≤ 0.05) by distance to the vaccine strain measured by 9 antibody-epitope escape scores and 4 NTD neutralization-impacting features; and decreased (p = 0.011) with neutralization resistance level to vaccinee sera. VE against severe-critical COVID-19 was stable across most sequence features but lower against the most distant viruses.
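The sieve comparisons above ask whether VE differs by a feature (mark) of the acquired virus. As a minimal sketch of the idea only, assuming equal follow-up in both arms, mark-specific VE can be written as one minus the ratio of attack rates among cases carrying a given mark (here, residue match vs. mismatch to the vaccine insert at one position). The counts are hypothetical, and the sketch does not reproduce the trial's prespecified estimators or its FWER/q-value multiplicity control.

```python
def vaccine_efficacy(cases_vax: int, n_vax: int, cases_plac: int, n_plac: int) -> float:
    """VE = 1 - (attack rate in vaccine arm) / (attack rate in placebo arm)."""
    if cases_plac == 0:
        raise ValueError("placebo arm has no cases of this type; VE undefined")
    return 1.0 - (cases_vax / n_vax) / (cases_plac / n_plac)

# Hypothetical counts: 20,000 participants per arm, with COVID-19 cases
# split by whether the acquired virus matches the vaccine-insert residue
# at a single Spike position (these are not ENSEMBLE data).
N_VAX = N_PLAC = 20_000
cases = {
    "match": {"vaccine": 30, "placebo": 100},
    "mismatch": {"vaccine": 60, "placebo": 100},
}

for mark, c in cases.items():
    ve = vaccine_efficacy(c["vaccine"], N_VAX, c["placebo"], N_PLAC)
    print(f"VE against {mark} viruses: {ve:.0%}")
```

A lower VE against mismatched viruses than against matched ones is the kind of difference the sieve tests above evaluate formally.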
Affiliation(s)
- Craig A Magaret
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Li Li
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Allan C deCamp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Morgane Rolland
- US Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD, USA
- Michal Juraska
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Brian D Williamson
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- James Ludwig
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Cindy Molitor
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- David Benkeser
- Departments of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Alex Luedtke
- Department of Statistics, University of Washington, Seattle, WA, USA
- Brian Simpkins
- Department of Computer Science, Pitzer College, Claremont, CA, USA
- Fei Heng
- University of North Florida, Jacksonville, FL, USA
- Yanqing Sun
- University of North Carolina at Charlotte, Charlotte, NC, USA
- Lindsay N Carpp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Hongjun Bai
- US Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD, USA
- Bethany L Dearlove
- US Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc, Bethesda, MD, USA
- Elena E Giorgi
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Mandy Jongeneelen
- Johnson & Johnson Innovative Medicine, Janssen Vaccines & Prevention B.V, Leiden, The Netherlands
- Boerries Brandenburg
- Johnson & Johnson Innovative Medicine, Janssen Vaccines & Prevention B.V, Leiden, The Netherlands
- Matthew McCallum
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- John E Bowen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- David Veesler
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Jerald Sadoff
- Johnson & Johnson Innovative Medicine, Janssen Vaccines & Prevention B.V, Leiden, The Netherlands
- Glenda E Gray
- Perinatal HIV Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- South African Medical Research Council, Cape Town, South Africa
- Sanne Roels
- Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium
- An Vandebosch
- Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium
- Daniel J Stieh
- Johnson & Johnson Innovative Medicine, Janssen Vaccines & Prevention B.V, Leiden, The Netherlands
- Mathieu Le Gars
- Johnson & Johnson Innovative Medicine, Janssen Vaccines & Prevention B.V, Leiden, The Netherlands
- Johan Vingerhoets
- Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium
- Beatriz Grinsztejn
- Evandro Chagas National Institute of Infectious Diseases-Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil
- Paul A Goepfert
- Division of Infectious Diseases, Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Leonardo Paiva de Sousa
- Evandro Chagas National Institute of Infectious Diseases-Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil
- Mayara Secco Torres Silva
- Evandro Chagas National Institute of Infectious Diseases-Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil
- Martin Casapia
- Facultad de Medicina Humana, Universidad Nacional de la Amazonia Peru, Iquitos, Peru
- Marcelo H Losso
- Hospital General de Agudos José María Ramos Mejia, Buenos Aires, Argentina
- Susan J Little
- Division of Infectious Diseases, University of California San Diego, La Jolla, CA, USA
- Aditya Gaur
- Department of Infectious Diseases, St. Jude Children's Research Hospital, Memphis, TN, USA
- Linda-Gail Bekker
- The Desmond Tutu HIV Centre, University of Cape Town, Observatory, Cape Town, South Africa
- Nigel Garrett
- Centre for the AIDS Programme of Research in South Africa, University of KwaZulu-Natal, Durban, South Africa
- Discipline of Public Health Medicine, School of Nursing and Public Health, University of KwaZulu-Natal, Durban, South Africa
- Carla Truyers
- Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium
- Ilse Van Dromme
- Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium
- Edith Swann
- Vaccine Research Program, Division of AIDS, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Mary A Marovich
- Vaccine Research Program, Division of AIDS, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Dean Follmann
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Kathleen M Neuzil
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Lawrence Corey
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Alexander L Greninger
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Pavitra Roychoudhury
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Ollivier Hyrien
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Peter B Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington School of Public Health, Seattle, WA, USA
10
Juraska M, Bai H, deCamp AC, Magaret CA, Li L, Gillespie K, Carpp LN, Giorgi EE, Ludwig J, Molitor C, Hudson A, Williamson BD, Espy N, Simpkins B, Rudnicki E, Shao D, Rossenkhan R, Edlefsen PT, Westfall DH, Deng W, Chen L, Zhao H, Bhattacharya T, Pankow A, Murrell B, Yssel A, Matten D, York T, Beaume N, Gwashu-Nyangiwe A, Ndabambi N, Thebus R, Karuna ST, Morris L, Montefiori DC, Hural JA, Cohen MS, Corey L, Rolland M, Gilbert PB, Williamson C, Mullins JI. Prevention efficacy of the broadly neutralizing antibody VRC01 depends on HIV-1 envelope sequence features. Proc Natl Acad Sci U S A 2024; 121:e2308942121. [PMID: 38241441 PMCID: PMC10823214 DOI: 10.1073/pnas.2308942121] [Received: 06/09/2023] [Accepted: 11/13/2023] [Indexed: 01/21/2024]
Abstract
In the Antibody Mediated Prevention (AMP) trials (HVTN 704/HPTN 085 and HVTN 703/HPTN 081), prevention efficacy (PE) of the monoclonal broadly neutralizing antibody (bnAb) VRC01 (vs. placebo) against HIV-1 acquisition diagnosis varied according to the HIV-1 Envelope (Env) neutralization sensitivity to VRC01, as measured by 80% inhibitory concentration (IC80). Here, we performed a genotypic sieve analysis, a complementary approach to gaining insight into correlates of protection that assesses how PE varies with HIV-1 sequence features. We analyzed HIV-1 Env amino acid (AA) sequences from the earliest available HIV-1 RNA-positive plasma samples from AMP participants diagnosed with HIV-1 and identified Env sequence features that associated with PE. The strongest Env AA sequence correlate in both trials was VRC01 epitope distance that quantifies the divergence of the VRC01 epitope in an acquired HIV-1 isolate from the VRC01 epitope of reference HIV-1 strains that were most sensitive to VRC01-mediated neutralization. In HVTN 704/HPTN 085, the Env sequence-based predicted probability that VRC01 IC80 against the acquired isolate exceeded 1 µg/mL also significantly associated with PE. In HVTN 703/HPTN 081, a physicochemical-weighted Hamming distance across 50 VRC01 binding-associated Env AA positions of the acquired isolate from the most VRC01-sensitive HIV-1 strain significantly associated with PE. These results suggest that incorporating mutation scoring by BLOSUM62 and weighting by the strength of interactions at AA positions in the epitope:VRC01 interface can optimize performance of an Env sequence-based biomarker of VRC01 prevention efficacy. Future work could determine whether these results extend to other bnAbs and bnAb combinations.
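The physicochemical-weighted Hamming distance and the BLOSUM62-scored, interaction-weighted epitope distance described above share a general form: a sum, over selected aligned positions, of a per-position weight times a substitution dissimilarity. The sketch below shows only that form. The helper `simple_mismatch`, the sequences, the positions, and the weights are all hypothetical; the actual BLOSUM62-derived scores, physicochemical weights, and the 50 VRC01 binding-associated positions are not reproduced.

```python
from typing import Callable, Mapping, Sequence

def weighted_epitope_distance(
    query: str,
    reference: str,
    positions: Sequence[int],
    weights: Mapping[int, float],
    dissimilarity: Callable[[str, str], float],
) -> float:
    """Sum over epitope positions of weight * substitution dissimilarity.

    `query` and `reference` are aligned amino-acid strings of equal length;
    `positions` are 0-based alignment indices; unlisted weights default to 1.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        weights.get(p, 1.0) * dissimilarity(query[p], reference[p])
        for p in positions
    )

def simple_mismatch(a: str, b: str) -> float:
    """Hypothetical stand-in for a substitution-matrix-based score:
    0 for identical residues, 1 otherwise (i.e., plain Hamming)."""
    return 0.0 if a == b else 1.0

reference = "NWTKRGDQ"               # hypothetical reference epitope region
query = "NWSKRGEQ"                   # hypothetical acquired-isolate sequence
epitope_positions = [0, 2, 4, 6]
position_weights = {2: 2.0, 6: 0.5}  # hypothetical interaction-strength weights

print(weighted_epitope_distance(query, reference, epitope_positions,
                                position_weights, simple_mismatch))
```

Swapping `simple_mismatch` for a function derived from a substitution matrix changes how strongly each amino acid change counts, while the weights control how much each position contributes.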
Affiliation(s)
- Michal Juraska
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Hongjun Bai
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD 20910
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD 20817
- Allan C. deCamp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Craig A. Magaret
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Li Li
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Kevin Gillespie
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Lindsay N. Carpp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Elena E. Giorgi
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- James Ludwig
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Cindy Molitor
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Aaron Hudson
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Brian D. Williamson
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101
- Nicole Espy
- Science and Technology Policy Fellowships, American Association for the Advancement of Science, Washington, DC 20005
- Brian Simpkins
- Department of Computer Science, Pitzer College, Claremont, CA 91711
- Erika Rudnicki
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Danica Shao
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Raabya Rossenkhan
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Paul T. Edlefsen
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Dylan H. Westfall
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195
- Wenjie Deng
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195
- Lennie Chen
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195
- Hong Zhao
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195
- Alec Pankow
- Department of Microbiology, Tumor, and Cell Biology, Karolinska Institutet, Solna 171 77, Sweden
- Ben Murrell
- Department of Microbiology, Tumor, and Cell Biology, Karolinska Institutet, Solna 171 77, Sweden
- Anna Yssel
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- David Matten
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Talita York
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Nicolas Beaume
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Asanda Gwashu-Nyangiwe
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Nonkululeko Ndabambi
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Ruwayhida Thebus
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- Shelly T. Karuna
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Lynn Morris
- HIV Virology Section, National Institute for Communicable Diseases, National Health Laboratory Service, Johannesburg 2192, South Africa
- Antibody Immunity Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2000, South Africa
- Centre for the AIDS Programme of Research in South Africa, University of KwaZulu-Natal, Durban 4041, South Africa
- John A. Hural
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Myron S. Cohen
- Institute of Global Health and Infectious Diseases, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599
- Lawrence Corey
- Department of Medicine, University of Washington, Seattle, WA 98195
- Department of Laboratory Medicine, University of Washington, Seattle, WA 98195
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
- Morgane Rolland
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD 20910
- Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, MD 20817
- Peter B. Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109
- Department of Biostatistics, University of Washington, Seattle, WA 98195
- Department of Global Health, University of Washington, Seattle, WA 98195
- Carolyn Williamson
- Institute of Infectious Disease and Molecular Medicine, and Wellcome Centre for Infectious Diseases Research in Africa, Department of Pathology, Faculty of Health Sciences, University of Cape Town and National Health Laboratory Service, Cape Town 7701, South Africa
- James I. Mullins
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195
- Department of Global Health, University of Washington, Seattle, WA 98195
- Department of Microbiology, University of Washington, Seattle, WA 98109
11
Williamson BD, Wu L, Huang Y, Hudson A, Gilbert PB. Predicting neutralization susceptibility to combination HIV-1 monoclonal broadly neutralizing antibody regimens. bioRxiv 2023:2023.12.14.571616. [PMID: 38168308 PMCID: PMC10760080 DOI: 10.1101/2023.12.14.571616] [Indexed: 01/05/2024]
Abstract
Combination monoclonal broadly neutralizing antibodies (bnAbs) are currently being developed for preventing HIV-1 infection. Recent work has focused on predicting in vitro neutralization potency of both individual bnAbs and combination regimens against HIV-1 pseudoviruses using Env sequence features. To predict in vitro combination regimen neutralization potency against a given HIV-1 pseudovirus, previous approaches have applied mathematical models to combine individual-bnAb neutralization and have predicted this combined neutralization value; we call this the combine-then-predict (CP) approach. However, prediction performance for some individual bnAbs has exceeded that for the combination, leading to another possibility: combining the individual-bnAb predicted values and using these to predict combination regimen neutralization; we call this the predict-then-combine (PC) approach. We explore both approaches in both simulated data and data from the Los Alamos National Laboratory's Compile, Neutralize, and Tally NAb Panels repository. The CP approach is superior to the PC approach when the neutralization outcome of interest is binary (e.g., neutralization susceptibility, defined as inhibitory concentration < 1 μg/mL). For continuous outcomes, the CP approach performs at least as well as the PC approach, and is superior to the PC approach when the individual-bnAb prediction algorithms have poor performance. This knowledge may be used when building prediction models for novel antibody combinations in the absence of in vitro neutralization data for the antibody combination; this, in turn, will aid in the evaluation and down-selection of these antibody combinations into prevention efficacy trials.
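The CP and PC approaches differ only in the order of combining and predicting. The sketch below assumes one simple additive-potency model for combining individual-bnAb inhibitory concentrations (1/IC_combo = sum of 1/IC_i, with concentration expressed per antibody); the preprint's combination models may differ, the IC values are hypothetical, and the Env-feature prediction models themselves are omitted. The binary outcome uses the < 1 μg/mL susceptibility threshold from the abstract.

```python
from typing import Sequence

def additive_combination_ic(ics: Sequence[float]) -> float:
    """Combination IC under an assumed additive-potency model:
    1 / IC_combo = sum_i 1 / IC_i (concentration expressed per antibody)."""
    if not ics or any(ic <= 0 for ic in ics):
        raise ValueError("need one or more positive IC values")
    return 1.0 / sum(1.0 / ic for ic in ics)

def susceptible(ic: float, threshold: float = 1.0) -> bool:
    """Binary outcome from the abstract: IC < 1 ug/mL."""
    return ic < threshold

# Combine-then-predict (CP): the training label for a virus is the combined
# value computed from *measured* individual-bnAb ICs; one model would then be
# fit to predict this label from Env sequence features (model omitted).
measured_ics = [0.8, 4.0]    # hypothetical measured ICs (ug/mL) for bnAbs A, B
cp_label = additive_combination_ic(measured_ics)

# Predict-then-combine (PC): separate per-bnAb models predict each IC from
# Env features (models omitted); their *predicted* values are then combined.
predicted_ics = [2.0, 4.0]   # hypothetical per-bnAb predictions (ug/mL)
pc_prediction = additive_combination_ic(predicted_ics)

print(f"CP label:      IC = {cp_label:.2f}, susceptible = {susceptible(cp_label)}")
print(f"PC prediction: IC = {pc_prediction:.2f}, susceptible = {susceptible(pc_prediction)}")
```

In the PC ordering, errors in each per-bnAb prediction propagate through the combination step, which is consistent with the abstract's finding that PC fares worse when the individual-bnAb prediction algorithms perform poorly.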
Affiliation(s)
- Brian D. Williamson
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Liana Wu
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Yunda Huang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, USA
- Aaron Hudson
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Peter B. Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
12
Li S, Luedtke A. Efficient Estimation under Data Fusion. Biometrika 2023; 110:1041-1054. [PMID: 37982010 PMCID: PMC10653189 DOI: 10.1093/biomet/asad007] [Indexed: 11/21/2023]
Abstract
We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources together. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including in the estimation of the average treatment effect and average reward under a policy, with the majority of them merging one historical data source with covariates, actions, and rewards and one data source of the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical simulations, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.
Affiliation(s)
- Sijia Li
- Department of Biostatistics, University of Washington, Seattle, Washington 98195
- Alex Luedtke
- Department of Statistics, University of Washington, Box 354322, Seattle, Washington 98195
13
Williamson BD, Magaret CA, Karuna S, Carpp LN, Gelderblom HC, Huang Y, Benkeser D, Gilbert PB. Application of the SLAPNAP statistical learning tool to broadly neutralizing antibody HIV prevention research. iScience 2023; 26:107595. [PMID: 37654470 PMCID: PMC10466901 DOI: 10.1016/j.isci.2023.107595] [Received: 03/01/2023] [Revised: 07/05/2023] [Accepted: 08/07/2023] [Indexed: 09/02/2023]
Abstract
Combination monoclonal broadly neutralizing antibody (bnAb) regimens are in clinical development for HIV prevention, necessitating additional knowledge of bnAb neutralization potency/breadth against circulating viruses. Williamson et al. (2021) described a software tool, Super LeArner Prediction of NAb Panels (SLAPNAP), with application to any HIV bnAb regimen with sufficient neutralization data against a set of viruses in the Los Alamos National Laboratory's Compile, Neutralize, and Tally Nab Panels repository. SLAPNAP produces a proteomic antibody resistance (PAR) score for Env sequences based on predicted neutralization resistance and estimates variable importance of Env amino acid features. We apply SLAPNAP to compare HIV bnAb regimens undergoing clinical testing, finding improved power for downstream sieve analyses and increased precision for comparing neutralization potency/breadth of bnAb regimens due to the inclusion of PAR scores of Env sequences with much larger sample sizes available than for neutralization outcomes. SLAPNAP substantially improves bnAb regimen characterization, ranking, and down-selection.
Affiliation(s)
- Brian D. Williamson
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, USA
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Craig A. Magaret
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Shelly Karuna
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- GreenLight Biosciences, Medford, MA 02155, USA
- Lindsay N. Carpp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Huub C. Gelderblom
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Yunda Huang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Department of Global Health, University of Washington, Seattle, WA 98105, USA
- David Benkeser
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Peter B. Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
14
Wallace ML, Mentch L, Wheeler BJ, Tapia AL, Richards M, Zhou S, Yi L, Redline S, Buysse DJ. Use and misuse of random forest variable importance metrics in medicine: demonstrations through incident stroke prediction. BMC Med Res Methodol 2023; 23:144. [PMID: 37337173 PMCID: PMC10280951 DOI: 10.1186/s12874-023-01965-x] [Received: 03/10/2023] [Accepted: 06/06/2023] [Indexed: 06/21/2023]
Abstract
BACKGROUND Machine learning tools such as random forests provide important opportunities for modeling large, complex modern data generated in medicine. Unfortunately, when it comes to understanding why machine learning models are predictive, applied research continues to rely on 'out of bag' (OOB) variable importance metrics (VIMPs) that are known to have considerable shortcomings within the statistics community. After explaining the limitations of OOB VIMPs - including bias towards correlated features and limited interpretability - we describe a modern approach called 'knockoff VIMPs' and explain its advantages. METHODS We first evaluate current VIMP practices through an in-depth literature review of 50 recent random forest manuscripts. Next, we recommend organized and interpretable strategies for analysis with knockoff VIMPs, including computing them for groups of features and considering multiple model performance metrics. To demonstrate methods, we develop a random forest to predict 5-year incident stroke in the Sleep Heart Health Study and compare results based on OOB and knockoff VIMPs. RESULTS Nearly all papers in the literature review contained substantial limitations in their use of VIMPs. In our demonstration, using OOB VIMPs for individual variables suggested two highly correlated lung function variables (forced expiratory volume, forced vital capacity) as the best predictors of incident stroke, followed by age and height. Using an organized analytic approach that considered knockoff VIMPs of both groups of features and individual features, the largest contributions to model sensitivity were medications (especially cardiovascular) and measured medical risk factors, while the largest contributions to model specificity were age, diastolic blood pressure, self-reported medical risk factors, polysomnography features, and pack-years of smoking. Thus, we reach very different conclusions about stroke risk factors using OOB VIMPs versus knockoff VIMPs. 
CONCLUSIONS The near-ubiquitous reliance on OOB VIMPs may provide misleading results for researchers who use such methods to guide their research. Given the rapid pace of scientific inquiry using machine learning, it is essential to bring modern knockoff VIMPs that are interpretable and unbiased into widespread applied practice to steer researchers using random forest machine learning toward more meaningful results.
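To make concrete what the critique targets, the sketch below computes a basic permutation variable importance: the increase in mean squared error after randomly shuffling one feature column. It assumes a fixed, hypothetical model and evaluation set rather than a random forest's out-of-bag samples, and it does not implement knockoff VIMPs, which require generating synthetic knockoff copies of the features. Shuffling one column independently also breaks its correlation with the other features, a commonly cited reason permutation importance can mislead when predictors are correlated.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]

def mse(model: Model, X: List[List[float]], y: Sequence[float]) -> float:
    """Mean squared error of `model` over the rows of X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def permutation_importance(model: Model, X: List[List[float]], y: Sequence[float],
                           feature: int, seed: int = 0) -> float:
    """Increase in MSE after randomly permuting column `feature` of X."""
    rng = random.Random(seed)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    X_perm = [list(row) for row in X]  # copy so the original X is unchanged
    for row, value in zip(X_perm, column):
        row[feature] = value
    return mse(model, X_perm, y) - mse(model, X, y)

# Hypothetical data: y depends only on feature 0; feature 1 is unused.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * i for i in range(20)]
model: Model = lambda row: 2.0 * row[0]

for j in range(2):
    print(f"feature {j}: importance = {permutation_importance(model, X, y, j):.2f}")
```

An unused feature gets zero importance here, but with correlated predictors the same calculation evaluates the model on feature combinations that rarely occur in the data, which is where its rankings become hard to interpret.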
Affiliation(s)
- Meredith L Wallace
- Department of Psychiatry, University of Pittsburgh, 3811 O'Hara Street, Pittsburgh, PA, 15231, USA
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Lucas Mentch
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Bradley J Wheeler
- School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
- Amanda L Tapia
- Department of Psychiatry, University of Pittsburgh, 3811 O'Hara Street, Pittsburgh, PA, 15231, USA
- Marc Richards
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Siyu Zhou
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Lixia Yi
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Susan Redline
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Daniel J Buysse
- Department of Psychiatry, University of Pittsburgh, 3811 O'Hara Street, Pittsburgh, PA, 15231, USA
15
Fong Y, Huang Y, Benkeser D, Carpp LN, Áñez G, Woo W, McGarry A, Dunkle LM, Cho I, Houchens CR, Martins K, Jayashankar L, Castellino F, Petropoulos CJ, Leith A, Haugaard D, Webb B, Lu Y, Yu C, Borate B, van der Laan LWP, Hejazi NS, Randhawa AK, Andrasik MP, Kublin JG, Hutter J, Keshtkar-Jahromi M, Beresnev TH, Corey L, Neuzil KM, Follmann D, Ake JA, Gay CL, Kotloff KL, Koup RA, Donis RO, Gilbert PB. Immune correlates analysis of the PREVENT-19 COVID-19 vaccine efficacy clinical trial. Nat Commun 2023; 14:331. [PMID: 36658109 PMCID: PMC9851580 DOI: 10.1038/s41467-022-35768-3] [Received: 08/11/2022] [Accepted: 12/28/2022] [Indexed: 01/21/2023]
Abstract
In the PREVENT-19 phase 3 trial of the NVX-CoV2373 vaccine (NCT04611802), anti-spike binding IgG concentration (spike IgG), anti-RBD binding IgG concentration (RBD IgG), and pseudovirus 50% neutralizing antibody titer (nAb ID50) measured two weeks post-dose two are assessed as correlates of risk and as correlates of protection against COVID-19. Analyses are conducted in the U.S. cohort of baseline SARS-CoV-2 negative per-protocol participants using a case-cohort design that measures the markers from all 12 vaccine recipient breakthrough COVID-19 cases starting 7 days post antibody measurement and from 639 vaccine recipient non-cases. All markers are inversely associated with COVID-19 risk and directly associated with vaccine efficacy. In vaccine recipients with nAb ID50 titers of 50, 100, and 7230 international units (IU50)/ml, vaccine efficacy estimates are 75.7% (49.8%, 93.2%), 81.7% (66.3%, 93.2%), and 96.8% (88.3%, 99.3%). The results support potential cross-vaccine platform applications of these markers for guiding decisions about vaccine approval and use.
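The titer-specific estimates above can be read against the usual way marker-conditional vaccine efficacy is written: one minus a risk ratio. The expression below uses our own notation. $Y$ is the COVID-19 endpoint, $S$ is the antibody marker, $Z = 1$ denotes the vaccine arm and $Z = 0$ the placebo arm. This is a generic sketch of the definition only. The trial's actual estimation (case-cohort sampling, covariate adjustment) is not reproduced here.

$$
\mathrm{VE}(s) = 1 - \frac{\Pr(Y = 1 \mid S = s,\, Z = 1)}{\Pr(Y = 1 \mid Z = 0)}
$$

Under this reading, the 96.8% estimate at 7230 IU50/ml implies a vaccinee risk at that titer of about 1 - 0.968 = 0.032 times the placebo-arm risk, or roughly 3.2% of it.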
Affiliation(s)
- Youyi Fong
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Yunda Huang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Global Health, University of Washington, Seattle, WA, 98195, USA
- David Benkeser
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Lindsay N Carpp
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Wayne Woo
- Novavax, Inc., Gaithersburg, MD, USA
- Karen Martins
- Biomedical Advanced Research and Development Authority, Washington, DC, USA
- Flora Castellino
- Biomedical Advanced Research and Development Authority, Washington, DC, USA
- Yiwen Lu
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Chenchen Yu
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Bhavesh Borate
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Lars W P van der Laan
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Statistics, University of Washington, Seattle, WA, USA
- Nima S Hejazi
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- April K Randhawa
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Michele P Andrasik
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- James G Kublin
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Julia Hutter
- Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases, NIH, Rockville, MD, USA
- Maryam Keshtkar-Jahromi
- Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases, NIH, Rockville, MD, USA
- Tatiana H Beresnev
- Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases, NIH, Rockville, MD, USA
- Lawrence Corey
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Kathleen M Neuzil
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Dean Follmann
- Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Julie A Ake
- U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Cynthia L Gay
- University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Karen L Kotloff
- Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Richard A Koup
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Ruben O Donis
- Biomedical Advanced Research and Development Authority, Washington, DC, USA
- Peter B Gilbert
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA, USA
16
Dănăilă VR, Avram S, Buiu C. The applications of machine learning in HIV neutralizing antibodies research: A systematic review. Artif Intell Med 2022; 134:102429. [PMID: 36462896 DOI: 10.1016/j.artmed.2022.102429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Revised: 09/03/2022] [Accepted: 10/13/2022] [Indexed: 12/14/2022]
Abstract
Machine learning algorithms play an essential role in bioinformatics and make it possible to explore vast and noisy biological data in unrivaled ways. This paper is a systematic review of the applications of machine learning in the study of HIV neutralizing antibodies, a significant and vast research domain that can pave the way to novel treatments and to a vaccine. We selected the relevant papers by searching the Web of Science and PubMed databases for literature published over the last decade. The computational methods are applied to neutralization potency prediction, neutralization span prediction against multiple viral strains, antibody-virus binding site detection, enhanced antibody design, and the study of the antibody-induced immune response. These methods are viewed from multiple angles spanning data processing, model description, feature selection, evaluation, and sometimes comparisons between papers. The algorithms are diverse and include supervised, unsupervised, and generative types. Both classical machine learning and modern deep learning are considered. The review ends with our ideas regarding future research directions and challenges.
Affiliation(s)
- Vlad-Rareş Dănăilă
- Department of Automatic Control and Systems Engineering, Politehnica University of Bucharest, 313 Splaiul Independenţei, Bucharest 060042, Romania
- Speranţa Avram
- Department of Anatomy, Animal Physiology and Biophysics, Faculty of Biology, University of Bucharest, 91-95 Splaiul Independentei, Bucharest 050095, Romania
- Cătălin Buiu
- Department of Automatic Control and Systems Engineering, Politehnica University of Bucharest, 313 Splaiul Independenţei, Bucharest 060042, Romania
17
Han S, Fong Y, Huang Y. Testing a global null hypothesis using ensemble machine learning methods. Stat Med 2022; 41:2417-2426. [PMID: 35253259 PMCID: PMC9035066 DOI: 10.1002/sim.9362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2021] [Revised: 12/14/2021] [Accepted: 02/12/2022] [Indexed: 08/27/2024]
Abstract
Testing a global null hypothesis that there are no significant predictors for a binary outcome of interest among a large set of biomarker measurements is an important task in biomedical studies. We seek to improve the power of such testing methods by leveraging ensemble machine learning methods. Ensemble machine learning methods such as random forest, bagging, and adaptive boosting model the relationship between the outcome and the predictor nonparametrically, while stacking combines the strength of multiple learners. We demonstrate the power of the proposed testing methods through Monte Carlo studies and show the use of the methods by applying them to the immunologic biomarkers dataset from the RV144 HIV vaccine efficacy trial.
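Tests of this kind are often built as label-permutation tests. A statistic is computed on the observed outcome labels and then recomputed after shuffling the labels. Shuffling destroys any biomarker-outcome association, so it simulates the global null.

This is a minimal sketch under stated assumptions, not the authors' procedure. `max_mean_difference` is a hypothetical, deliberately simple statistic. It stands in for the cross-validated performance of an ensemble learner (random forest, bagging, boosting, or a stacked model), which would be passed through the same `statistic` argument. `permutation_global_test` is likewise an illustrative name.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Statistic = Callable[[List[List[float]], Sequence[int]], float]


def max_mean_difference(x: List[List[float]], y: Sequence[int]) -> float:
    """Hypothetical stand-in statistic: the largest absolute difference in
    biomarker means between cases (y == 1) and controls (y == 0)."""
    best = 0.0
    for j in range(len(x[0])):
        cases = [row[j] for row, label in zip(x, y) if label == 1]
        controls = [row[j] for row, label in zip(x, y) if label == 0]
        best = max(best, abs(mean(cases) - mean(controls)))
    return best


def permutation_global_test(x: List[List[float]], y: Sequence[int],
                            statistic: Statistic,
                            n_perm: int = 999, seed: int = 0) -> float:
    """Permutation p-value for H0: no biomarker is associated with y."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling labels simulates the global null
        if statistic(x, labels) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one permutation,
    # which keeps the p-value valid and strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: biomarker 0 separates cases from controls perfectly.
    y = [0] * 20 + [1] * 20
    x = [[10.0 if label == 1 else 0.0, float(i % 3)] for i, label in enumerate(y)]
    print(permutation_global_test(x, y, max_mean_difference, n_perm=199))
```

With the toy data, biomarker 0 separates the groups perfectly. Almost no shuffled labelling reaches the observed statistic, so the p-value sits at or near its floor of 1/(n_perm + 1). Swapping in a learner-based statistic changes the power of the test, not its permutation logic.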
Affiliation(s)
- Sunwoo Han
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Youyi Fong
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Ying Huang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
18
Bénard C, Da Veiga S, Scornet E. MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA. Biometrika 2022. [DOI: 10.1093/biomet/asac017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
Summary
Variable importance measures are the main tools used to analyse the black-box mechanisms of random forests. Although the mean decrease accuracy is widely accepted as the most efficient variable importance measure for random forests, little is known about its statistical properties. In fact, the definition of mean decrease accuracy varies across the main random forest software. In this article, our objective is to rigorously analyse the behaviour of the main mean decrease accuracy implementations. To this end, we mathematically formalize the various implemented MDA algorithms and then establish their limits as the sample size increases. This asymptotic analysis reveals that these mean decrease accuracy versions differ as importance measures, since they converge towards different quantities. More importantly, we break down these limits into three components. The first two terms are related to Sobol indices, which are well-defined measures of a covariate's contribution to the response variance and are widely used in the sensitivity analysis field. The third term, by contrast, increases with dependence among covariates. Thus, we theoretically demonstrate that the mean decrease accuracy does not target the right quantity to detect influential covariates in a dependent setting, a fact that has already been noticed experimentally. To address this issue, we define a new importance measure for random forests, the Sobol-mean decrease accuracy. It fixes the flaws of the original mean decrease accuracy and consistently estimates the accuracy decrease of the forest retrained without a given covariate, at an efficient computational cost. The Sobol-mean decrease accuracy empirically outperforms its competitors on both simulated and real data for variable selection. An open source implementation in R and C++ is available online.
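One common permutation-based form of mean decrease accuracy works in three steps. First, record a model's accuracy on held-out data (out-of-bag data in a random forest). Then shuffle one covariate column and re-score. Finally, average the accuracy drop over several shuffles.

This is a minimal, model-agnostic sketch under stated assumptions. The toy `model` is a hypothetical fixed rule rather than a fitted forest, and `permutation_mda` is an illustrative name. The sketch implements the classic permutation MDA, not the authors' Sobol-MDA. As the abstract explains, permutation MDA can be misleading when covariates are dependent.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, x: List[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(row) == label for row, label in zip(x, y)) / len(y)


def permutation_mda(model: Model, x: List[List[float]], y: Sequence[int],
                    n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean accuracy drop after shuffling each covariate column in turn."""
    rng = random.Random(seed)
    base = accuracy(model, x, y)
    importances = []
    for j in range(len(x[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in x]
            rng.shuffle(column)  # break the link between covariate j and y
            x_perm = []
            for row, value in zip(x, column):
                new_row = list(row)
                new_row[j] = value
                x_perm.append(new_row)
            drops.append(base - accuracy(model, x_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical fixed classifier that only looks at covariate 0.
    def model(row: Sequence[float]) -> int:
        return int(row[0] > 0.5)

    x = [[i / 100, (i * 37 % 100) / 100] for i in range(100)]
    y = [model(row) for row in x]
    print(permutation_mda(model, x, y))
```

The toy model ignores covariate 1, so covariate 1 scores exactly zero. Shuffling covariate 0 drives accuracy towards chance, so it gets a clearly positive score.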
Affiliation(s)
- Clément Bénard
- Safran Tech, Digital Sciences & Technologies, 78114 Magny-Les-Hameaux, France
- Sébastien Da Veiga
- Safran Tech, Digital Sciences & Technologies, 78114 Magny-Les-Hameaux, France
- Erwan Scornet
- Ecole Polytechnique, IP Paris, CMAP, 91128 Palaiseau, France
19
McDonald DJ, Bien J, Green A, Hu AJ, DeFries N, Hyun S, Oliveira NL, Sharpnack J, Tang J, Tibshirani R, Ventura V, Wasserman L, Tibshirani RJ. Can auxiliary indicators improve COVID-19 forecasting and hotspot prediction? Proc Natl Acad Sci U S A 2021; 118:e2111453118. [PMID: 34903655 PMCID: PMC8713796 DOI: 10.1073/pnas.2111453118] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Accepted: 11/02/2021] [Indexed: 02/07/2023]
Abstract
Short-term forecasts of traditional streams from public health reporting (such as cases, hospitalizations, and deaths) are a key input to public health decision-making during a pandemic. Since early 2020, our research group has worked with data partners to collect, curate, and make publicly available numerous real-time COVID-19 indicators, providing multiple views of pandemic activity in the United States. This paper studies, from a forecasting perspective, the utility of five such indicators, derived from deidentified medical insurance claims, self-reported symptoms from online surveys, and COVID-related Google search activity. For each indicator, we ask whether its inclusion in an autoregressive (AR) model leads to improved predictive accuracy relative to the same model excluding it. Such an AR model, without external features, is already competitive with many top COVID-19 forecasting models in use today. Our analysis reveals that 1) inclusion of each of these five indicators improves on the overall predictive accuracy of the AR model; 2) predictive gains are in general most pronounced during times in which COVID cases are trending in "flat" or "down" directions; and 3) one indicator, based on Google searches, seems to be particularly helpful during "up" trends.
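The with/without comparison described above can be sketched with ordinary least squares. This is a simplified illustration under explicit assumptions:

- `y` is a hypothetical target series (for example, daily cases).
- `x` is a hypothetical auxiliary indicator.
- The model uses `p` recent lags and a single forecast horizon `lead`.
- Accuracy is measured as mean absolute error on a simple first-half/second-half split.

The paper's own specification (lags, loss, forecast targets, and evaluation scheme) is not reproduced here, and `lagged_design` and `holdout_mae` are illustrative names.

```python
from typing import Optional, Tuple

import numpy as np


def lagged_design(y: np.ndarray, x: Optional[np.ndarray], p: int,
                  lead: int) -> Tuple[np.ndarray, np.ndarray]:
    """Rows [1, y_t, ..., y_{t-p+1}, (x_t, ..., x_{t-p+1})] with target y_{t+lead}."""
    rows, targets = [], []
    for t in range(p - 1, len(y) - lead):
        feats = [1.0] + [y[t - k] for k in range(p)]  # intercept + AR lags
        if x is not None:
            feats += [x[t - k] for k in range(p)]  # auxiliary indicator lags
        rows.append(feats)
        targets.append(y[t + lead])
    return np.array(rows), np.array(targets)


def holdout_mae(y: np.ndarray, x: Optional[np.ndarray] = None,
                p: int = 3, lead: int = 1) -> float:
    """Fit by least squares on the first half; return MAE on the second half."""
    design, target = lagged_design(y, x, p, lead)
    n_train = len(target) // 2
    coef, *_ = np.linalg.lstsq(design[:n_train], target[:n_train], rcond=None)
    pred = design[n_train:] @ coef
    return float(np.mean(np.abs(pred - target[n_train:])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)  # hypothetical leading indicator
    y = np.zeros(200)
    y[1:] = 2.0 * x[:-1] + 0.1 * rng.normal(size=199)  # y follows x one step later
    print("AR only:        ", holdout_mae(y))
    print("AR + indicator: ", holdout_mae(y, x))
```

The simulated indicator leads the target by one step by construction. The AR-only model therefore cannot anticipate the next value, while the model that includes the indicator's lags can. With real indicators the gain is an empirical question, which is exactly what the paper examines.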
Affiliation(s)
- Daniel J McDonald
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada V6T 1Z4
- Jacob Bien
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA 90089
- Alden Green
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Addison J Hu
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Nat DeFries
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Sangwon Hyun
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA 90089
- Natalia L Oliveira
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
- James Sharpnack
- Department of Statistics, University of California, Davis, CA 95616
- Jingjing Tang
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Robert Tibshirani
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305
- Valérie Ventura
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Larry Wasserman
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
- Ryan J Tibshirani
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213