1
Asiimwe IG, S'fiso Ndzamba B, Mouksassi S, Pillai GC, Lombard A, Lang J. Machine-Learning Assisted Screening of Correlated Covariates: Application to Clinical Data of Desipramine. AAPS J 2024; 26:63. [PMID: 38816519 DOI: 10.1208/s12248-024-00934-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/14/2024] [Indexed: 06/01/2024]
Abstract
Stepwise covariate modeling (SCM) has a high computational burden and can select the wrong covariates. Machine learning (ML) has been proposed as a screening tool to improve the efficiency of covariate selection, but little is known about how to apply ML to actual clinical data. First, we simulated datasets based on clinical data to compare the performance of various ML and traditional pharmacometrics (PMX) techniques with and without accounting for highly correlated covariates. This simulation step identified the ML algorithm and the number of top covariates to select when using the actual clinical data. A previously developed desipramine population-pharmacokinetic model was used to simulate virtual subjects. Fifteen covariates were considered, four of which had an effect included in the simulation. Based on the F1 score (an accuracy measure), ridge regression was the most accurate ML technique on 200 simulated datasets (F1 score = 0.475 ± 0.231), a performance which almost doubled when highly correlated covariates were accounted for (F1 score = 0.860 ± 0.158). These performances were better than forward selection with SCM (F1 score = 0.251 ± 0.274 and 0.499 ± 0.381 without and with correlations, respectively). In terms of computational cost, ridge regression (0.42 ± 0.07 seconds per simulated dataset, 1 thread) was ~20,000 times faster than SCM (2.30 ± 2.29 hours, 15 threads). On the clinical dataset, prescreening with the selected ML algorithm reduced SCM runtime by 42.86% (from 1.75 to 1.00 days) and produced the same final model as SCM alone. In conclusion, we have demonstrated that accounting for highly correlated covariates improves ML prescreening accuracy. The choice of ML method and the proportion of important covariates (unknown a priori) can be guided by simulations.
Affiliation(s)
- Innocent Gerald Asiimwe
- The Wolfson Centre for Personalized Medicine, Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK.
- APT-Africa Fellowship Program, c/o Pharmacometrics Africa NPC, K45 Old Main Building, Groote Schuur Hospital, Cape Town, South Africa.
- Bonginkosi S'fiso Ndzamba
- APT-Africa Fellowship Program, c/o Pharmacometrics Africa NPC, K45 Old Main Building, Groote Schuur Hospital, Cape Town, South Africa
- Faculty of Health Sciences, Department of Pharmacy, Nelson Mandela University, Port Elizabeth, South Africa
- Goonaseelan Colin Pillai
- APT-Africa Fellowship Program, c/o Pharmacometrics Africa NPC, K45 Old Main Building, Groote Schuur Hospital, Cape Town, South Africa
- Division of Clinical Pharmacology, University of Cape Town, Cape Town, South Africa
- CP+ Associates GmbH, Basel, Switzerland
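The F1 score and the runtime reduction quoted in the abstract of this entry can be checked with a few lines of arithmetic. The sketch below is illustrative only: the covariate names and the selected set are hypothetical and are not taken from the study; only the 1.75-day and 1.00-day runtimes come from the abstract.

```python
def f1_score(selected, true_covariates):
    """F1 score of a covariate selection against the known true covariates."""
    selected, true_covariates = set(selected), set(true_covariates)
    true_positives = len(selected & true_covariates)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)      # share of picks that are correct
    recall = true_positives / len(true_covariates)  # share of true covariates found
    return 2 * precision * recall / (precision + recall)


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical screening result: 4 covariates truly matter, 4 were selected,
# and 2 of the selected ones are correct (names are made up for illustration).
true_covariates = {"WT", "AGE", "SEX", "CYP2D6"}
selected = {"WT", "AGE", "BMI", "CRCL"}

f1 = f1_score(selected, true_covariates)
runtime_reduction = percent_reduction(1.75, 1.00)  # days, as reported in the abstract

print(f"F1 = {f1:.3f}")
print(f"Runtime reduction = {runtime_reduction:.2f}%")
```

With half of the selections correct and half of the true covariates recovered, precision and recall are both 0.5, so the F1 score is also 0.5. The runtime figure works out to (1.75 - 1.00) / 1.75, which matches the 42.86% stated in the abstract.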
2
Joseph JE, Vaughan BK, Camp CC, Baker NL, Sherman BJ, Moran-Santa Maria M, McRae-Clark A, Brady KT. Oxytocin-Induced Changes in Intrinsic Network Connectivity in Cocaine Use Disorder: Modulation by Gender, Childhood Trauma, and Years of Use. Front Psychiatry 2019; 10:502. [PMID: 31379621 PMCID: PMC6658612 DOI: 10.3389/fpsyt.2019.00502] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/01/2019] [Accepted: 06/25/2019] [Indexed: 12/11/2022]
Abstract
Cocaine use disorder (CUD) is a major public health concern with devastating social, economic, and mental health implications. A better understanding of the underlying neurobiology and phenotypic variations in individuals with CUD is necessary for the development of effective and targeted treatments. In this study, 39 women and 54 men with CUD completed a 6-min resting-state functional magnetic resonance imaging scan after intranasal oxytocin (OXY) or placebo administration. Graph-theory network analysis was used to quantify functional connectivity changes caused by OXY in striatum, anterior cingulate cortex (ACC), insula, and amygdala nodes of interest. OXY increased connectivity in the right ACC and left amygdala in males, whereas OXY increased connectivity in the right ACC and right accumbens in females. Machine learning was then used to associate treatment response (placebo minus OXY) in nodes of interest with years of cocaine use and severity of childhood trauma separately for males and females. Childhood trauma and years of cocaine use were associated with OXY-induced changes in ACC connectivity for both men and women, but connectivity changes in the amygdala were associated with years of cocaine use in men and connectivity changes in the right insula were associated with years of cocaine use in women. These findings suggest that salience network nodes (ACC and insula) are potential OXY treatment targets in CUD, with the amygdala as a treatment target for men and the accumbens as a treatment target for women.
Affiliation(s)
- Jane E. Joseph
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Brandon K. Vaughan
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Christopher C. Camp
- Department of Neuroscience, Medical University of South Carolina, Charleston, SC, United States
- Nathaniel L. Baker
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, United States
- Brian J. Sherman
- Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, SC, United States
- Megan Moran-Santa Maria
- Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, SC, United States
- Aimee McRae-Clark
- Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, SC, United States
- Ralph H. Johnson VA Medical Center, Charleston, SC, United States
- Kathleen T. Brady
- Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, SC, United States
- Ralph H. Johnson VA Medical Center, Charleston, SC, United States
3
Joseph JE, Vanderweyen D, Swearingen J, Vaughan BK, Novo D, Zhu X, Gebregziabher M, Bonilha L, Bhatt R, Naselaris T, Dean B. Tracking the Development of Functional Connectomes for Face Processing. Brain Connect 2018; 9:231-239. [PMID: 30489152 DOI: 10.1089/brain.2018.0607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
Face processing capacities become more specialized and advanced during development, but neural underpinnings of these processes are not fully understood. The present study applied graph theory-based network analysis to task-negative (resting blocks) and task-positive (viewing faces) functional magnetic resonance imaging data in children (5-17 years) and adults (18-42 years) to test the hypothesis that the development of a specialized network for face processing is driven by task-positive processing (face viewing) more than by task-negative processing (visual fixation) and by both progressive and regressive changes in network properties. Predictive modeling was used to predict age from node-based network properties derived from task-positive and task-negative states in a whole-brain network (WBN) and a canonical face network (FN). The best-fitting model indicated that FN maturation was marked by both progressive and regressive changes in information diffusion (eigenvector centrality) in the task-positive state, with regressive changes outweighing progressive changes. Hence, FN maturation was characterized by reductions in information diffusion potentially reflecting the development of more specialized modules. In contrast, WBN maturation was marked by a balance of progressive and regressive changes in hub-connectivity (betweenness centrality) in the task-negative state. These findings suggest that the development of specialized networks like the FN depends on dynamic developmental changes associated with domain-specific information (e.g., face processing), but maturation of the brain as a whole can be predicted from task-free states.
Affiliation(s)
- Jane E Joseph
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Davy Vanderweyen
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Joshua Swearingen
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Brandon K Vaughan
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Derek Novo
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Xun Zhu
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Mulugeta Gebregziabher
- 2 Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina
- Leonardo Bonilha
- 3 Department of Neurology, Medical University of South Carolina, Charleston, South Carolina
- Ramesh Bhatt
- 4 Department of Psychology, University of Kentucky, Lexington, Kentucky
- Thomas Naselaris
- 1 Department of Neuroscience, Medical University of South Carolina, Charleston, South Carolina
- Brian Dean
- 5 Division of Computer Science, School of Computing, Clemson, South Carolina
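The abstract of this entry uses eigenvector centrality as a node-level measure of information diffusion. As a rough illustration of what that quantity is, the sketch below computes it on a toy four-node graph by power iteration. The graph, the function name, and the tolerance are assumptions made for this example and have nothing to do with the study's imaging data or software; the iteration uses the common shifted update x ← x + Ax, which has the same leading eigenvector as the adjacency matrix A.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-12):
    """Eigenvector centrality of an undirected graph via shifted power iteration.

    `adjacency` maps each node to a list of its neighbours.
    """
    nodes = list(adjacency)
    x = {node: 1.0 for node in nodes}
    for _ in range(max_iter):
        # Shifted update x <- x + A x (same leading eigenvector as A).
        new_x = {
            node: x[node] + sum(x[nbr] for nbr in adjacency[node])
            for node in nodes
        }
        # Rescale to unit Euclidean length so values stay bounded.
        norm = math.sqrt(sum(v * v for v in new_x.values()))
        new_x = {node: v / norm for node, v in new_x.items()}
        if max(abs(new_x[n] - x[n]) for n in nodes) < tol:
            return new_x
        x = new_x
    return x


# Toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 0.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
centrality = eigenvector_centrality(graph)

for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```

Node 0 scores highest because it is connected to every other node, nodes 1 and 2 tie by symmetry, and the pendant node 3 scores lowest.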
4
Tavallali P, Razavi M, Brady S. A non-linear data mining parameter selection algorithm for continuous variables. PLoS One 2017; 12:e0187676. [PMID: 29131829 PMCID: PMC5683644 DOI: 10.1371/journal.pone.0187676] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/23/2017] [Accepted: 10/24/2017] [Indexed: 11/18/2022]
Abstract
In this article, we propose a new data mining algorithm that both captures non-linearity in the data and finds the best subset model. To produce an enhanced subset of the original variables, a selection method should add a supplementary level of regression analysis that captures complex relationships in the data through mathematical transformation of the predictors and exploration of synergistic effects of combined variables. The method presented here has the potential to produce an optimal subset of variables, making the overall process of model selection more efficient. The algorithm introduces interpretable parameters by transforming the original inputs while also providing a faithful fit to the data. The core objective of this paper is to introduce a new estimation technique for the classical least-squares regression framework. This automatic variable transformation and model selection method could offer an optimal and stable model that minimizes the mean square error and variability, combining all-possible-subsets selection with variable transformations and interactions. Moreover, the method controls multicollinearity, leading to an optimal set of explanatory variables.
Affiliation(s)
- Peyman Tavallali
- Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, California, United States of America
- Marianne Razavi
- Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, California, United States of America
- Sean Brady
- Principium Consulting, LLC, Pasadena, California, United States of America
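The abstract of this entry combines all-possible-subsets selection with variable transformations and interactions in a least-squares setting. The sketch below shows only that generic idea, as a plain exhaustive search over transformed predictors ranked by training mean square error; it is not the authors' algorithm, and it omits their handling of stability and multicollinearity. The data, the candidate transformations, and all function names are assumptions made for illustration.

```python
import itertools
import math


def fit_least_squares(columns, y):
    """Fit y ~ intercept + columns via the normal equations; return (beta, mse)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    residuals = [y[i] - sum(b * v for b, v in zip(beta, X[i])) for i in range(n)]
    return beta, sum(e * e for e in residuals) / n


def best_subset(candidates, y, max_terms):
    """Try every subset of up to `max_terms` candidate terms; keep the lowest MSE."""
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(candidates, k):
            _, mse = fit_least_squares([candidates[name] for name in names], y)
            if best is None or mse < best[1]:
                best = (names, mse)
    return best


# Made-up data generated from y = 2 + 3*log(x1) + 0.5*x2 (no noise).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y = [2 + 3 * math.log(a) + 0.5 * b for a, b in zip(x1, x2)]

# Candidate terms: raw inputs, two transformations, and one interaction.
candidates = {
    "x1": x1,
    "log(x1)": [math.log(a) for a in x1],
    "x1^2": [a * a for a in x1],
    "x2": x2,
    "x1*x2": [a * b for a, b in zip(x1, x2)],
}

best_names, best_mse = best_subset(candidates, y, max_terms=2)
print(best_names, best_mse)
```

Because training error never increases as terms are added, a practical search would cap the subset size (as done here) or rank subsets with a penalized or cross-validated criterion rather than raw MSE.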
5
Tuttle S, Salvucci G. Empirical evidence of contrasting soil moisture–precipitation feedbacks across the United States. Science 2016; 352:825-8. [DOI: 10.1126/science.aaa7185] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 01/18/2015] [Accepted: 04/04/2016] [Indexed: 11/02/2022]
6
Good IJ. Minicommunications. Commun Stat Simul Comput 2007. [DOI: 10.1080/03610917608812009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
7
Grechanovsky E. Stepwise Regression Procedures: Overview, Problems, Results, and Suggestions. Ann N Y Acad Sci 2006. [DOI: 10.1111/j.1749-6632.1987.tb30057.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
8
Mendez E, Billings S. An alternative solution to the model structure selection problem. IEEE Trans Syst Man Cybern A 2001. [DOI: 10.1109/3468.983416] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.4] [Indexed: 11/09/2022]
9
Gunst RF. Classical Studies That Revolutionized the Practice of Regression Analysis. Technometrics 2000. [DOI: 10.1080/00401706.2000.10485980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
12
Anthropometry, cholesterol, HDL and LDL in relation to alpha-2-macroglobulin in Thai construction site workers. Nutr Res 1996. [DOI: 10.1016/0271-5317(96)00120-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/23/2022]
13
Pongpaew P, Boonyakarnkul N, Schelp FP, Changbumrung S, Supawan V, Tawprasert S, Migasena P. Serum concentrations of alpha-2-macroglobulin and other serum proteinase inhibitors in Thai vegetarians and omnivores. Nutr Res 1994. [DOI: 10.1016/s0271-5317(05)80172-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
14
Zeng W, Wang P, Zhang H, Tong S. Qualitative and quantitative analyses of synthetic pigments in foods by using the branch and bound algorithm. Anal Chim Acta 1993. [DOI: 10.1016/0003-2670(93)85330-m] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
15
Information extraction on efficient purification of organic reagents by using the branch and bound algorithm. Anal Chim Acta 1993. [DOI: 10.1016/0003-2670(93)80256-k] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/22/2022]
16
Xie YL, Liang YZ, Yu RQ. Quantitative calibration of multi-component systems with a known range of possibly co-existing species. Anal Chim Acta 1993. [DOI: 10.1016/0003-2670(93)80376-v] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/17/2022]
17
Influence of dietary intake on alpha-2-macroglobulin and other biochemical parameters in healthy Thai males. Nutr Res 1991. [DOI: 10.1016/s0271-5317(05)80346-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
21
Hawkins DM, Eplett WJR. The Cholesky Factorization of the Inverse Correlation or Covariance Matrix in Multiple Regression. Technometrics 1982. [DOI: 10.1080/00401706.1982.10487758] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
25
Hermansson AM, Åkesson C. Functional Properties of Added Proteins Correlated with Properties of Meat Systems. Effect of Concentration and Temperature on Water-Binding Properties of Model Meat Systems. J Food Sci 1975. [DOI: 10.1111/j.1365-2621.1975.tb12536.x] [Citation(s) in RCA: 66] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
27
Diehr G, Hoflin DR. Approximating the Distribution of the Sample R² in Best Subset Regressions. Technometrics 1974. [DOI: 10.1080/00401706.1974.10489189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
29
Bunke H, Bunke O. Das empirische Entscheidungsprinzip und die Wahl von Regressionsmodellen. Biom Z 1974. [DOI: 10.1002/bimj.19740160303] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
31
Persson T, Sydow E, Åkesson C. The Aroma of Canned Beef: Models for Correlation of Instrumental and Sensory Data. J Food Sci 1973. [DOI: 10.1111/j.1365-2621.1973.tb02845.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]