1
Kwan B, Fuhrer T, Montemayor D, Fink JC, He J, Hsu CY, Messer K, Nelson RG, Pu M, Ricardo AC, Rincon-Choles H, Shah VO, Ye H, Zhang J, Sharma K, Natarajan L. A generalized covariate-adjusted top-scoring pair algorithm with applications to diabetic kidney disease stage classification in the Chronic Renal Insufficiency Cohort (CRIC) Study. BMC Bioinformatics 2023; 24:57. [PMID: 36803209 PMCID: PMC9942303 DOI: 10.1186/s12859-023-05171-w] [Received: 01/31/2022] [Accepted: 02/02/2023] [Indexed: 02/22/2023]
Abstract
BACKGROUND: The growing amount of high-dimensional biomolecular data has spawned new statistical and computational models for risk prediction and disease classification. Yet many of these methods do not yield biologically interpretable models, despite offering high classification accuracy. An exception, the top-scoring pair (TSP) algorithm, derives parameter-free, biologically interpretable single-pair decision rules that are accurate and robust in disease classification. However, standard TSP methods do not accommodate covariates that could heavily influence feature selection for the top-scoring pair. Herein, we propose a covariate-adjusted TSP method, which uses residuals from a regression of features on the covariates to identify top-scoring pairs. We conduct simulations and a data application to investigate our method and compare it to existing classifiers, LASSO and random forests.
RESULTS: Our simulations found that features that were highly correlated with clinical variables had a high likelihood of being selected as top-scoring pairs in the standard TSP setting. However, through residualization, our covariate-adjusted TSP was able to identify new top-scoring pairs that were largely uncorrelated with clinical variables. In the data application, using patients with diabetes (n = 977) selected for metabolomic profiling in the Chronic Renal Insufficiency Cohort (CRIC) study, the standard TSP algorithm identified (valine-betaine, dimethyl-arg) as the top-scoring metabolite pair for classifying diabetic kidney disease (DKD) severity, whereas the covariate-adjusted TSP method identified the pair (pipazethate, octaethylene glycol) as top-scoring. Valine-betaine and dimethyl-arg had, respectively, ≥ 0.4 absolute correlation with urine albumin and serum creatinine, known prognosticators of DKD. Thus, without covariate adjustment the top-scoring pair largely reflected known markers of disease severity, whereas covariate-adjusted TSP uncovered features liberated from confounding and identified independent prognostic markers of DKD severity. Furthermore, TSP-based methods achieved classification accuracy in DKD competitive with LASSO and random forests, while providing more parsimonious models.
CONCLUSIONS: We extended TSP-based methods to account for covariates via a simple, easy-to-implement residualizing process. Our covariate-adjusted TSP method identified metabolite features, uncorrelated with clinical covariates, that discriminate DKD severity stage based on the relative ordering between two features, and thus provide insights for future studies on the order reversals in early vs advanced disease states.
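The residualize-then-rank idea summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`residualize`, `top_scoring_pair`, `covariate_adjusted_tsp`) are hypothetical, the regression is assumed to be ordinary least squares with an intercept fitted on all samples, and the pair score is assumed to be the usual TSP score, |P(X_i < X_j | y = 1) - P(X_i < X_j | y = 0)|. The abstract does not specify these details.

```python
import numpy as np

def residualize(X, Z):
    """Residuals of each feature column of X after OLS regression on covariates Z.

    Assumption: a plain linear model with an intercept, fitted on all samples.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)
    Z1 = np.column_stack([np.ones(len(Z)), Z])      # add intercept column
    beta, *_ = np.linalg.lstsq(Z1, X, rcond=None)   # one fit per feature column
    return X - Z1 @ beta

def top_scoring_pair(X, y):
    """Return ((i, j), score, i_below_j_in_class1) for the best feature pair.

    score = |P(X_i < X_j | y=1) - P(X_i < X_j | y=0)|, searched over i < j.
    Builds an n x p x p boolean array, so it only suits a modest number of features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)
    # less[n, i, j] is True when feature i is below feature j in sample n
    less = X[:, :, None] < X[:, None, :]
    p1 = less[y].mean(axis=0)    # ordering frequency in class 1
    p0 = less[~y].mean(axis=0)   # ordering frequency in class 0
    score = np.abs(p1 - p0)
    rows, cols = np.triu_indices(X.shape[1], k=1)
    best = np.argmax(score[rows, cols])
    i, j = int(rows[best]), int(cols[best])
    return (i, j), float(score[i, j]), bool(p1[i, j] > p0[i, j])

def covariate_adjusted_tsp(X, Z, y):
    """Run the TSP search on covariate-residualized features (illustrative only)."""
    return top_scoring_pair(residualize(X, Z), y)
```

With a feature matrix `X` (samples by features), a covariate matrix `Z`, and binary labels `y`, comparing `top_scoring_pair(X, y)` with `covariate_adjusted_tsp(X, Z, y)` shows whether the selected pair changes once the linear effect of the covariates is removed.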
Grants
- R01 DK110541 NIDDK NIH HHS
- U24 DK060990 NIDDK NIH HHS
- R01DK118736, 1R01DK110541-01A1, U01DK060990, U01DK060984, U01DK061022, U01DK061021, U01DK061028, U01DK060980, U01DK060963, U01DK060902, U24DK060990 NIDDK NIH HHS
- National Science Foundation Graduate Research Fellowship Program
- Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases
- National Institute of Diabetes and Digestive and Kidney Diseases
Affiliation(s)
- Brian Kwan
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Tobias Fuhrer
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Daniel Montemayor
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jeffery C Fink
  - Department of Medicine, University of Maryland, Baltimore School of Medicine, Baltimore, MD, USA
- Jiang He
  - Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine and Tulane University Translational Science Institute, New Orleans, LA, USA
- Chi-Yuan Hsu
  - Division of Nephrology, University of California, San Francisco School of Medicine, San Francisco, CA, USA
- Karen Messer
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Robert G Nelson
  - Chronic Kidney Disease Section, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, AZ, USA
- Minya Pu
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Ana C Ricardo
  - Department of Medicine, University of Illinois, Chicago, IL, USA
- Hernan Rincon-Choles
  - Department of Nephrology, Glickman Urological and Kidney Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Vallabh O Shah
  - University of New Mexico Health Sciences Center, Albuquerque, NM, USA
- Hongping Ye
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Jing Zhang
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Kumar Sharma
  - Division of Nephrology, Department of Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
  - Center for Renal Precision Medicine, University of Texas Health San Antonio, San Antonio, TX, USA
- Loki Natarajan
  - Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
3
Bowen L, Manlove K, Roug A, Waters S, LaHue N, Wolff P. Using transcriptomics to predict and visualize disease status in bighorn sheep (Ovis canadensis). Conservation Physiology 2022; 10:coac046. [PMID: 35795016 PMCID: PMC9252122 DOI: 10.1093/conphys/coac046] [Received: 10/29/2021] [Revised: 02/18/2022] [Accepted: 06/17/2022] [Indexed: 06/15/2023]
Abstract
Increasing risk of pathogen spillover, coupled with overall declines in wildlife population abundance in the Anthropocene, makes infectious disease a relevant concern for species conservation worldwide. While emerging molecular tools could improve our diagnostic capabilities and give insight into mechanisms underlying wildlife disease risk, they have rarely been applied in practice. Here, employing a previously reported gene transcription panel of common immune markers to track physiological changes, we present a detailed analysis over the course of both acute and chronic infection in one wildlife species where disease plays a critical role in conservation, bighorn sheep (Ovis canadensis). Differential gene transcription patterns distinguished between infection statuses over the course of acute infection, and differential correlation (DC) analyses identified clear changes in gene co-transcription patterns over the early stages of infection, with transcription of four genes (TGFb, AHR, IL1b and MX1) continuing to increase even as transcription of other immune-associated genes waned. In a separate analysis, we considered the capacity of the same gene transcription panel to aid in differentiating between chronically infected animals and animals in other disease states outside of acute disease events (an immediate priority for wildlife management in this system). We found that this transcription panel was capable of accurately identifying chronically infected animals in the test dataset, though additional data will be required to determine how far this ability extends. Taken together, our results showcase a successful proof of concept and the breadth of potential utilities that gene transcription might provide to wildlife disease management, from direct insight into mechanisms associated with differential disease response to improved diagnostic capacity in the field.
Affiliation(s)
- Kezia Manlove
  - Department of Wildland Resources and Ecology Center, Utah State University, Logan, UT, 84322, USA
- Annette Roug
  - Centre for Veterinary Wildlife Studies, Faculty of Veterinary Medicine, University of Pretoria, Onderstepoort, 0110, South Africa
- Shannon Waters
  - U.S. Geological Survey, Western Ecological Research Center, Davis, CA, 95616, USA
- Nate LaHue
  - Nevada Department of Wildlife, Reno, NV, 89512, USA
4
Wu TC, Zhou Z, Wang H, Wang B, Lin T, Feng C, Tu XM. Advanced machine learning methods in psychiatry: an introduction. Gen Psychiatr 2020; 33:e100197. [PMID: 32215364 PMCID: PMC7076259 DOI: 10.1136/gpsych-2020-100197] [Received: 01/21/2020] [Accepted: 02/03/2020] [Indexed: 12/15/2022]
Abstract
Mental health questions can be tackled through machine learning (ML) techniques. Apart from the two ML methods we introduced in our previous paper, we discuss two more advanced ML approaches in this paper: support vector machines and artificial neural networks. To illustrate how these ML methods have been employed in mental health, we report recent research applications in psychiatry.
Affiliation(s)
- Tsung-Chin Wu
  - Department of Mathematics, University of California San Diego, La Jolla, California, USA
- Zhirou Zhou
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, USA
- Hongyue Wang
  - Departments of Biostatistics and Computational Biology and Anesthesiology, University of Rochester, Rochester, New York, USA
- Bokai Wang
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, USA
- Tuo Lin
  - Clinical and Translational Research Institute, University of California San Diego, San Diego, California, USA
- Changyong Feng
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, USA
- Xin M Tu
  - Family Medicine and Public Health, University of California San Diego, La Jolla, California, USA
  - Naval Health Research Center, San Diego, California, USA
5
Asafu-Adjei JK, Sampson AR. Covariate adjusted classification trees. Biostatistics 2019; 19:42-53. [PMID: 28520903 DOI: 10.1093/biostatistics/kxx015] [Received: 10/21/2015] [Accepted: 03/16/2017] [Indexed: 11/12/2022]
Abstract
In studies that compare several diagnostic groups, subjects can be measured on certain features and classification trees can be used to identify which of them best characterize the differences among groups. However, subjects may also be measured on additional covariates whose ability to characterize group differences is not meaningful or of interest, but may still have an impact on the examined features. Therefore, it is important to adjust for the effects of covariates on these features. We present a new semi-parametric approach to adjust for covariate effects when constructing classification trees based on the features of interest that is readily implementable. An application is given for postmortem brain tissue data to compare the neurobiological characteristics of subjects with schizophrenia to those of normal controls. We also evaluate the performance of our approach using a simulation study.
Affiliation(s)
- Josephine K Asafu-Adjei
  - Department of Biostatistics, School of Nursing, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Allan R Sampson
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA
6
Asafu-Adjei JK, Sampson AR, Sweet RA, Lewis DA. Adjusting for matching and covariates in linear discriminant analysis. Biostatistics 2013; 14:779-91. [PMID: 23640791 DOI: 10.1093/biostatistics/kxt017] [Indexed: 11/12/2022]
Abstract
In studies that compare several diagnostic or treatment groups, subjects may not only be measured on a certain set of feature variables, but also be matched on a number of demographic characteristics and measured on additional covariates. Linear discriminant analysis (LDA) is sometimes used to identify which feature variables best discriminate among groups, while accounting for the dependencies among the feature variables. We present a new approach to LDA for multivariate normal data that accounts for the subject matching used in a particular study design, as well as covariates not used in the matching. Applications are given for post-mortem tissue data with the aim of comparing neurobiological characteristics of subjects with schizophrenia with those of normal controls, and for a post-mortem tissue primate study comparing brain biomarker measurements across three treatment groups. We also investigate the performance of our approach using a simulation study.
7
Kowalski J, Tu XM, Jia G, Perlis M, Frank E, Crits-Christoph P, Kupfer DJ. Generalized covariance-adjusted canonical correlation analysis with application to psychiatry. Stat Med 2003; 22:595-610. [PMID: 12590416 DOI: 10.1002/sim.1332] [Indexed: 11/10/2022]
Abstract
The lack of control over covariates in practice motivates the need for their adjustment when measuring the degree of association between two sets of variables, for which canonical correlation is traditionally used. In most studies, however, there is also a lack of control over the attributes of responses for the sets of variables of interest. In particular, a portion of the response variables may be continuous and the other discrete. For such settings, the traditional partial canonical correlation approach is restrictive, since a covariate adjustment for a set of continuous variables is assumed. Ignoring the assumption of continuous variates and proceeding with a partial canonical correlation analysis in the presence of continuous and discrete variates results in canonical correlation estimates that are not consistent. In this paper we generalize the traditional partial canonical correlation approach to covariate adjustment by allowing the response variables to contain continuous, as well as discrete, variates. The methodology is illustrated with a psychiatric application examining which sleep variables relate to which depressive symptoms, as measured by commonly used constructs that present with both continuous and discrete outcomes.
Affiliation(s)
- J Kowalski
  - Division of Oncology Biostatistics, Johns Hopkins University, U.S.A.