1
Marchena-Romero KJ, Ji X, Sommer R, Centen A, Ramirez J, Poulin JM, Mikulis D, Thrippleton M, Wardlaw J, Lim A, Black SE, MacIntosh BJ. Examining temporal features of BOLD-based cerebrovascular reactivity in clinical populations. Front Neurol 2023; 14:1199805. PMID: 37396759; PMCID: PMC10310960; DOI: 10.3389/fneur.2023.1199805.
Abstract
Background Conventional cerebrovascular reactivity (CVR) estimation has demonstrated that many brain diseases and conditions are associated with altered CVR. Despite the clinical potential of CVR, characterization of the temporal features of a CVR challenge remains uncommon. This work is motivated by the need to develop CVR parameters that characterize individual temporal features of a CVR challenge. Methods Data were collected from 54 adults recruited based on these criteria: (1) Alzheimer's disease diagnosis or subcortical vascular cognitive impairment, (2) sleep apnea, and (3) subjective cognitive impairment concerns. We investigated signal changes in blood oxygenation level dependent (BOLD) contrast images with respect to hypercapnic and normocapnic CVR transition periods during a gas manipulation paradigm. After considering a range of responses through simulations, we developed a model-free, non-parametric CVR metric to characterize BOLD signal changes that occur when transitioning from normocapnia to hypercapnia. The non-parametric CVR measure was used to examine regional differences across the insula, hippocampus, thalamus, and centrum semiovale. We also examined the BOLD signal transition from hypercapnia back to normocapnia. Results We found a linear association between isolated temporal features of successive CO2 challenges. The transition rate from hypercapnia to normocapnia was significantly associated with the second CVR response across all regions of interest (p < 0.001), and this association was strongest in the hippocampus (R2 = 0.57, p < 0.0125). Conclusion This study demonstrates that it is feasible to examine individual responses associated with the normocapnic and hypercapnic transition periods of a BOLD-based CVR experiment. Studying these features can provide insight into between-subject differences in CVR.
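The transition-rate idea above can be sketched in a few lines: fit a line to the BOLD time course inside the transition window and take its slope as a model-free rate estimate. This is an illustrative stand-in, not the study's metric; the window bounds, sampling grid and synthetic ramp below are all assumptions.

```python
import numpy as np

def transition_rate(bold, t, t_start, t_end):
    """Slope of the BOLD signal within the transition window [t_start, t_end]."""
    mask = (t >= t_start) & (t <= t_end)         # restrict to the transition window
    slope, intercept = np.polyfit(t[mask], bold[mask], 1)
    return float(slope)

# Synthetic example: a signal that ramps from 0 to 2 between t=20 s and t=30 s.
t = np.arange(0.0, 60.0, 1.0)
bold = 2.0 * np.clip((t - 20.0) / 10.0, 0.0, 1.0)
rate = transition_rate(bold, t, 20.0, 30.0)      # slope of the ramp, 0.2 per second
```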
Affiliation(s)
- Kayley-Jasmin Marchena-Romero
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Xiang Ji
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
- Rosa Sommer
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Andrew Centen
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Joel Ramirez
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
- Joshua M. Poulin
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
- David Mikulis
- Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Division of Neuroradiology, Joint Department of Medical Imaging, University Health Network, Toronto, ON, Canada
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Michael Thrippleton
- Brain Research Imaging Centre, Centre for Clinical Brain Sciences, UK Dementia Research Institute Centre, The University of Edinburgh, Edinburgh, United Kingdom
- Joanna Wardlaw
- Brain Research Imaging Centre, Centre for Clinical Brain Sciences, UK Dementia Research Institute Centre, The University of Edinburgh, Edinburgh, United Kingdom
- Andrew Lim
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Sandra E. Black
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
- Bradley J. MacIntosh
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Hurvitz Brain Sciences Program, Sunnybrook Research Institute, Toronto, ON, Canada
- Dr. Sandra Black Centre for Brain Resilience and Recovery, Toronto, ON, Canada
2
Wang Y, Lê Cao KA. PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data. Brief Bioinform 2023; 24:6991121. PMID: 36653900; PMCID: PMC10025448; DOI: 10.1093/bib/bbac622.
Abstract
Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to, and obscure, any factors of interest. Existing batch effect correction methods have been developed primarily for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce new multivariate and non-parametric batch effect correction methods based on Partial Least Squares Discriminant Analysis (PLSDA). PLSDA-batch first estimates treatment and batch variation with latent components, then subtracts the batch-associated components from the data. The resulting batch-effect-corrected data can then be input into any downstream statistical analysis. Two variants are proposed to handle unbalanced batch × treatment designs and to avoid overfitting when estimating the components via variable selection. We compare our approaches with popular methods for managing batch effects, namely removeBatchEffect, ComBat and Surrogate Variable Analysis, in simulations and three case studies, using various visual and numerical assessments. We show that our three methods achieve competitive performance in removing batch variation while preserving treatment variation, especially for unbalanced batch × treatment designs. Our downstream analyses show selections of biologically relevant taxa. This work demonstrates that batch effect correction methods can improve microbiome research outputs. Reproducible code and vignettes are available on GitHub.
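The deflation step at the core of the approach above can be illustrated with a single PLS component computed against one-hot batch labels and then subtracted from the data. This is a simplified numpy sketch of the idea, not the authors' R implementation; the function name and the one-component choice are ours.

```python
import numpy as np

def remove_batch_component(X, batch):
    """Deflate one PLS component associated with one-hot batch labels from X."""
    X = X - X.mean(axis=0)                        # column-centre the data
    Y = np.eye(int(batch.max()) + 1)[batch]       # one-hot batch membership
    Y = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    w = U[:, 0]                                   # weights of the batch direction
    t = X @ w                                     # batch-associated latent score
    p = X.T @ t / (t @ t)                         # corresponding loading
    return X - np.outer(t, p)                     # subtract the batch component
```

On simulated data with a strong batch mean shift, one deflation removes most of the between-batch gap while leaving the remaining variation untouched.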
Affiliation(s)
- Yiwen Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 97 Buxin Rd, Shenzhen, 518000, Guangdong, China
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, 30 Royal Parade, Melbourne, 3052, VIC, Australia
3
Grieve AP. On implementing Jeffreys' substitution likelihood for Bayesian inference concerning the medians of unknown distributions. Pharm Stat 2023; 22:365-377. PMID: 36510749; DOI: 10.1002/pst.2277.
Abstract
When statisticians are uncertain which parametric statistical model to use to analyse experimental data, they often resort to a non-parametric approach. The purpose of this paper is to provide insight into a simple approach to take when the appropriate parametric model is unclear and a Bayesian analysis is planned. I introduce an approximate, or substitution, likelihood first proposed by Harold Jeffreys in 1939 and show how to implement the approach, combined with both a non-informative and an informative prior, to provide a random sample from the posterior distribution of the median of the unknown distribution. The first example used to demonstrate the approach is a within-patient bioequivalence design; I then show how to extend the approach to a parallel-group design.
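Jeffreys' substitution likelihood for a median theta takes the form L(theta) proportional to C(n, r) 2^(-n), where r counts the observations below theta. A minimal sketch of posterior sampling on a grid, assuming a flat prior (the grid bounds and the flat prior are illustrative choices, not the paper's):

```python
import math
import numpy as np

def median_posterior_sample(x, grid, n_draws=1000, seed=0):
    """Posterior draws for the median under a flat prior on `grid`."""
    xs, n = np.sort(x), len(x)
    r = np.searchsorted(xs, grid)                  # observations below each theta
    loglik = np.array([math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1) for k in r]) - n * math.log(2)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()                             # normalised step-function posterior
    return np.random.default_rng(seed).choice(grid, size=n_draws, p=post)
```

The substitution likelihood peaks where r = n/2, so the posterior concentrates around the sample median, as expected.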
Affiliation(s)
- Andrew P Grieve
- Centre of Excellence for Statistical Innovation, UCB Pharma, Berkshire, UK
4
Venkatasubramaniam A, Evers L, Thakuriah P, Ampountolas K. Functional distributional clustering using spatio-temporal data. J Appl Stat 2023; 50:909-926. PMID: 36925906; PMCID: PMC10013458; DOI: 10.1080/02664763.2021.2001443.
Abstract
This paper presents a new method, the functional distributional clustering algorithm (FDCA), that seeks to identify spatially contiguous clusters and incorporate changes in temporal patterns across overcrowded networks. The method is motivated by a graph-based network of sensors arranged over space, where the recorded observations for each sensor represent a multi-modal distribution. The proposed method is fully non-parametric and generates clusters within an agglomerative hierarchical clustering approach, based on a measure of distance that defines a cumulative distribution function over temporal changes for different locations in space. Traditional spatially adapted hierarchical clustering algorithms do not typically accommodate the temporal characteristics of the underlying data. The effectiveness of the FDCA is illustrated using an application to both empirical and simulated data from about 400 sensors in a 2.5-square-mile network area in downtown San Francisco, California. The results demonstrate the superior ability of the FDCA to identify true clusters compared to functional-only and distributional-only algorithms, and performance similar to a model-based clustering algorithm.
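The distributional distance at the heart of the approach above can be approximated, for illustration, by pairwise Kolmogorov-Smirnov distances between sensors' observations fed into standard agglomerative clustering. This stand-in ignores the paper's spatial contiguity constraint and its specific CDF-over-time distance; all names below are ours.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import ks_2samp

def cluster_sensors(samples, n_clusters):
    """Agglomerative clustering of sensors from pairwise distribution distances."""
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ks_2samp(samples[i], samples[j]).statistic
    Z = linkage(squareform(D), method="average")  # hierarchy on the distance matrix
    return fcluster(Z, n_clusters, criterion="maxclust")
```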
Affiliation(s)
- L Evers
- School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
- P Thakuriah
- E.J. Bloustein School of Planning & Public Policy, Rutgers University, New Brunswick, NJ, USA
- K Ampountolas
- James Watt School of Engineering, University of Glasgow, Glasgow, UK
- Department of Mechanical Engineering, University of Thessaly, Volos, Greece
5
Xu Y, Gao X, Wang X. Nonparametric Clustering of Mixed Data Using Modified Chi-Squared Tests. Entropy (Basel) 2022; 24:1749. PMID: 36554154; PMCID: PMC9778617; DOI: 10.3390/e24121749.
Abstract
We propose a non-parametric method to cluster mixed data containing both continuous and discrete random variables. The product space of the continuous and discrete sample spaces is transformed into a new product space based on adaptive quantization of the continuous part. Detection of cluster patterns on the product space is determined locally using a weighted modified chi-squared test. Our algorithm requires no user input, since the number of clusters is determined automatically from the data. Simulation studies and real data analyses show that our proposed method outperforms the benchmark method, AutoClass, in various settings.
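The adaptive quantization step can be sketched as equal-frequency binning of the continuous part, after which cell counts on the resulting grid are screened with a plain chi-squared test. This is a simplified stand-in for the paper's weighted, modified test; the function names are ours.

```python
import numpy as np
from scipy.stats import chisquare

def quantize(x, k):
    """Adaptive (equal-frequency) quantization of a continuous variable into k bins."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(edges, x)

def dependence_pvalue(x, d, k=4):
    """Chi-squared test of uniform cell counts on the (quantized x) by (discrete d) grid."""
    codes = quantize(x, k) * (int(d.max()) + 1) + d
    counts = np.bincount(codes, minlength=k * (int(d.max()) + 1))
    return chisquare(counts).pvalue
```

When the discrete part is tied to the continuous part, the cell counts are far from uniform and the p-value is tiny; independent parts give counts close to uniform.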
6
Combrisson E, Allegra M, Basanisi R, Ince RAA, Giordano B, Bastin J, Brovelli A. Group-level inference of information-based measures for the analyses of cognitive brain networks from neurophysiological data. Neuroimage 2022; 258:119347. PMID: 35660460; DOI: 10.1016/j.neuroimage.2022.119347.
Abstract
The reproducibility crisis in neuroimaging, in particular in the case of underpowered studies, has raised doubts about our ability to reproduce, replicate and generalize findings. In response, guidelines and principles for conducting more reliable research, known as Good Scientific Practice, have emerged. Still, every study remains almost unique in its combination of analytical and statistical approaches. While this is understandable given the diversity of designs and brain data recordings, it also works against reproducibility. Here, we propose a non-parametric permutation-based statistical framework, primarily designed for neurophysiological data, to perform group-level inferences on non-negative measures of information, encompassing metrics from information theory, machine learning and measures of distance. The framework supports both fixed- and random-effect models to adapt to inter-individual and inter-session variability. Using numerical simulations, we compared the ground-truth retrieval accuracy of both group models, combined with test- and cluster-wise corrections for multiple comparisons. We then reproduced and extended existing results using both spatially uniform MEG and non-uniform intracranial neurophysiological data. We show how the framework can be used to extract stereotypical task- and behavior-related effects across the population, covering scales from the local level of brain regions and inter-areal functional connectivity to measures summarizing network properties. We also present an open-source Python toolbox called Frites that implements the proposed statistical pipeline using information-theoretic metrics, such as single-trial functional connectivity estimation, for the extraction of cognitive brain networks. Taken together, we believe that this framework deserves careful attention, as its robustness and flexibility could be a starting point toward the uniformization of statistical approaches.
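A standalone sketch of the kind of permutation test such a framework builds on, using a simple histogram-based mutual information and label shuffling (the authors' actual implementation lives in the Frites toolbox; the plug-in estimator and bin count here are our own simplifications):

```python
import numpy as np

def hist_mi(x, y, bins=8):
    """Plug-in mutual information (nats) between continuous x and discrete labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    xb = np.digitize(x, edges)                       # bin indices 0..bins-1
    joint = np.zeros((bins, int(y.max()) + 1))
    np.add.at(joint, (xb, y), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def permutation_pvalue(x, y, n_perm=500, seed=0):
    """One-sided permutation p-value for the observed (non-negative) measure."""
    rng = np.random.default_rng(seed)
    obs = hist_mi(x, y)
    null = [hist_mi(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(m >= obs for m in null)) / (n_perm + 1)
```

Shuffling the labels destroys any real association, so the null distribution reflects only estimator bias, which is why the test works for any non-negative measure.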
Affiliation(s)
- Etienne Combrisson
- Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France.
- Michele Allegra
- Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France
- Dipartimento di Fisica e Astronomia "Galileo Galilei", Università di Padova, via Marzolo 8, 35131 Padova, Italy
- Padua Neuroscience Center, Università di Padova, via Orus 2, 35131 Padova, Italy
- Ruggero Basanisi
- Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France
- Robin A A Ince
- School of Psychology and Neuroscience, University of Glasgow, Glasgow, UK
- Bruno Giordano
- Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France
- Julien Bastin
- Univ. Grenoble Alpes, Inserm, U1216, Grenoble Institut Neurosciences, 38000 Grenoble, France
- Andrea Brovelli
- Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France
7
Merrick LF, Lozada DN, Chen X, Carter AH. Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.). Front Genet 2022; 13:835781. PMID: 35281841; PMCID: PMC8904966; DOI: 10.3389/fgene.2022.835781.
Abstract
Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded on ordinal scales and as percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared regression and classification prediction models using two training populations: breeding lines phenotyped in four years (2016-2018 and 2020) and a diversity panel phenotyped in four years (2013-2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square-root-transformed phenotypes using ridge regression best linear unbiased prediction and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Furthermore, a classification system based on support vector machine and ordinal Bayesian models with a two-class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.
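The ridge regression BLUP model mentioned above shrinks all marker effects with a common penalty and has a closed form. A minimal sketch (the penalty `lam` is fixed arbitrarily here; real rrBLUP pipelines estimate it from variance components, and the function names are ours):

```python
import numpy as np

def rr_blup_fit(M, y, lam=1.0):
    """Ridge ("rrBLUP"-style) fit: centred markers, common shrinkage lam."""
    mcol = M.mean(axis=0)
    Mc = M - mcol                                 # centre the marker matrix
    effects = np.linalg.solve(Mc.T @ Mc + lam * np.eye(M.shape[1]),
                              Mc.T @ (y - y.mean()))

    def predict(M_new):
        """Genomic estimated breeding values for a genotype matrix."""
        return y.mean() + (M_new - mcol) @ effects

    return effects, predict
```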
Affiliation(s)
- Lance F Merrick
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
- Dennis N Lozada
- Department of Plant and Environmental Sciences, New Mexico State University, Las Cruces, NM, United States
- Xianming Chen
- USDA-ARS Wheat Health, Genetics and Quality Research Unit and Department of Plant Pathology, Washington State University, Pullman, WA, United States
- Arron H Carter
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States
8
Enyew BY, Asfaw ZG. Comparison of survival models and assessment of risk factors for survival of cardiovascular patients at Addis Ababa Cardiac Center, Ethiopia: a retrospective study. Afr Health Sci 2021; 21:1201-1213. PMID: 35222583; PMCID: PMC8843306; DOI: 10.4314/ahs.v21i3.29.
Abstract
Background Cardiovascular diseases (CVDs) are disorders of the heart and blood vessels. They are a major health problem across the world, and 82% of CVD deaths occur in low- and middle-income countries. The aim of this study was to choose an appropriate model for the survival data of cardiovascular patients and to identify the factors that affect their survival at the Addis Ababa Cardiac Center. Method A retrospective study was conducted on patients under follow-up at the Addis Ababa Cardiac Center between September 2010 and December 2018. The patients included were either post-operative or pre-operative. Out of 1042 cardiac patients, a sample of 332 was selected using a simple random sampling technique. Non-parametric, semi-parametric and parametric survival models were used, and comparisons were made to select the most appropriate predictive model. Results Among the sample of 332 cardiac patients, only 67 (20.2%) experienced the event and the remaining 265 (79.8%) were censored. The median and maximum survival times of cardiac patients were reported as 1925 and 1403 days, respectively. The estimated hazard ratio of male to female patients was 1.93 (95% CI: 1.11-3.34; p = 0.019), implying that the risk of death for male patients is 1.93 times that of female cardiac patients, holding the other covariates in the model constant. Although all semi-parametric and parametric survival models fitted the data well, various model comparison criteria showed that the parametric Weibull AFT survival model was best. Conclusions Governmental and non-governmental stakeholders should provide training on the risk factors identified in this study to improve individuals' knowledge and awareness, so that deaths due to CVDs can be minimized.
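The non-parametric baseline in comparisons like the one above is typically the Kaplan-Meier product-limit estimator, which can be sketched directly (illustrative code and toy data, not the study's):

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimates; event = 1 for death, 0 for censoring."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    s, times, surv = 1.0, [], []
    for t in np.unique(time):
        d = int(((time == t) & (event == 1)).sum())  # deaths at time t
        n = int((time >= t).sum())                   # number still at risk
        if d:
            s *= 1.0 - d / n                         # product-limit update
            times.append(float(t))
            surv.append(s)
    return times, surv
```

For example, with times [1, 2, 3, 4, 5] and events [1, 1, 0, 1, 1], the censored subject at t=3 leaves the risk set without triggering a survival drop.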
9
Feutrill A, Roughan M. A Review of Shannon and Differential Entropy Rate Estimation. Entropy (Basel) 2021; 23:1046. PMID: 34441186; PMCID: PMC8392187; DOI: 10.3390/e23081046.
Abstract
In this paper, we present a review of Shannon and differential entropy rate estimation techniques. The entropy rate, which measures the average information gain of a stochastic process, is a measure of its uncertainty and complexity. We discuss the estimation of the entropy rate from empirical data, reviewing both parametric and non-parametric techniques. For parametric estimation, we consider a range of assumptions on the properties of the process, focusing in particular on Markov and Gaussian assumptions. Non-parametric estimation relies on limit theorems that relate observations to the entropy rate; to discuss these, we introduce some theory and the practical implementations of estimators of this type.
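For the Markov case discussed above, the entropy rate has a closed form: H = -sum_i pi_i sum_j P_ij log P_ij, with pi the stationary distribution of transition matrix P. A small sketch in nats:

```python
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate (nats) of an ergodic Markov chain with transition matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvector for eigenvalue 1
    pi = pi / pi.sum()                                 # stationary distribution
    logP = np.where(P > 0, np.log(np.where(P > 0, P, 1.0)), 0.0)
    return float(-(pi[:, None] * P * logP).sum())
```

A fair-coin chain (all transition probabilities 1/2) attains the maximum rate log 2, while a deterministic alternating chain has rate 0.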
Affiliation(s)
- Andrew Feutrill
- CSIRO/Data61, 13 Kintore Avenue, Adelaide, SA 5000, Australia
- School of Mathematical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- ARC Centre of Excellence for Mathematical & Statistical Frontiers, The University of Melbourne, Parkville, VIC 3010, Australia
- Matthew Roughan
- School of Mathematical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- ARC Centre of Excellence for Mathematical & Statistical Frontiers, The University of Melbourne, Parkville, VIC 3010, Australia
10
Balzer LB, Westling T. Demystifying Statistical Inference When Using Machine Learning in Causal Research. Am J Epidemiol 2021; 192:kwab200. PMID: 34268553; PMCID: PMC10472326; DOI: 10.1093/aje/kwab200.
Abstract
In this issue, Naimi et al. (Am J Epidemiol. XXXX;XXX(XX):XXXX-XXXX) discuss a critical topic in public health and beyond: obtaining valid statistical inference when using machine learning in causal research. In doing so, the authors review recent prominent methodological work and recommend: (i) double-robust estimators, such as targeted maximum likelihood estimation (TMLE); (ii) ensemble methods, such as Super Learner, to combine predictions from a diverse library of algorithms; and (iii) sample-splitting to reduce bias and improve inference. We largely agree with these recommendations. In this commentary, we highlight the critical importance of the Super Learner library. Specifically, in both simulation settings considered by the authors, we demonstrate that low bias and valid statistical inference can be achieved using TMLE without sample-splitting and with a Super Learner library that excludes tree-based methods but includes regression splines. Whether extremely data-adaptive algorithms and sample-splitting are needed depends on the specific problem and should be informed by simulations reflecting the specific application. More research is needed on practical recommendations for selecting among these options in common situations arising in epidemiology.
Affiliation(s)
- Laura B Balzer
- Correspondence to Dr. Laura B. Balzer, Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, 427 Arnold House, Amherst, MA 01003 (e-mail: )
11
López-Cheda A, Jácome MA, Cao R, De Salazar PM. Estimating lengths-of-stay of hospitalised COVID-19 patients using a non-parametric model: a case study in Galicia (Spain). Epidemiol Infect 2021; 149:e102. PMID: 33902779; DOI: 10.1017/S0950268821000959.
Abstract
Estimating the lengths-of-stay (LoS) of hospitalised COVID-19 patients is key to predicting hospital bed demand and planning mitigation strategies, as overwhelming the healthcare system has critical consequences for disease mortality. However, accurately mapping the time-to-event of hospital outcomes, such as the LoS in the intensive care unit (ICU), requires understanding patient trajectories while adjusting for covariates and observation bias, such as incomplete data. Standard methods, such as the Kaplan-Meier estimator, require prior assumptions that are untenable given current knowledge. Using real-time surveillance data from the first weeks of the COVID-19 epidemic in Galicia (Spain), we aimed to model the time-to-event and event probabilities of hospitalised patients, without parametric priors and adjusting for individual covariates. We applied a non-parametric mixture cure model and compared its performance in estimating hospital ward (HW)/ICU LoS to that of commonly used methods for estimating survival. We showed that the proposed model outperformed standard approaches, providing more accurate ICU and HW LoS estimates. Finally, we applied our model estimates to simulate COVID-19 hospital demand using a Monte Carlo algorithm. We provided evidence that adjusting for sex, generally overlooked in prediction models, together with age is key to accurately forecasting HW and ICU occupancy, as well as discharge or death outcomes.
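The Monte Carlo demand step mentioned above can be sketched simply: draw a length-of-stay for each admitted patient from an estimated LoS distribution and count overlapping stays per day. The admission days and LoS values below are synthetic placeholders, not the Galician estimates.

```python
import numpy as np

def occupancy(admission_days, lengths_of_stay, horizon):
    """Daily bed counts implied by admission days and (sampled) lengths-of-stay."""
    beds = np.zeros(horizon, dtype=int)
    for a, los in zip(admission_days, lengths_of_stay):
        beds[a:min(a + los, horizon)] += 1       # patient occupies days [a, a+los)
    return beds

# Toy run: patients admitted on days 0, 0 and 2 staying 3, 1 and 2 days.
daily_beds = occupancy([0, 0, 2], [3, 1, 2], horizon=5)
```

Repeating this with LoS values redrawn from the fitted model on each iteration yields a distribution of occupancy trajectories.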
12
Shero JA, Al Otaiba S, Schatschneider C, Hart SA. Data Envelopment Analysis (DEA) in the Educational Sciences. J Exp Educ 2021; 90:1021-1040. PMID: 36324877; PMCID: PMC9624468; DOI: 10.1080/00220973.2021.1906198.
Abstract
Many of the analytical models commonly used in educational research aim to maximize explained variance and identify variable importance within models. These models are useful for understanding general ideas and trends, but give limited insight into the individuals within them. Data envelopment analysis (DEA) is a method rooted in organizational management that makes such insights possible. Unlike the models alluded to above, DEA does not explain variance. Instead, it explains how efficiently an individual utilizes their inputs to produce outputs, and identifies which inputs are not being utilized optimally. This paper provides a history and uses of DEA in fields outside education, and describes the math and processes behind it. It then extends DEA's usage to the educational field using a study of child reading ability. Using students from the Project KIDS dataset (n=1987), DEA is demonstrated within a simple view of reading framework: identifying individual efficiency levels in using reading-based skills to achieve reading comprehension, determining which skills are being underutilized, and classifying new subsets of readers. New subsets of readers were identified using this method, with implications for more targeted interventions.
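The classic input-oriented, constant-returns-to-scale DEA efficiency of a unit solves a small linear program: minimise theta subject to X lambda <= theta * x_o, Y lambda >= y_o, lambda >= 0. A sketch using scipy (variable names are ours; in the reading application, inputs would be skills and the output comprehension):

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of unit o; X is n x m inputs, Y is n x s outputs."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                  # decision vector: [theta, lambdas]
    A_inputs = np.c_[-X[o][:, None], X.T]        # sum_j lam_j x_j <= theta * x_o
    A_outputs = np.c_[np.zeros((s, 1)), -Y.T]    # sum_j lam_j y_j >= y_o
    res = linprog(c,
                  A_ub=np.r_[A_inputs, A_outputs],
                  b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(None, None)] + [(0.0, None)] * n)
    return float(res.fun)
```

With two units producing the same output from inputs 2 and 4, the frugal unit is fully efficient (theta = 1) and the other scores 0.5.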
Affiliation(s)
- Sara A. Hart
- Florida State University
- Florida Center for Reading Research
13
Willette AA, Willette SA, Wang Q, Pappas C, Klinedinst BS, Le S, Larsen B, Pollpeter A, Li T, Brenner N, Waterboer T. Using machine learning to predict COVID-19 infection and severity risk among 4,510 aged adults: a UK Biobank cohort study. medRxiv 2021:2020.06.09.20127092. PMID: 32577673; PMCID: PMC7302228; DOI: 10.1101/2020.06.09.20127092.
Abstract
BACKGROUND Many risk factors have emerged for novel 2019 coronavirus disease (COVID-19). It remains relatively unknown how these factors collectively predict COVID-19 infection risk, as well as the risk of severe infection (i.e., hospitalization). METHODS Among aged adults (69.3 ± 8.6 years) in UK Biobank, COVID-19 data were downloaded for 4,510 participants with 7,539 test cases. We downloaded baseline data from 10-14 years earlier, including demographics, biochemistry, body mass, and other factors, as well as antibody titers for 20 common to rare infectious diseases. Permutation-based linear discriminant analysis was used to predict COVID-19 risk and hospitalization risk. Probability and threshold metrics included receiver operating characteristic curves to derive the area under the curve (AUC), specificity, sensitivity, and quadratic mean. RESULTS The "best-fit" model for predicting COVID-19 risk achieved excellent discrimination (AUC=0.969, 95% CI=0.934-1.000). Factors included age, immune markers, lipids, and serology titers to common pathogens such as human cytomegalovirus. The hospitalization "best-fit" model was more modest (AUC=0.803, 95% CI=0.663-0.943) and included only serology titers. CONCLUSIONS Accurate risk profiles can be created using standard self-report and biomedical data collected in public health and medical settings. It is also worthwhile to further investigate whether prior host immunity predicts current host immunity to COVID-19.
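The AUC reported above equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one (a rescaled Mann-Whitney statistic). A dependency-light sketch of just that metric, not the authors' permutation-based LDA pipeline:

```python
import numpy as np

def auc(scores, labels):
    """AUC as the probability a random positive outranks a random negative."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()  # fraction of winning pairs
    ties = (pos[:, None] == neg[None, :]).mean()    # tied pairs count half
    return float(greater + 0.5 * ties)
```

Perfectly separated scores give an AUC of 1.0; a classifier no better than chance hovers around 0.5.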
Affiliation(s)
- Auriel A. Willette
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
- Department of Neurology, University of Iowa, Iowa City, IA, USA
- Iowa COVID-19 Tracker, Ames, IA, USA
| | | | - Qian Wang
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Colleen Pappas
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Brandon S. Klinedinst
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Scott Le
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Brittany Larsen
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Amy Pollpeter
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Tianqi Li
- Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, USA
| | - Nicole Brenner
- Infections and Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Tim Waterboer
- Infections and Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
| |
14
Abstract
BACKGROUND/AIMS In clinical trials, the primary outcome is often a composite endpoint defined as time to the first occurrence of either death or certain non-fatal events, so a portion of the available data is omitted. In the win ratio approach, priorities are given to the clinically more important events, and more data are used. However, its power may be low if the treatment effect is predominantly on the non-terminal event. METHODS We propose event-specific win ratios obtained separately on the terminal and non-terminal events. They can then be used to form global tests such as a linear combination test, the maximum test, or a χ2 test. RESULTS In simulations, these tests often improve the power of the original win ratio test. Furthermore, when the terminal and non-terminal events experience differential treatment effects, the new tests are often more powerful than the log-rank test for the composite outcome. Whether or not the treatment effect is primarily on the terminal events, the new tests based on the event-specific win ratios can be useful when different types of events are present. The new tests can reject the null hypothesis of no difference in the event distributions between the two treatment arms even when the terminal event shows a detrimental effect and the non-terminal event shows a beneficial effect. Neither the maximum test nor the χ2 test has test-estimation coherency, but the maximum test is coherent in the sense that the global null is rejected if and only if the null for one of the event types is rejected. When applied to data from the trial Aldosterone Antagonist Therapy for Adults With Heart Failure and Preserved Systolic Function (TOPCAT), the new tests all reject the null hypothesis of no treatment effect, while both the log-rank test used in TOPCAT and the original win ratio approach show non-significant p-values.
CONCLUSION Whether the treatment effect is primarily on the terminal events or the non-terminal events, the maximum test based on the event-specific win ratios can be a useful alternative for testing treatment effect in clinical trials with time-to-event outcomes when different types of events are present.
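For orientation, the classic (overall) win ratio that these event-specific versions build on can be sketched in a few lines. This is an illustrative implementation only, not the authors' code: the function name, the tuple encoding of subjects, and the no-censoring simplification are all assumptions.

```python
from itertools import product

def win_ratio(treated, control):
    """Pocock-style win ratio from all treatment-control pairs.

    Each subject is a tuple (death_time, hosp_time); a smaller time is
    worse. A pair is decided on death times first (the terminal event);
    ties fall through to hospitalization (the non-terminal event).
    Censoring is ignored in this sketch.
    """
    wins = losses = 0
    for t, c in product(treated, control):
        if t[0] != c[0]:            # terminal event decides the pair
            wins += t[0] > c[0]
            losses += t[0] < c[0]
        elif t[1] != c[1]:          # tie on death: use non-terminal event
            wins += t[1] > c[1]
            losses += t[1] < c[1]
    return wins / losses
```

An event-specific win ratio, as proposed here, would instead restrict the pairwise comparison to a single layer (terminal only, or non-terminal only) and then combine the two ratios into a global test.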
Affiliation(s)
- Song Yang
- Office of Biostatistics Research, National Heart, Lung, and Blood Institute, Bethesda, MD, USA
- James Troendle
- Office of Biostatistics Research, National Heart, Lung, and Blood Institute, Bethesda, MD, USA
15
Porfiri M, Ruiz Marín M. An information-theoretic approach to study spatial dependencies in small datasets. Proc Math Phys Eng Sci 2020; 476:20200113. [PMID: 33223927 PMCID: PMC7655761 DOI: 10.1098/rspa.2020.0113] [Received: 02/21/2020] [Accepted: 09/25/2020] [Indexed: 03/31/2024] Open
Abstract
From epidemiology to economics, there is a fundamental need for statistically principled approaches to unveil spatial patterns and identify their underpinning mechanisms. Grounded in network and information theory, we establish a non-parametric scheme to study spatial associations from limited measurements of a spatial process. Through the lens of network theory, we relate spatial patterning in the dataset to the topology of a network on which the process unfolds. From the available observations of the spatial process and a candidate network topology, we compute a mutual information statistic that measures the extent to which the measurement at a node is explained by observations at neighbouring nodes. For a class of networks and linear autoregressive processes, we establish closed-form expressions for the mutual information statistic in terms of network topological features. We demonstrate the feasibility of the approach on synthetic datasets comprising 25-100 measurements, generated by linear or nonlinear autoregressive processes. Upon validation on synthetic processes, we examine datasets of human migration under climate change in Bangladesh and motor vehicle deaths in the United States of America. For both these real datasets, our approach is successful in identifying meaningful spatial patterns, begetting statistically-principled insight into the mechanisms of important socioeconomic problems.
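The general idea, measuring how much a node's observation is explained by its neighbours, can be illustrated with a plug-in (histogram) mutual information estimate. This sketch is not the authors' closed-form statistic; the function names, the binning choice, and the use of the neighbour mean as the conditioning variable are assumptions made for illustration.

```python
import numpy as np

def plugin_mi(x, y, bins=4):
    """Plug-in (histogram) estimate of the mutual information I(X;Y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px * py)[nz])).sum())

def spatial_mi(values, adjacency, bins=4):
    """MI between each node's value and the mean value of its neighbours,
    for a candidate network topology given as a 0/1 adjacency matrix."""
    values = np.asarray(values, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    neighbour_mean = adjacency @ values / adjacency.sum(axis=1)
    return plugin_mi(values, neighbour_mean, bins=bins)
```

A large value of `spatial_mi` under a candidate topology suggests that topology captures real spatial dependence; with only 25-100 measurements, significance would be judged against a permutation null.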
Affiliation(s)
- Maurizio Porfiri
- Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
- Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
- Department of Quantitative Methods, Law and Modern Languages, Technical University of Cartagena, Cartagena, Murcia, Spain
- Manuel Ruiz Marín
- Department of Quantitative Methods, Law and Modern Languages, Technical University of Cartagena, Cartagena, Murcia, Spain
16
Henok JN, Okeleye BI, Omodanisi EI, Ntwampe SKO, Aboua YG. Analysis of Reference Ranges of Total Serum Protein in Namibia: Clinical Implications. Proteomes 2020; 8:7. [PMID: 32326470 PMCID: PMC7356781 DOI: 10.3390/proteomes8020007] [Received: 02/14/2020] [Revised: 03/22/2020] [Accepted: 03/30/2020] [Indexed: 11/17/2022] Open
Abstract
A reference range is an essential part of clinical laboratory test interpretation and patient care. Levels of total serum protein (TSP) are measured in sera to assess nutritional, liver, and kidney disorders. This study determined the TSP reference range with respect to gender, age, and region in Namibia. A retrospective cross-sectional study was conducted to determine the TSP reference range among 78,477 healthy participants, ranging in age from under one year to over 65 years, in 14 regions of Namibia. The reference range of TSP was 51-91 g/L for females and 51-92 g/L for males. A reduced TSP range of 48.00-85.55 g/L (2.5-97.5 percentiles) was established at <1-5 years and increased towards adolescence. The widest range, 54-93 g/L, was observed from 36-65 years of age. At ages >65 years, a steady decline in the reference range (51.00-89 g/L) was recorded. An upper TSP range of 53-92 g/L (2.5-97.5 percentiles) was detected in Erongo, Zambezi, Hardap, and Kavango East, and a comparable trend was also seen in Omusati, with a 54-91 g/L range. Meanwhile, a reduced TSP range of 50-89 g/L was identified in Ohangwena. This study showed that gender, age, and geographical location can impact TSP levels, with a significant clinical difference (p < 0.05) between each category.
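The non-parametric reference interval used throughout this study (the central 95% of values, i.e. the 2.5th to 97.5th percentiles) is straightforward to compute. A minimal sketch, with the function name assumed for illustration:

```python
import numpy as np

def reference_interval(values, lower=2.5, upper=97.5):
    """Non-parametric reference interval: the central 95% of observed values."""
    v = np.asarray(values, dtype=float)
    return float(np.percentile(v, lower)), float(np.percentile(v, upper))
```

In practice the interval is computed separately for each partition (sex, age band, region) after outlier handling, which is exactly how the age- and region-specific ranges above arise.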
Affiliation(s)
- Josephine N. Henok
- Department of Health Sciences, Faculty of Health and Applied Sciences, Namibia University of Science and Technology (NUST), Private Bag 13388, Windhoek, Namibia
- Benjamin I. Okeleye
- Bioresource Engineering Research Group (BioERG), Department of Biotechnology, Faculty of Applied Sciences, Cape Peninsula University of Technology, P.O. Box 652, Cape Town 8000, South Africa
- Elizabeth I. Omodanisi
- Bioresource Engineering Research Group (BioERG), Department of Biotechnology, Faculty of Applied Sciences, Cape Peninsula University of Technology, P.O. Box 652, Cape Town 8000, South Africa
- Seteno K. O. Ntwampe
- Bioresource Engineering Research Group (BioERG), Department of Biotechnology, Faculty of Applied Sciences, Cape Peninsula University of Technology, P.O. Box 652, Cape Town 8000, South Africa
- School of Chemical and Minerals Engineering, North-West University, Private Bag X1290, Potchefstroom 2520, South Africa
- Yapo G. Aboua
- Department of Health Sciences, Faculty of Health and Applied Sciences, Namibia University of Science and Technology (NUST), Private Bag 13388, Windhoek, Namibia
- Bioresource Engineering Research Group (BioERG), Department of Biotechnology, Faculty of Applied Sciences, Cape Peninsula University of Technology, P.O. Box 652, Cape Town 8000, South Africa
17
Mollan KR, Trumble IM, Reifeis SA, Ferrer O, Bay CP, Baldoni PL, Hudgens MG. Precise and accurate power of the rank-sum test for a continuous outcome. J Biopharm Stat 2020; 30:639-648. [PMID: 32126888 DOI: 10.1080/10543406.2020.1730866] [Indexed: 10/24/2022]
Abstract
Accurate power calculations are essential in small studies containing expensive experimental units or high-stakes exposures. Herein, power of the Wilcoxon Mann-Whitney rank-sum test of a continuous outcome is formulated using a Monte Carlo approach, defining p = P(X < Y) as a measure of effect size, where X and Y denote random observations from two distributions hypothesized to be equal under the null. Effect size p fosters productive communications because researchers understand that p = 0.5 is analogous to a fair coin toss, and p near 0 or 1 represents a large effect. This approach is feasible even without background data. Simulations were conducted comparing the empirical power approach to existing approaches by Rosner & Glynn, Shieh and colleagues, Noether, and O'Brien-Castelloe. Approximations by Noether and O'Brien-Castelloe are shown to be inaccurate for small sample sizes. The Rosner & Glynn and Shieh, Jan & Randles approaches performed well in many small sample scenarios, though both are restricted to location-shift alternatives and neither approach is theoretically justified for small samples. The empirical method is recommended and available in the R package wmwpow.
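The empirical power recipe is easy to reproduce outside R. Below is a hedged sketch using only numpy, with the large-sample normal approximation to the rank-sum p-value; the recommended wmwpow package implements this more carefully, and the function names and no-ties simplification here are assumptions.

```python
import math
import numpy as np

def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value via the large-sample
    normal approximation (continuous data assumed, so no tie correction)."""
    n_x, n_y = len(x), len(y)
    combined = np.concatenate([x, y])
    ranks = combined.argsort().argsort() + 1.0       # 1-based ranks
    u = ranks[:n_x].sum() - n_x * (n_x + 1) / 2      # Mann-Whitney U for x
    z = (u - n_x * n_y / 2) / math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def empirical_power(draw_x, draw_y, n_x, n_y, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power: fraction of simulated trials with p < alpha.
    draw_x / draw_y are callables (rng, size) -> samples from each arm."""
    rng = np.random.default_rng(seed)
    hits = sum(
        rank_sum_pvalue(draw_x(rng, n_x), draw_y(rng, n_y)) < alpha
        for _ in range(n_sims)
    )
    return hits / n_sims

def prob_x_less_y(draw_x, draw_y, n=200_000, seed=1):
    """Monte Carlo estimate of the effect size p = P(X < Y)."""
    rng = np.random.default_rng(seed)
    return float((draw_x(rng, n) < draw_y(rng, n)).mean())
```

Because the two arms are specified as arbitrary sampling functions, the approach is not restricted to location-shift alternatives, which is exactly the flexibility the abstract emphasizes.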
Affiliation(s)
- Katie R Mollan
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Ilana M Trumble
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Sarah A Reifeis
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Orlando Ferrer
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Camden P Bay
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Pedro L Baldoni
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
- Michael G Hudgens
- Department of Biostatistics and Center for AIDS Research, The University of North Carolina, Chapel Hill, North Carolina, USA
18
Filipe JAN, Kyriazakis I. Bayesian, Likelihood-Free Modelling of Phenotypic Plasticity and Variability in Individuals and Populations. Front Genet 2019; 10:727. [PMID: 31616460 PMCID: PMC6764410 DOI: 10.3389/fgene.2019.00727] [Received: 06/15/2018] [Accepted: 07/11/2019] [Indexed: 12/17/2022] Open
Abstract
There is a paradigm shift from the traditional focus on the “average” individual towards the definition and analysis of trait variation within individual life-history and among individuals in populations. This is a result of increasing availability of individual phenotypic data. The shift allows the use of genetic and environment-driven variations to assess robustness to challenge, gain greater understanding of organismal biological processes, or deliver individual-targeted treatments or genetic selection. These consequences apply, in particular, to variation in ontogenetic growth. We propose an approach to parameterise mathematical models of individual traits (e.g., reaction norms, growth curves) that addresses two challenges: 1) Estimation of individual traits while making minimal assumptions about data distribution and correlation, addressed via Approximate Bayesian Computation (a form of nonparametric inference). We are motivated by the fact that available information on distribution of biological data is often less precise than assumed by conventional likelihood functions. 2) Scaling-up to population phenotype distributions while facilitating unbiased use of individual data; this is addressed via a probabilistic framework where population distributions build on separately-inferred individual distributions and individual-trait interpretability is preserved. The approach is tested against Bayesian likelihood-based inference, by fitting weight and energy intake growth models to animal data and normal- and skewed-distributed simulated data. i) Individual inferences were accurate and robust to changes in data distribution and sample size; in particular, median-based predictions were more robust than maximum-likelihood-based curves. These results suggest that the approach gives reliable inferences using few observations and monitoring resources.
ii) At the population level, each individual contributed via a specific data distribution, and population phenotype estimates were not disproportionally influenced by outlier individuals. Indices measuring population phenotype variation can be derived for study comparisons. The approach offers an alternative for estimating trait variability in biological systems that may be reliable for various applications, for example, in genetics, health, and individualised nutrition, while using fewer assumptions and fewer empirical observations. In livestock breeding, the potentially greater accuracy of trait estimation (without specification of multitrait variance-covariance parameters) could lead to improved selection and to more decisive estimates of trait heritability.
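The Approximate Bayesian Computation machinery underlying this kind of likelihood-free fitting can be illustrated with the simplest rejection sampler. This is a generic sketch, not the authors' growth-model implementation; all names, the summary-statistic choice, and the acceptance rule are illustrative assumptions.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_draw, distance, n_draws=5000,
                  quantile=0.01, seed=0):
    """Rejection ABC: draw parameters from the prior, simulate data,
    and keep the draws whose simulations fall closest to the observed
    summary. The kept draws approximate the posterior without ever
    evaluating a likelihood function."""
    rng = np.random.default_rng(seed)
    thetas = np.array([prior_draw(rng) for _ in range(n_draws)])
    dists = np.array([distance(observed, simulate(rng, th)) for th in thetas])
    keep = dists <= np.quantile(dists, quantile)
    return thetas[keep]          # approximate posterior sample
```

For an individual growth curve, `simulate` would generate a trajectory from candidate parameters and `distance` would compare it to that individual's observations; posterior medians (rather than maximum-likelihood point estimates) then give the robust individual predictions described above.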
Affiliation(s)
- Joao A N Filipe
- Agriculture, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Ilias Kyriazakis
- Agriculture, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
19
Johansen MB, Christensen PA. A simple transformation independent method for outlier definition. Clin Chem Lab Med 2019; 56:1524-1532. [PMID: 29634477 DOI: 10.1515/cclm-2018-0025] [Received: 01/09/2018] [Accepted: 02/22/2018] [Indexed: 11/15/2022]
Abstract
BACKGROUND Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of these methods assume a normal distribution, and data often require transformation before outlier elimination. METHODS We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms, using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. RESULTS The comparison on simulated distributions shows that, without outliers added, the recommended method in general defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results. With outliers on both sides the methods are equally good. Furthermore, it is found that the presence of outliers affects the BCT, and subsequently the limits determined by currently recommended methods; this is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. CONCLUSIONS We find our simple transformation-independent outlier detection method to be as good as or better than the currently recommended methods.
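For reference, the comparator pipeline described above (Box-Cox transform, then Tukey's fences) reduces to a few lines once the transform is applied. This sketch implements only the fences and leaves the transform to the caller; it is illustrative of the CLSI/IFCC-style comparator, not of the authors' histogram-based method.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Boolean mask of observations outside Tukey's fences
    [Q1 - k*IQR, Q3 + k*IQR]. In the CLSI/IFCC-style workflow the
    data would be Box-Cox transformed before applying the fences,
    which is exactly where outlier-sensitivity of the transform
    enters the comparison above."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - k * iqr) | (v > q3 + k * iqr)
```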
Affiliation(s)
- Peter Astrup Christensen
- Department of Clinical Biochemistry, Aalborg University Hospital, Hobrovej 18-22, 9000 Aalborg, Denmark
20
Allemann SS, Dediu D, Dima AL. Beyond Adherence Thresholds: A Simulation Study of the Optimal Classification of Longitudinal Adherence Trajectories From Medication Refill Histories. Front Pharmacol 2019; 10:383. [PMID: 31105559 PMCID: PMC6499004 DOI: 10.3389/fphar.2019.00383] [Received: 08/03/2018] [Accepted: 03/27/2019] [Indexed: 11/13/2022] Open
Abstract
Background: The description of adherence based on medication refill histories relies on the estimation of continuous medication availability (CMA) during an observation period. Thresholds to distinguish adherence from non-adherence typically refer to an aggregated value across the entire observation period, disregarding differences in adherence over time. Using sliding windows to divide the observation period into smaller portions, estimating adherence for these increments, and classifying individuals with similar trajectories into clusters can retain this temporal information. Optimal methods to estimate adherence trajectories to identify underlying patterns have not yet been established. This simulation study aimed to provide guidance for future studies by analyzing the effect of different longitudinal adherence estimates, sliding window parameters, and sample characteristics on the performance of a longitudinal clustering algorithm. Methods: We generated samples of 250–25,000 individuals with one of six longitudinal refill patterns over a 2-year period. We used two longitudinal CMA estimates (LCMA1 and LCMA2) and their dichotomized variants (with a threshold of 80%) to create adherence trajectories. LCMA1 assumes full adherence until the supply ends, while LCMA2 assumes constant adherence between refills. We assessed scenarios with different LCMA estimates and sliding window parameters for 350 independent samples. Individual trajectories were clustered with kml, an implementation of k-means for longitudinal data in R. We compared performance between the four LCMA estimates using the corrected adjusted Rand Index (cARI). Results: Cluster analysis with LCMA2 outperformed the other estimates in overall performance, correct identification of groups, and classification accuracy, irrespective of sliding window parameters. Pairwise comparison between LCMA estimates showed a relative cARI advantage of 0.12–0.22 (p < 0.001) for LCMA2. Sample size did not affect overall performance. Conclusion: The choice of LCMA estimate and sliding window parameters has a major impact on the performance of a clustering algorithm for identifying distinct longitudinal adherence trajectories. We recommend (a) assuming constant adherence between refills, (b) avoiding dichotomization based on a threshold, and (c) exploring optimal sliding window parameters in simulation studies or selecting shorter non-overlapping windows for the identification of different adherence patterns from medication refill data.
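The sliding-window construction itself is simple to sketch. The LCMA naming below follows the abstract, but the helper names and the day-level, LCMA1-style simplification (full adherence until each supply runs out) are assumptions for illustration, not the authors' R implementation.

```python
import numpy as np

def daily_availability(refill_days, days_supply, horizon):
    """Day-by-day medication availability, assuming full adherence until
    each supply runs out (LCMA1-style); 1 = covered day, 0 = gap day.
    LCMA2 would instead spread each supply evenly until the next refill."""
    covered = np.zeros(horizon)
    for day, supply in zip(refill_days, days_supply):
        covered[day:day + supply] = 1.0
    return covered

def sliding_window_trajectory(covered, window, step):
    """Mean availability within sliding windows: the adherence trajectory
    that would be fed to a longitudinal clustering algorithm such as kml."""
    starts = range(0, len(covered) - window + 1, step)
    return [float(covered[s:s + window].mean()) for s in starts]
```

The trajectory below exposes an on/off refill pattern that a single aggregated CMA value (here 0.5 over the whole period) would hide, which is the abstract's argument against whole-period thresholds.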
Affiliation(s)
- Samuel S Allemann
- Health Services and Performance Research (HESPER EA 7425), University Claude Bernard Lyon 1, Lyon, France
- Pharmaceutical Care Research Group, University of Basel, Basel, Switzerland
- Dan Dediu
- Collegium de Lyon, Institut d'Études Avancées, Lyon, France
- Laboratoire Dynamique Du Langage UMR 5596, Université Lumière Lyon 2, Lyon, France
- Alexandra Lelia Dima
- Health Services and Performance Research (HESPER EA 7425), University Claude Bernard Lyon 1, Lyon, France
21
Yu K, Canalias F, Solà-Oriol D, Arroyo L, Pato R, Saco Y, Terré M, Bassols A. Age-Related Serum Biochemical Reference Intervals Established for Unweaned Calves and Piglets in the Post-weaning Period. Front Vet Sci 2019; 6:123. [PMID: 31069239 PMCID: PMC6491529 DOI: 10.3389/fvets.2019.00123] [Received: 12/14/2018] [Accepted: 04/02/2019] [Indexed: 11/13/2022] Open
Abstract
The purpose of the present study was to establish the influence of age on serum biochemistry reference intervals (RIs) for unweaned calves and recently-weaned piglets, using a large number of animals sampled at different ages from populations under different season trials. Specifically, milk replacer (MR)-fed calves from April-July 2017 (n = 60), from December 2016-March 2017 (n = 76), and from April-August 2018 (n = 57), and one group of healthy weaned piglets (n = 72) were subjected to the study. Serum enzymes and metabolites of calves at the ages of 24 h (24 h after colostrum intake), 2, 5, and 7 weeks from merged trials, and of piglets at 0, 7, and 14 days post-weaning (at 21, 28, and 35 days of age), were studied. The main variable was age, whereas no major trial- or sex-biased differences were noticed. In calves, ALT, AST, GGT, GPx, SOD, NEFAs, triglycerides, glucose, creatinine, total protein, and urea were greatly elevated (p < 0.001) at 24 h compared with other ages; glucose, creatinine, total protein, and urea decreased steadily with age; cholesterol's lowest level (p < 0.001) was found at 24 h compared with other ages; and the levels of haptoglobin remained unchanged (p > 0.1) during the study. In comparison with the adult RIs, creatinine from 24 h, NEFAs from 2 w, GGT from 5 w, and urea from 7 w are fully comparable with, or lie within, the RIs determined for adults. In piglets, no changes were noticed in glucose (p > 0.1) or haptoglobin (p > 0.1), and there were no major changes in hepatic enzymes (ALT, AST, and GGT), total protein, creatinine, or urea, even though several statistical differences were noticed at 7 days post-weaning. Cholesterol, triglycerides, NEFAs, cortisol, and PigMAP were found increased (p < 0.05), while TNF-alpha was found less concentrated (p < 0.001), at 0 days post-weaning compared with other times. Moreover, the RIs of creatinine and GGT are fully comparable with, or lie within, the RIs determined for adults. In conclusion, RIs for clinical biochemistry analytes were established for unweaned calves and recently-weaned piglets, and some of them vary with age.
Affiliation(s)
- Kuai Yu
- Departament de Bioquímica i Biologia Molecular, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Francesca Canalias
- Departament de Bioquímica i Biologia Molecular, Facultat de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- David Solà-Oriol
- Animal Nutrition and Welfare Service, Animal and Food Science Department, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Laura Arroyo
- Departament de Bioquímica i Biologia Molecular, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Raquel Pato
- Servei de Bioquímica Clínica Veterinària, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Yolanda Saco
- Servei de Bioquímica Clínica Veterinària, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Marta Terré
- Departament de Producció de Remugants, Institut de Recerca i Tecnologia Agroalimentàries, Barcelona, Spain
- Anna Bassols
- Departament de Bioquímica i Biologia Molecular, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
- Servei de Bioquímica Clínica Veterinària, Facultat de Veterinària, Universitat Autònoma de Barcelona, Barcelona, Spain
22
Meredith CS, Trebitz AS, Hoffman JC. Resolving taxonomic ambiguities: effects on rarity, projected richness, and indices in macroinvertebrate datasets. Ecol Indic 2019; 98:137-148. [PMID: 31178665 PMCID: PMC6550328 DOI: 10.1016/j.ecolind.2018.10.047] [Indexed: 06/09/2023]
Abstract
Biodiversity information is an important basis for ecological research and environmental assessment, and can be impacted by choices made in the manipulation and analysis of taxonomic data. Such choices include methods for resolving multiple redundant levels of taxonomic resolution, as typically arise with morphological identification of damaged or immature aquatic macroinvertebrates. In particular, the effects of these processing choices on the number of rare taxa are poorly understood, yet potentially significant to the estimation of projected taxa richness and related evaluations such as biodiversity conservation value and survey sufficiency. Using aquatic macroinvertebrate data collected for two nearshore areas of Lake Superior, we determined how multiple methods of resolving taxonomic redundancies influence two commonly-used estimates of projected richness, Chao1 and Chao2, which hinge on the ratio of taxa that are singletons to doubletons (i.e., just one versus two individuals found) or uniques versus duplicates (i.e., just one versus two occurrences). We also determined how the choice of method for resolving ambiguous taxa, including some modified specifically to retain rare taxa and others taken from the literature, influenced the effort needed to reach 95% of projected richness, site-level richness and abundance, and representative invertebrate IBI scores. We found that Chao1 was more sensitive to method choice than Chao2, because singleton and doubleton status was more frequently affected when taxa were deleted, merged, or re-assigned in the process of resolving taxonomic redundancies than was unique and duplicate status. Methods that eliminated redundant taxa at the site scale but not the study-area scale tended to overinflate study-area and projected richness, and resulted in a significant loss of abundance. The method that aggregated or deleted redundant taxa depending on abundance resulted in a decrease in site and study-area richness and abundance, and an underestimation of projected richness. Methods that re-assigned parents to common children retained a majority of richness and abundance information and a more realistic estimate of projected taxa richness; however, the identity of poorly-identified parents was imputed. All methods had little effect on typical IBI scores. Overall, no one method is fully capable of removing spurious richness at the study-area scale while preserving all taxa occurrence, abundance, and rarity patterns. Therefore, the most appropriate method for making comparisons among sites may differ from the most appropriate method for comparing among surveys or study areas, or when a goal is to estimate projected taxa richness or retain rare-taxa information.
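The sensitivity described above is easy to see from the estimator itself: Chao1 depends entirely on the singleton and doubleton counts. A minimal sketch of abundance-based Chao1 (Chao2 is the same formula applied to incidence counts of uniques and duplicates across sites):

```python
def chao1(abundances):
    """Chao1 projected richness from per-taxon abundance counts.

    S_chao1 = S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers
    of singleton and doubleton taxa; a bias-corrected form is used
    when no doubletons are present (f2 = 0).
    """
    s_obs = sum(c > 0 for c in abundances)
    f1 = sum(c == 1 for c in abundances)   # taxa seen exactly once
    f2 = sum(c == 2 for c in abundances)   # taxa seen exactly twice
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)
```

Merging, deleting, or re-assigning redundant taxa shifts individual counts across the 1 and 2 boundaries, changing f1 and f2 and hence the projection, which is why the resolution method matters so much for Chao1.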
Affiliation(s)
- Christy S Meredith
- National Research Council fellow, Environmental Protection Agency, Office of Research and Development, Mid-Continent Ecology Division, 6201 Congdon Blvd, Duluth, Minnesota 55804 USA
- Anett S Trebitz
- U. S. Environmental Protection Agency, Office of Research and Development, Mid-Continent Ecology Division, 6201 Congdon Blvd, Duluth, Minnesota 55804 USA
- Joel C Hoffman
- U. S. Environmental Protection Agency, Office of Research and Development, Mid-Continent Ecology Division, 6201 Congdon Blvd, Duluth, Minnesota 55804 USA
23
Heuser A, Huynh M, Chang JC. Asymptotic convergence in distribution of the area bounded by prevalence-weighted Kaplan-Meier curves using empirical process modelling. R Soc Open Sci 2018; 5:180496. [PMID: 30564383 PMCID: PMC6281901 DOI: 10.1098/rsos.180496] [Received: 03/26/2018] [Accepted: 10/17/2018] [Indexed: 06/09/2023]
Abstract
The Kaplan-Meier product-limit estimator is a simple and powerful tool in time-to-event analysis. An extension exists for populations stratified into cohorts, where a population survival curve is generated by weighted averaging of cohort-level survival curves. For making population-level comparisons using this statistic, we analyse the statistics of the area between two such weighted survival curves. We derive the large-sample behaviour of this statistic based on an empirical process of product-limit estimators. This estimator was used by an interdisciplinary National Institutes of Health-Social Security Administration team in the identification of medical conditions to prioritize for adjudication in disability benefits processing.
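A numerical sketch of the objects under study (product-limit curves, a prevalence-weighted average of cohort curves, and the area between two curves on [0, τ]) may help fix ideas. The function names and the grid-based integration are illustrative assumptions; the paper's contribution is the asymptotic distribution of this area, not its computation.

```python
import numpy as np

def km_curve(times, events):
    """Kaplan-Meier product-limit estimate.
    Returns the distinct event times and S(t) just after each of them."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    s, surv = 1.0, []
    for t in event_times:
        deaths = int(((times == t) & (events == 1)).sum())
        at_risk = int((times >= t).sum())
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def step_eval(knots, values, grid):
    """Evaluate a right-continuous survival step curve (S = 1 before the
    first event time) at the points in `grid`."""
    idx = np.searchsorted(knots, grid, side="right") - 1
    out = np.ones(len(grid))
    mask = idx >= 0
    out[mask] = values[idx[mask]]
    return out

def weighted_survival(curves, weights, grid):
    """Population curve: prevalence-weighted average of cohort KM curves."""
    return sum(w * step_eval(k, v, grid) for (k, v), w in zip(curves, weights))

def area_between(curve_a, curve_b, tau, n_grid=4001):
    """Area between two survival step curves on [0, tau] (rectangle rule)."""
    grid = np.linspace(0.0, tau, n_grid)
    gap = np.abs(step_eval(*curve_a, grid) - step_eval(*curve_b, grid))
    return float(gap.mean() * tau)
```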
Affiliation(s)
- Aaron Heuser
- Impaq International LLC, Washington, DC 20005, USA
- Minh Huynh
- Impaq International LLC, Washington, DC 20005, USA
- Joshua C. Chang
- Epidemiology and Biostatistics Section, Rehabilitation Medicine Department, The National Institutes of Health Clinical Center, Bethesda, MD 20892, USA
24
Zhang H, Tang L, Kong Y, Chen T, Liu X, Zhang Z, Zhang B. Distribution-free models for latent mixed population responses in a longitudinal setting with missing data. Stat Methods Med Res 2018; 28:3273-3285. [PMID: 30246608 DOI: 10.1177/0962280218801123] [Indexed: 11/16/2022]
Abstract
Many biomedical and psychosocial studies involve population mixtures consisting of multiple latent subpopulations. Because group membership cannot be observed, standard methods do not apply when differential treatment effects must be studied across subgroups. We consider a two-group mixture in which membership of the latent subgroups is determined by the structural zeroes of a zero-inflated count variable, and we propose a new approach to model treatment differences between the latent subgroups in a longitudinal setting. The approach is combined with inverse probability weighting to address missing data. Because it builds on distribution-free functional response models, it requires no parametric distribution assumption and thereby provides robust inference. We illustrate the approach with both real and simulated data.
Affiliation(s)
- Hui Zhang: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Li Tang: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yuanyuan Kong: Liver Research Center, Beijing Key Laboratory of Translational Medicine in Liver Cirrhosis, National Clinical Research Center of Digestive Diseases, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Tian Chen: Department of Mathematics and Statistics, University of Toledo, Toledo, OH, USA
- Xueyan Liu: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Zhiwei Zhang: Department of Statistics, University of California, Riverside, CA, USA
- Bo Zhang: Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, USA

25
Gress TW, Denvir J, Shapiro JI. Effect of removing outliers on statistical inference: implications to interpretation of experimental data in medical research. Marshall J Med 2018; 4. [PMID: 32923665 DOI: 10.18590/mjm.2018.vol4.iss2.9]
Abstract
Background Data editing with elimination of "outliers" is commonly performed in the biomedical sciences. This type of data editing can influence study results, and with the vast and expanding amount of research in medicine, its effects are magnified. Methods and Results We first performed an anonymous survey of medical school faculty at institutions across the United States and found that some form of outlier exclusion was indeed performed by a large percentage of respondents. We next performed Monte Carlo simulations in which the high value of one sample and the low value of the other were excluded, both samples having been drawn from the same normal distribution. We found that removal of this one pair of "outliers" had measurable effects on the type I error even as the sample size was increased into the thousands. We developed an adjustment to the t score that accounts for the anticipated alteration of the type I error, t_adj = t_obs - 2√(log n)/√n, and propose that it be used when outliers are eliminated prior to parametric analysis. Conclusion Data editing that removes the high and low values from two samples, respectively, can have significant effects on the occurrence of type I error. This practice could have profound effects in high-volume research fields, particularly medicine, and we recommend the adjusted t score be used to reduce the potential for error.
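The simulation and the proposed correction can be sketched as follows. This is a toy re-implementation, not the authors' code; the sample size, repetition count and seed are arbitrary.

```python
import math
import random
import statistics

def welch_t(x, y):
    """Two-sample Welch t statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def t_adjusted(t_obs, n):
    """The correction proposed above for use after one outlier pair is removed."""
    return t_obs - 2.0 * math.sqrt(math.log(n)) / math.sqrt(n)

rng = random.Random(1)
n, reps = 50, 2000
raw_abs, trimmed_abs = [], []
for _ in range(reps):
    # both samples come from the SAME normal distribution (null is true)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    raw_abs.append(abs(welch_t(x, y)))
    x_trim = sorted(x)[:-1]   # drop the highest value of sample 1
    y_trim = sorted(y)[1:]    # drop the lowest value of sample 2
    trimmed_abs.append(abs(welch_t(x_trim, y_trim)))

# mean |t| inflation caused by removing the high/low "outlier" pair
inflation = statistics.fmean(trimmed_abs) - statistics.fmean(raw_abs)
```

Even under a true null, trimming the extremes in opposite directions systematically separates the sample means, so `inflation` is positive, which is the type I error mechanism the abstract describes.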
26
Mircioiu C, Atkinson J. A Comparison of Parametric and Non-Parametric Methods Applied to a Likert Scale. Pharmacy (Basel) 2017; 5:E26. [PMID: 28970438 PMCID: PMC5597151 DOI: 10.3390/pharmacy5020026]
Abstract
A trenchant and passionate dispute over the use of parametric versus non-parametric methods for the analysis of Likert scale ordinal data has raged for the past eight decades. The answer is not a simple "yes" or "no" but is related to hypotheses, objectives, risks, and paradigms. In this paper, we took a pragmatic approach. We applied both types of methods to the analysis of actual Likert data on responses from different professional subgroups of European pharmacists regarding competencies for practice. The results show that with "large" (>15) numbers of responses and similar (but clearly not normal) distributions from different subgroups, parametric and non-parametric analyses give the same significant or non-significant results for inter-subgroup comparisons in almost all cases. Where the conclusions differed, parametric methods were the more discriminating. Considering that the largest differences in opinions occurred in the upper part of the 4-point Likert scale (ranks 3 "very important" and 4 "essential"), a "score analysis" based on this part of the data was undertaken. This transformation of the ordinal Likert data into binary scores produced a graphical representation that was visually easier to understand, as differences were accentuated. In conclusion, in this case of Likert ordinal data with high response rates, restricting the analysis to non-parametric methods leads to a loss of information. The addition of parametric methods, graphical analysis, analysis of subsets, and transformation of data leads to more in-depth analyses.
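The side-by-side comparison described above can be sketched with SciPy on made-up 4-point Likert responses (the subgroup sizes and response probabilities below are invented, not the study's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# two professional subgroups rating one competency
# (1 = not important .. 4 = essential)
group_a = rng.choice([1, 2, 3, 4], size=60, p=[0.05, 0.15, 0.40, 0.40])
group_b = rng.choice([1, 2, 3, 4], size=60, p=[0.20, 0.40, 0.30, 0.10])

t_stat, p_param = stats.ttest_ind(group_a, group_b)   # parametric
u_stat, p_nonparam = stats.mannwhitneyu(group_a, group_b,
                                        alternative="two-sided")  # non-parametric

# "score analysis": collapse to the binary upper part of the scale (ranks 3-4)
score_a = float((group_a >= 3).mean())
score_b = float((group_b >= 3).mean())
```

With distributions this far apart, both tests agree, echoing the paper's finding that with large samples the two approaches usually reach the same verdict; the binary "score" highlights where the disagreement in opinions is concentrated.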
Affiliation(s)
- Constantin Mircioiu: Pharmacy Faculty, University of Medicine and Pharmacy "Carol Davila" Bucharest, Dionisie Lupu 37, Bucharest 020021, Romania
- Jeffrey Atkinson: Pharmacolor Consultants Nancy, 12 rue de Versigny, Villers 54600, France

27
Shikha M, Kanika A, Rao AR, Mallikarjuna MG, Gupta HS, Nepolean T. Genomic Selection for Drought Tolerance Using Genome-Wide SNPs in Maize. Front Plant Sci 2017; 8:550. [PMID: 28484471 PMCID: PMC5399777 DOI: 10.3389/fpls.2017.00550]
Abstract
Traditional breeding strategies that select superior genotypes on phenotypic traits have proven to be of limited success, as this direct selection is hindered by low heritability, genetic interactions such as epistasis, genotype-environment interactions, and polygenic effects. With the advent of new genomic tools, breeders have paved a way for selecting superior breeds. Genomic selection (GS) has emerged as one of the most important approaches for predicting genotype performance. Here, we tested the breeding values of 240 subtropical maize lines phenotyped for drought in different environments using 29,619 curated SNPs. Prediction accuracies of seven genomic selection models (ridge regression, LASSO, elastic net, random forest, reproducing kernel Hilbert space (RKHS) regression, Bayes A and Bayes B) were tested on agronomic traits. Though the prediction accuracies of Bayes B, Bayes A and RKHS were comparable, Bayes B outperformed the other models, yielding the highest Pearson correlation coefficient in all three environments. From Bayes B, a set of the top 1053 significant SNPs with the largest marker effects was selected across all datasets to validate the genes and QTLs. Of these 1053 SNPs, 77 were associated with 10 drought-responsive transcription factors. These transcription factors were associated with different physiological and molecular functions (stomatal closure, root development, hormonal signaling and photosynthesis). Of the models compared, Bayes B showed the highest prediction accuracy for our data sets. Our experiments also highlighted several SNPs based on their performance and relative importance to drought tolerance. These results are important for the selection of superior genotypes and candidate genes for breeding drought-tolerant maize hybrids.
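As a toy illustration of one model in this comparison, here is ridge-regression genomic prediction on simulated genotypes, scored by the Pearson correlation between predicted and observed phenotypes. The genotype matrix, effect sizes, penalty and train/test split are all invented; no tuning or cross-validation is done, and this is not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)
n_lines, n_snps = 240, 400                 # loosely echoing the study's 240 lines
X = rng.choice([0.0, 1.0, 2.0], size=(n_lines, n_snps))   # SNP genotype codes
beta = np.zeros(n_snps)
causal = rng.choice(n_snps, size=40, replace=False)       # sparse true effects
beta[causal] = rng.normal(0.0, 0.5, size=40)
y = X @ beta + rng.normal(0.0, 1.0, size=n_lines)         # phenotype with noise

train, test = np.arange(180), np.arange(180, 240)
x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
lam = 50.0                                 # ridge penalty, untuned here
Xc = X[train] - x_mean
# closed-form ridge solution: (X'X + lam I)^-1 X'y on centred data
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_snps), Xc.T @ (y[train] - y_mean))

pred = (X[test] - x_mean) @ w + y_mean
accuracy = float(np.corrcoef(pred, y[test])[0, 1])   # Pearson prediction accuracy
```

The Bayesian models in the paper (Bayes A/B) differ mainly in placing heavier-tailed priors on the marker effects, which suits sparse architectures like the 40 causal SNPs simulated here.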
Affiliation(s)
- Mittal Shikha: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Arora Kanika: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Atmakuri Ramakrishna Rao: Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Hari Shanker Gupta: Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India; Office of Director General, Borlaug Institute for South Asia, New Delhi, India

28
Abstract
In this paper, we propose an application of non-parametric Bayesian (NPB) models to classification of fetal heart rate recordings. More specifically, the models are used to discriminate between fetal heart rate recordings that belong to fetuses that may have adverse asphyxia outcomes and those that are considered normal. In our work we rely on models based on hierarchical Dirichlet processes. Two mixture models were inferred from recordings that represent healthy and unhealthy fetuses, respectively. The models were then used to classify new recordings. We compared the classification performance of the NPB models with that of support vector machines on real data and concluded that the NPB models achieved better performance.
Affiliation(s)
- Kezi Yu: Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA
- J Gerald Quirk: Department of Obstetrics/Gynecology, Stony Brook University Hospital, Stony Brook University, Stony Brook, NY 11794, USA
- Petar M Djurić: Department of Electrical and Computer Engineering, Stony Brook University, Stony Brook, NY 11794, USA
30
Jacquin L, Cao TV, Ahmadi N. A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice. Front Genet 2016; 7:145. [PMID: 27555865 PMCID: PMC4977290 DOI: 10.3389/fgene.2016.00145]
Abstract
One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. A further objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Several parametric and kernel methods, namely the least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were then compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction, followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
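The kernel trick the authors emphasise can be shown in a minimal NumPy sketch of RKHS (kernel ridge) regression with a Gaussian kernel. This is an illustrative stand-in, not the KRMM package; the data, bandwidth and penalty are arbitrary.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Kernel ridge: alpha = (K + lam I)^-1 y. Only the n x n Gram matrix is
    needed; the (possibly infinite-dimensional) feature map never appears."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """f(x) = sum_i alpha_i k(x_i, x)."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0, 0.1, 80)   # non-linear target

alpha = krr_fit(X, y, lam=1e-2, gamma=2.0)
fit = krr_predict(X, alpha, X, gamma=2.0)
rmse = float(np.sqrt(np.mean((fit - y) ** 2)))
```

A plain (linear) ridge regression cannot fit this sine target, whereas the Gaussian-kernel version captures it; this mirrors the paper's point about kernel methods and epistatic (non-additive) architectures.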
Affiliation(s)
- Laval Jacquin: Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France
- Tuong-Vi Cao: Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France
- Nourollah Ahmadi: Centre de Coopération Internationale en Recherche Agronomique pour le Développement, BIOS, UMR AGAP, Montpellier, France

31
Clark JE, Osborne JW, Gallagher P, Watson S. A simple method for optimising transformation of non-parametric data: an illustration by reference to cortisol assays. Hum Psychopharmacol 2016; 31:259-67. [PMID: 27230811 DOI: 10.1002/hup.2528]
Abstract
Neuroendocrine data are typically positively skewed and rarely conform to the expectations of a Gaussian distribution. This can be a problem when attempting to analyse results within the framework of the general linear model, which relies on the assumption that residuals in the data are normally distributed. One frequently used method for handling violations of this assumption is to transform variables to bring residuals into closer alignment with assumptions (as residuals are not directly manipulated). This is often attempted through ad hoc traditional transformations such as square root, log and inverse. However, Box and Cox (1964) observed that these are all special cases of power transformations and proposed a more flexible method of transformation with which researchers can optimise alignment with assumptions. The goal of this paper is to demonstrate the benefits of the infinitely flexible Box-Cox transformation on neuroendocrine data using syntax in SPSS. When applied to positively skewed data typical of neuroendocrine measures, the majority (~2/3) of cases were brought into strict alignment with a Gaussian distribution (i.e. a non-significant Shapiro-Wilk test). Those that were not showed substantial improvement in distributional properties. The biggest challenge was distributions with a high ratio of kurtosis to skewness. We discuss how these cases might be handled, and we highlight some of the broader issues associated with transformation.
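The same optimisation is available in SciPy rather than SPSS syntax; here is a short sketch on simulated positively skewed (lognormal, cortisol-like) data. The distribution parameters are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
raw = rng.lognormal(mean=2.0, sigma=0.8, size=200)   # positively skewed sample

# maximum-likelihood choice of the Box-Cox lambda, then the transform itself
transformed, lam = stats.boxcox(raw)

skew_before = float(stats.skew(raw))
skew_after = float(stats.skew(transformed))
stat, p_after = stats.shapiro(transformed)   # Shapiro-Wilk on the result
```

For lognormal data the fitted lambda lands near zero (i.e. close to a log transform, one of the "special cases" the abstract mentions), and the skewness shrinks markedly.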
Affiliation(s)
- James E Clark: The Institute of Neuroscience, Newcastle University, Newcastle, UK
- Jason W Osborne: The Graduate School and Department of Mathematical Sciences, Clemson University, Clemson, SC, USA
- Peter Gallagher: The Institute of Neuroscience, Newcastle University, Newcastle, UK
- Stuart Watson: The Institute of Neuroscience, Newcastle University, Newcastle, UK; Northumberland Tyne and Wear NHS Mental Health Trust

32
Abstract
In spite of substantial work and recent progress, a global and fully resolved picture of the macroevolutionary history of eukaryotes is still under construction. This concerns not only the phylogenetic relations among major groups, but also the general characteristics of the underlying macroevolutionary processes, including the patterns of gene family evolution associated with endosymbioses, as well as their impact on the sequence evolutionary process. All these questions raise formidable methodological challenges, calling for a more powerful statistical paradigm. In this direction, model-based probabilistic approaches have played an increasingly important role. In particular, improved models of sequence evolution accounting for heterogeneities across sites and across lineages have led to significant, although insufficient, improvement in phylogenetic accuracy. More recently, one main trend has been to move away from simple parametric models and stepwise approaches, towards integrative models explicitly considering the intricate interplay between multiple levels of macroevolutionary processes. Such integrative models are in their infancy, and their application to the phylogeny of eukaryotes still requires substantial improvement of the underlying models, as well as additional computational developments.
Affiliation(s)
- Nicolas Lartillot: Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France

33
Woillard JB, Lebreton V, Neely M, Turlure P, Girault S, Debord J, Marquet P, Saint-Marcoux F. Pharmacokinetic tools for the dose adjustment of ciclosporin in haematopoietic stem cell transplant patients. Br J Clin Pharmacol 2015; 78:836-46. [PMID: 24698009 DOI: 10.1111/bcp.12394]
Abstract
AIMS Ciclosporin A (CsA) is used in the prophylaxis and treatment of acute and chronic graft-vs.-host disease after haematopoietic stem cell transplantation (HSCT). Our objective was to build and compare three independent Bayesian estimators of CsA area under the curve (AUC), each using a limited sampling strategy (LSS), to assist in dose adjustment. METHODS The Bayesian estimators were developed in parallel using two independent parametric modelling approaches (nonmem® and iterative two-stage (ITS) Bayesian modelling) and the non-parametric adaptive grid method (Pmetrics®). Seventy-two full pharmacokinetic profiles (at pre-dose and 0.33, 0.66, 1, 2, 3, 4, 6, 8 and 12 h after dosing) collected from 40 HSCT patients given CsA were used to build the pharmacokinetic models, while 15 other profiles (n = 7) were kept for validation. For each Bayesian estimator, AUCs estimated using the full profiles were compared with AUCs estimated using three samples. RESULTS The pharmacokinetic profiles were well fitted by a two-compartment model with first-order elimination, combined with a gamma function for the absorption phase with ITS and Pmetrics, or an Erlang distribution with nonmem. The derived Bayesian estimators based on a C0-C1 h-C4 h sampling schedule (the best LSS) accurately estimated CsA AUC(0,12 h) in the validation group (n = 15; nonmem: bias (mean ± SD)/RMSE 2.05% ± 13.31%/13.02%; ITS: 4.61% ± 10.56%/11.20%; Pmetrics: 0.30% ± 10.12%/10.47%). Confronting the three results when choosing the dose led to a pertinent dose proposal. CONCLUSIONS The developed Bayesian estimators were all able to predict ciclosporin AUC(0,12 h) in HSCT patients using only three blood samples, with minimal bias, and may be combined to increase the reliability of CsA dose adjustment in routine practice.
Affiliation(s)
- Jean-Baptiste Woillard: Service de Pharmacologie, Toxicologie et Pharmacovigilance, CHU Limoges, Limoges, France; INSERM UMR-S850, Univ Limoges, Limoges, France

34
Zhang X, Xu J, He J. Assessing non-inferiority with time-to-event data via the method of non-parametric covariance. Stat Methods Med Res 2011; 22:346-60. [PMID: 21705437 DOI: 10.1177/0962280211402261]
Abstract
Non-parametric methods have been well recognised as useful tools for time-to-event (survival) data analysis because they provide valid statistical inference with few assumptions. Tangen and Koch have proposed the use of the method of non-parametric covariance for time-to-event data in a traditional superiority setting. In this article, we extended their method to assess non-inferiority of two treatments. To evaluate this non-parametric method against the classical semi-parametric Cox proportional hazards regression model, simulations of the type I error rate and power were performed and compared. The results showed that the two methods were generally comparable with regard to the type I error rate when adjustment was made for covariates correlated with the survival time. In the non-inferiority setting, the covariate-adjusted non-parametric analysis was shown to always increase power. However, this was not necessarily the case for the adjusted Cox model, where results were inconsistent with those seen in the superiority setting. For illustration, an application of the proposed non-parametric method to a trial involving pemetrexed, a recently approved drug for first-line treatment of non-small cell lung cancer, is included.
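The non-inferiority decision rule that such analyses feed can be sketched in miniature. This is not the paper's non-parametric covariance method: it is a simplified landmark comparison of survival probabilities with a normal-approximation confidence bound, and the margin and z-value are illustrative.

```python
import math

def noninferior(p_new, p_ref, n_new, n_ref, margin, z=1.6449):
    """One-sided non-inferiority check on survival probabilities at a landmark
    time: the new arm is declared non-inferior if the lower confidence bound
    of (p_new - p_ref) stays above -margin (z = 1.6449 for one-sided 5%)."""
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    lower_bound = diff - z * se
    return lower_bound > -margin

# hypothetical 1-year survival: 70% vs 72%, margin of 10 percentage points
ok = noninferior(0.70, 0.72, 400, 400, margin=0.10)      # within the margin
not_ok = noninferior(0.60, 0.72, 100, 100, margin=0.10)  # clearly worse
```

Covariate adjustment, as studied in the paper, enters by shrinking the standard error of the treatment contrast, which is why the adjusted non-parametric analysis gains power.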
Affiliation(s)
- Xinji Zhang: Department of Health Statistics, Second Military Medical University, Shanghai, China

35
Abstract
During the 19th century, with the emergence of public health as a goal to improve hygiene and conditions of the poor, statistics established itself as a distinct scientific field important for critically interpreting studies of public health concerns. During the 20th century, statistics began to evolve mathematically and methodologically with hypothesis testing and experimental design. Today, much of medical investigation centers around clinical trials and observational studies, and with the application of statistical formulas, the collected data are summarized, weighed, interpreted, and presented to direct both physicians and the public toward evidence-based medicine. Having a basic understanding of statistics is mandatory in evaluating the validity of published literature and applying it to patient care. In this review, we discuss basic statistical tests to assist the investigator in choosing the correct statistical test and present examples relevant to hand surgery research.
Affiliation(s)
- Jae W. Song: Surgery Research Fellow, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System, Ann Arbor, MI
- Ann Haas: Research Assistant, Section of Plastic Surgery, Department of Surgery, The University of Michigan Health System, Ann Arbor, MI
- Kevin C. Chung: Professor of Surgery, Section of Plastic Surgery, Assistant Dean for Faculty Affairs, The University of Michigan Medical School

36
Maei HR, Zaslavsky K, Teixeira CM, Frankland PW. What is the Most Sensitive Measure of Water Maze Probe Test Performance? Front Integr Neurosci 2009; 3:4. [PMID: 19404412 PMCID: PMC2659169 DOI: 10.3389/neuro.07.004.2009]
Abstract
The water maze is commonly used to assay spatial cognition, or, more generally, learning and memory in experimental rodent models. In the water maze, mice or rats are trained to navigate to a platform located below the water's surface. Spatial learning is then typically assessed in a probe test, where the platform is removed from the pool and the mouse or rat is allowed to search for it. Performance in the probe test may then be evaluated using either occupancy-based (percent time in a virtual quadrant [Q] or zone [Z] centered on former platform location), error-based (mean proximity to former platform location [P]) or counting-based (platform crossings [X]) measures. While these measures differ in their popularity, whether they differ in their ability to detect group differences is not known. To address this question we compiled five separate databases, containing more than 1600 mouse probe tests. Random selection of individual trials from respective databases then allowed us to simulate experiments with varying sample and effect sizes. Using this Monte Carlo-based method, we found that the P measure consistently outperformed the Q, Z and X measures in its ability to detect group differences. This was the case regardless of sample or effect size, and using both parametric and non-parametric statistical analyses. The relative superiority of P over other commonly used measures suggests that it is the most appropriate measure to employ in both low- and high-throughput water maze screens.
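The Monte Carlo logic above can be sketched in miniature. This is not the authors' water-maze databases or simulation: it uses a toy Gaussian "performance score" per animal and compares the power of a test on the raw continuous score (analogous to the fine-grained proximity measure P) against the same data coarsened to above/below a cutoff (a crude stand-in for occupancy- or counting-based measures), since thresholding discards information.

```python
import numpy as np
from scipy import stats

def rejection_rate(effect, n=20, sims=600, cutoff=0.4, seed=7):
    """Fraction of simulated experiments in which each measure detects a
    group difference of size `effect` (in SD units) at p < 0.05."""
    rng = np.random.default_rng(seed)
    hits_cont = hits_bin = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)      # per-animal score, controls
        mutant = rng.normal(effect, 1.0, n)    # group shifted by `effect`
        if stats.ttest_ind(control, mutant).pvalue < 0.05:
            hits_cont += 1
        # coarsened version: only "above/below cutoff" survives
        if stats.ttest_ind((control > cutoff).astype(float),
                           (mutant > cutoff).astype(float)).pvalue < 0.05:
            hits_bin += 1
    return hits_cont / sims, hits_bin / sims

power_cont, power_bin = rejection_rate(effect=0.8)   # power under a real effect
size_cont, size_bin = rejection_rate(effect=0.0)     # false-positive rate
```

In this toy setting the continuous measure reliably detects group differences more often than its thresholded counterpart at the same false-positive rate, which parallels the paper's finding that P outperformed the Q, Z and X measures.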
Affiliation(s)
- Hamid R Maei: Program in Neurosciences and Mental Health, The Hospital for Sick Children, Toronto, ON, Canada