1
Mills C, De Ste Croix M, James D, Cooper SM. Development of novel calibration model(s) to predict whole-body density in professional football players. SCI MED FOOTBALL 2024; 8:170-178. [PMID: 36624982 DOI: 10.1080/24733938.2023.2166680] [Accepted: 01/05/2023] [Indexed: 01/11/2023]
Abstract
INTRODUCTION Questions continue to be raised about the validity of the methods in existence to estimate whole-body density (Db) in professional male football players. METHODS Phase 1: n = 28 anthropometric variables were measured on n = 206 footballers, and regression analyses were used to determine the standard error of estimate and R². A cut-off correlation coefficient was set at r = 0.950 and 90% R². Phase 2: all variables were converted to z-scores (x̄ = 0.0, SD = ±1.0) to help reduce heteroscedasticity, and β, r, t, significance of t and P-values were calculated. Phase 3: a forced stepwise-backwards regression analysis approach with nine predictors which met the acceptance criteria (r = 0.950, R² = 90% and β weights) was used to develop a 'best fit' and a 'practical' calibration model. Phase 4: cross-validation of the two newly developed calibration models using limits of agreement (LoA). RESULTS The 'best fit' model SEM (0.115 g ml⁻¹), the highest R² (6.6%) (P ≤ 0.005), whereas the 'practical' calibration model SEM (0.115 g ml⁻¹), R² (4.7%) (P ≤ 0.005) with r values = 0.271 and 0.596 and R² (%) coefficients = 0.3526 for the 'best fit' and 'practical' calibration models, respectively (P = 0.01). CONCLUSIONS The two calibration models supported an ecologically and statistically valid contribution and can provide sound judgements about professional footballers' body composition.
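The standardization in Phase 2 and the agreement check in Phase 4 can be sketched as follows. This is a minimal illustration, assuming that LoA refers to Bland-Altman 95% limits of agreement; the function names and the body-density values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def limits_of_agreement(predicted, measured, z=1.96):
    """Return (bias, lower, upper) for paired predicted and measured values."""
    diffs = [p - q for p, q in zip(predicted, measured)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical body-density values in g/ml, for illustration only.
predicted = [1.071, 1.065, 1.080, 1.074, 1.069]
measured = [1.070, 1.067, 1.078, 1.075, 1.066]
bias, lower, upper = limits_of_agreement(predicted, measured)
```

Narrow limits centred near zero would suggest that a calibration model and the criterion measure agree closely; the abstract does not report the limits themselves.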
Affiliation(s)
- Claire Mills
  - School of Sport and Exercise, University of Gloucestershire, Gloucester, UK
- Mark De Ste Croix
  - School of Sport and Exercise, University of Gloucestershire, Gloucester, UK
- David James
  - School of Sport and Exercise, University of Gloucestershire, Gloucester, UK
- Stephen-Mark Cooper
  - Cardiff School of Sport and Health Sciences, Cardiff Metropolitan University, Cardiff, UK
2
Ranaut A, Khandnor P, Chand T. Identifying autism using EEG: unleashing the power of feature selection and machine learning. Biomed Phys Eng Express 2024; 10:035013. [PMID: 38457850 DOI: 10.1088/2057-1976/ad31fb] [Received: 10/05/2023] [Accepted: 03/08/2024] [Indexed: 03/10/2024]
Abstract
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that is characterized by communication barriers, societal disengagement, and monotonous actions. Currently, the diagnosis of ASD is made by experts through a subjective and time-consuming qualitative behavioural examination using internationally recognized descriptive standards. In this paper, we present a novel three-phase EEG-based approach, evaluated on 29 autistic subjects and 30 neurotypical people. In the first phase, the data are preprocessed, yielding one continuous dataset and four condition-based datasets, to determine the role of each dataset in distinguishing autistic from neurotypical people. In the second phase, time-domain and morphological features are extracted and four different feature selection techniques are applied. In the last phase, five-fold cross-validation is used to evaluate six different machine learning models on performance metrics and computational efficiency. The neural network performed best when trained with the maximum relevance and minimum redundancy (MRMR) algorithm on the continuous dataset, with 98.10% validation accuracy and 0.9994 area under the curve (AUC) value for model validation, and 98.43% testing accuracy and an AUC test value of 0.9998. The decision tree performed second best overall in terms of computational efficiency and performance accuracy. The results indicate that EEG-based machine learning models have the potential to provide a more objective and reliable method for distinguishing people with ASD from neurotypical people.
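The five-fold cross-validation used in the last phase can be sketched with plain index bookkeeping. This is an illustrative sketch only: the abstract does not say whether folds were formed over subjects or over EEG segments, so splitting over 59 subjects here is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 59 matches the 29 + 30 participants reported in the abstract.
splits = list(k_fold_indices(59, k=5))
```

Each sample is held out exactly once, so every model is scored on data it was not trained on.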
Affiliation(s)
- Anamika Ranaut
  - Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India
- Padmavati Khandnor
  - Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India
- Trilok Chand
  - Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India
3
Dunias ZS, Van Calster B, Timmerman D, Boulesteix AL, van Smeden M. A comparison of hyperparameter tuning procedures for clinical prediction models: A simulation study. Stat Med 2024; 43:1119-1134. [PMID: 38189632 DOI: 10.1002/sim.9932] [Received: 11/11/2022] [Revised: 09/10/2023] [Accepted: 09/21/2023] [Indexed: 01/09/2024]
Abstract
Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
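The difference between standard tuning and the 1SE rule can be sketched as follows. This is a minimal illustration, not the authors' simulation code: `select_lambda` is a hypothetical helper, and the cross-validated errors are invented numbers.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validated error summaries.

    lambdas must be sorted in increasing order (stronger penalty to the right).
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    # 1SE rule: the largest penalty whose mean CV error lies within one
    # standard error of the minimum.
    one_se = max(i for i in range(len(lambdas)) if cv_mean[i] <= threshold)
    return lambdas[best], lambdas[one_se]

# Hypothetical CV results, for illustration only.
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mean = [0.260, 0.240, 0.245, 0.300]
cv_se = [0.010, 0.010, 0.010, 0.010]
lam_min, lam_1se = select_lambda(lambdas, cv_mean, cv_se)
```

With these numbers the 1SE rule picks a penalty ten times larger than the error-minimizing one. That extra shrinkage is the mechanism the abstract links to miscalibration.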
Affiliation(s)
- Zoë S Dunias
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Dirk Timmerman
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Munich, Germany
  - Munich Center for Machine Learning (MCML), LMU Munich, Munich, Germany
- Maarten van Smeden
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
4
Youn A, Chi J, Cui Y, Quan H. A case study: Assessing the efficacy of the revised dosage regimen via prediction model for recurrent event rate using biomarker data. Pharm Stat 2024. [PMID: 38317373 DOI: 10.1002/pst.2362] [Received: 02/20/2023] [Revised: 11/20/2023] [Accepted: 12/30/2023] [Indexed: 02/07/2024]
Abstract
In recently conducted phase III trials in a rare disease area, patients received monthly treatment at a high dose of the drug, which aims to lower a specific biomarker level, closely associated with the efficacy endpoint, to around 10% across patients. Although this high dose demonstrated strong efficacy, treatments were withheld due to reports of serious adverse events. Dosing in these studies was later resumed at a reduced dosage, which aims to lower the biomarker level to 15%-35% across patients. Two questions arose after this disruption. The first is whether the efficacy of this revised regimen, as measured by the reduction in annualized event rate, is adequate to support the continuation of the development, and the second is whether the potential bias due to the loss of patients during this dosing gap can be gauged. To address these questions, we built a prediction model that quantitatively characterizes the biomarker vs. endpoint relationship and predicts efficacy at the 15%-35% range of the biomarker level using the available data from the original high dose. This model predicts a favorable event rate at the target biomarker level and shows that the bias due to the loss of patients is limited. These results support the continued development of the revised regimen; however, given the limitations of the data available, this prediction is planned to be validated further when data under the revised regimen become available.
Affiliation(s)
- Yue Cui
  - Sysdata Consulting, Richmond Heights, Missouri, USA
- Hui Quan
  - Sanofi, Bridgewater, New Jersey, USA
5
Clare JDJ, de Valpine P, Moanga DA, Tingley MW, Beissinger SR. A cloudy forecast for species distribution models: Predictive uncertainties abound for California birds after a century of climate and land-use change. Glob Chang Biol 2024; 30:e17019. [PMID: 37987241 DOI: 10.1111/gcb.17019] [Received: 04/14/2023] [Revised: 10/04/2023] [Accepted: 10/07/2023] [Indexed: 11/22/2023]
Abstract
Correlative species distribution models are widely used to quantify past shifts in ranges or communities, and to predict future outcomes under ongoing global change. Practitioners confront a wide range of potentially plausible models for ecological dynamics, but most specific applications only consider a narrow set. Here, we clarify that certain model structures can embed restrictive assumptions about key sources of forecast uncertainty into an analysis. To evaluate forecast uncertainties and our ability to explain community change, we fit and compared 39 candidate multi- or joint species occupancy models to avian incidence data collected at 320 sites across California during the early 20th century and resurveyed a century later. We found massive (>20,000 LOOIC) differences in within-time information criterion across models. Poorer fitting models omitting multivariate random effects predicted less variation in species richness changes and smaller contemporary communities, with considerable variation in predicted spatial patterns in richness changes across models. The top models suggested avian environmental associations changed across time, contemporary avian occupancy was influenced by previous site-specific occupancy states, and that both latent site variables and species associations with these variables also varied over time. Collectively, our results recapitulate that simplified model assumptions not only impact predictive fit but may mask important sources of forecast uncertainty and mischaracterize the current state of system understanding when seeking to describe or project community responses to global change. We recommend that researchers seeking to make long-term forecasts prioritize characterizing forecast uncertainty over seeking to present a single best guess. To do so reliably, we urge practitioners to employ models capable of characterizing the key sources of forecast uncertainty, where predictors, parameters and random effects may vary over time or further interact with previous occurrence states.
Affiliation(s)
- John D J Clare
  - Museum of Vertebrate Zoology, University of California-Berkeley, Berkeley, California, USA
  - Department of Environmental Science, Policy, and Management, University of California-Berkeley, Berkeley, California, USA
- Perry de Valpine
  - Department of Environmental Science, Policy, and Management, University of California-Berkeley, Berkeley, California, USA
- Diana A Moanga
  - Department of Earth System Science, Stanford University, Palo Alto, California, USA
- Morgan W Tingley
  - Department of Ecology and Evolutionary Biology, University of California-Los Angeles, Los Angeles, California, USA
- Steven R Beissinger
  - Museum of Vertebrate Zoology, University of California-Berkeley, Berkeley, California, USA
  - Department of Environmental Science, Policy, and Management, University of California-Berkeley, Berkeley, California, USA
6
Sheel H, Suárez L, Marsh NV. Screening Children in India: Translation and Psychometric Evaluation of the Parents' Evaluation of Developmental Status and the Strength and Difficulties Questionnaire. Pediatr Rep 2023; 15:750-765. [PMID: 38133435 PMCID: PMC10745979 DOI: 10.3390/pediatric15040067] [Received: 09/13/2023] [Revised: 12/06/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023] Open
Abstract
Timely screening and surveillance of children for developmental delay and social-emotional learning difficulties are essential in Low- and Middle-Income Countries like India. Screening measures like the Parents' Evaluation of Developmental Status (PEDS) and Strength and Difficulties Questionnaire (SDQ) are considered suitable for India due to their low cost, easy accessibility, and no training requirement for administration. However, India lacks validated screening measures, and the PEDS and SDQ have yet to be validated for children in India. The study aimed to translate the PEDS and SDQ from English to Hindi and psychometrically evaluate the same measures on children aged 4-8 years in India. The original PEDS and SDQ forms and their translations were pilot tested on 55 participants and evaluated using data from 407 children with typical development (TD) and 59 children with developmental disability (DD). Parents and teachers reported no meaningful discrepancy between the original and translated (Hindi) questionnaires. Internal consistency for the PEDS was acceptable, but unacceptable for most subscales on the SDQ, for both TD and DD samples. Test-retest reliability was poor for the PEDS but adequate for the SDQ. Results from known-group validity testing showed that the PEDS scores could be used to distinguish between the TD and DD samples. The results from this study provide further support for the use of the PEDS and SDQ in developing countries like India.
Affiliation(s)
- Hina Sheel
  - School of Social and Health Sciences, James Cook University, Singapore 387380, Singapore
  - School of Health and Life Sciences, De Montfort University, Academic City, Dubai 294345, United Arab Emirates
- Lidia Suárez
  - School of Social and Health Sciences, James Cook University, Singapore 387380, Singapore
- Nigel V. Marsh
  - School of Social and Health Sciences, James Cook University, Singapore 387380, Singapore
7
Karpenko D, Bigildeev A. Small groups in multidimensional feature space: Two examples of supervised two-group classification from biomedicine. J Bioinform Comput Biol 2023; 21:2350025. [PMID: 38212875 DOI: 10.1142/s0219720023500257] [Indexed: 01/13/2024]
Abstract
Some biomedical datasets contain a small number of samples which have large numbers of features. This can make analysis challenging and prone to errors such as overfitting and misinterpretation. To improve the accuracy and reliability of analysis in such cases, we present a tutorial that demonstrates a mathematical approach for a supervised two-group classification problem using two medical datasets. The tutorial provides insights on effectively addressing uncertainties and handling missing values without the need for removing or inputting additional data. We describe a method that considers the size and shape of feature distributions, as well as the pairwise relations between measured features, as separate derived features and prognostic factors. Additionally, we explain how to perform similarity calculations that account for the variation in feature values within groups and inaccuracies in individual value measurements. By following these steps, a more accurate and reliable analysis can be achieved when working with biomedical datasets that have a small sample size and multiple features.
Affiliation(s)
- Dmitriy Karpenko
  - Laboratory of Epigenetic Regulation of Hematopoiesis, National Medical Research Center for Hematology, Novii Zikovskii proezd, 4, 125167 Russia, Moscow, Russia
- Aleksei Bigildeev
  - Laboratory of Epigenetic Regulation of Hematopoiesis, National Medical Research Center for Hematology, Novii Zikovskii proezd, 4, 125167 Russia, Moscow, Russia
8
Ranalli MG, Salvati N, Petrella L, Pantalone F. M-quantile regression shrinkage and selection via the Lasso and Elastic Net to assess the effect of meteorology and traffic on air quality. Biom J 2023; 65:e2100355. [PMID: 37743255 DOI: 10.1002/bimj.202100355] [Received: 11/11/2021] [Revised: 01/31/2023] [Accepted: 04/11/2023] [Indexed: 09/26/2023]
Abstract
In this work, we intersect data on size-selected particulate matter (PM) with vehicular traffic counts and a comprehensive set of meteorological covariates to study the effect of traffic on air quality. To this end, we develop an M-quantile regression model with Lasso and Elastic Net penalizations. This allows us (i) to identify the best proxy for vehicular traffic via model selection, (ii) to investigate the relationship between fine PM concentration and the covariates at different M-quantiles of the conditional response distribution, and (iii) to be robust to the presence of outliers. Heterogeneity in the data is accounted for by fitting a B-spline on the effect of the day of the year. Analytic and bootstrap-based variance estimates of the regression coefficients are provided, together with a numerical evaluation of the proposed estimation procedure. Empirical results show that atmospheric stability is responsible for the most significant effect on fine PM concentration: this effect changes at different levels of the conditional response distribution and is relatively weaker on the tails. On the other hand, model selection allows us to identify the best proxy for vehicular traffic, whose effect remains essentially the same at different levels of the conditional response distribution.
Affiliation(s)
- Nicola Salvati
  - Department of Economics and Management, University of Pisa, Pisa, Italy
- Lea Petrella
  - MEMOTEF Department, Sapienza University of Rome, Rome, Lazio, Italy
- Francesco Pantalone
  - Department of Social Statistics and Demography, University of Southampton, Southampton, UK
9
Kendall CJ, Bracebridge C, Lynch EC, Mgumba M, Monadjem A, Nicholas A, Kane A. Value of combining transect counts and telemetry data to determine short-term population trends in a globally threatened species. Conserv Biol 2023; 37:e14146. [PMID: 37424360 DOI: 10.1111/cobi.14146] [Received: 03/25/2022] [Revised: 06/20/2023] [Accepted: 06/21/2023] [Indexed: 07/11/2023]
Abstract
To evaluate conservation interventions, it is necessary to obtain reliable population trends for short (<10 years) time scales. Telemetry can be used to estimate short-term survival rates and is a common tool for assessing population trends, but it has limitations and can be biased toward specific behavioral traits of tagged individuals. Encounter rates calculated from transects can be useful for assessing changes across multiple species, but they can have large confidence intervals and be affected by variations in survey conditions. The decline of African vultures has been well-documented, but understanding of recent trends is lacking. To examine population trends, we used survival estimates from telemetry data collected over 6 years (primarily for white-backed vultures [Gyps africanus]) and transect counts conducted over 8 years (for 7 scavenging raptors) in 3 large protected areas in Tanzania. Population trends were estimated using survival analysis combined with the Leslie Lefkovitch matrix model from the telemetry data and using Bayesian mixed effects generalized linear regression models from the transect data. Both methods showed significant declines for white-backed vultures in Ruaha and Nyerere National Parks. Only telemetry estimates suggested significant declines in Katavi National Park. Encounter rates calculated from transects also showed declines in Nyerere National Park for lappet-faced vultures (38% annual declines) and Bateleurs (18%) and in Ruaha National Park for white-headed vultures (Trigonoceps occipitalis) (19%). Mortality rates recorded and inferred from telemetry suggested that poisoning is prevalent. However, only 6 of the 26 presumed mortalities were confirmed to be caused by poisoning, highlighting the challenges of determining the cause of death when working across large landscapes. Despite declines, our data provide evidence that southern Tanzania has higher current encounter rates of African vultures than elsewhere in East Africa. Preventing further declines will depend greatly on mitigating poisoning. Based on our results, we suggest that the use of multiple techniques improves understanding of population trends over the short term.
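How survival estimates feed a matrix population model can be sketched with a small stage-structured projection matrix. This is a hedged illustration: the two-stage structure and every rate below are hypothetical, not values from the study, and power iteration stands in for whatever eigenvalue routine the authors used.

```python
def dominant_growth_rate(matrix, iterations=200):
    """Estimate the dominant eigenvalue (lambda) of a projection matrix by
    power iteration; lambda < 1 implies a declining population."""
    n = len(matrix)
    vec = [1.0 / n] * n  # start from equal stage proportions
    rate = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        rate = total / sum(vec)          # growth in total population size
        vec = [v / total for v in nxt]   # renormalize to stage proportions
    return rate

# Hypothetical two-stage matrix (juvenile, adult); illustrative values only.
# Row 0: fecundity of each stage; row 1: juvenile and adult annual survival.
projection = [[0.0, 0.25],
              [0.60, 0.80]]
growth = dominant_growth_rate(projection)
```

With these made-up rates the growth rate is about 0.957, an annual decline of roughly 4%. Lowering adult survival in the matrix lowers it further, which is how telemetry-based survival estimates translate into a population trend.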
Affiliation(s)
- Corinne J Kendall
  - Conservation, Education and Science, North Carolina Zoo, Asheboro, North Carolina, USA
  - Department of Applied Ecology, North Carolina State University, Raleigh, North Carolina, USA
- Claire Bracebridge
  - Conservation, Education and Science, North Carolina Zoo, Asheboro, North Carolina, USA
- Emily C Lynch
  - Conservation, Education and Science, North Carolina Zoo, Asheboro, North Carolina, USA
- Msafiri Mgumba
  - Wildlife Conservation Society, Ruaha-Katavi Landscape Program, Tanzania
- Ara Monadjem
  - Department of Biological Sciences, University of Eswatini, Kwaluseni, Eswatini
  - Mammal Research Institute, Department of Zoology and Entomology, University of Pretoria, Hatfield, Pretoria, South Africa
- Aaron Nicholas
  - Wildlife Conservation Society, Ruaha-Katavi Landscape Program, Tanzania
- Adam Kane
  - School of Biology and Environmental Science and Earth Institute, O'Brien Science Centre West, University College Dublin Belfield, Dublin, Ireland
10
Liu Y, Wang W. What Can We Learn from a Semiparametric Factor Analysis of Item Responses and Response Time? An Illustration with the PISA 2015 Data. Psychometrika 2023:10.1007/s11336-023-09936-3. [PMID: 37973773 DOI: 10.1007/s11336-023-09936-3] [Received: 02/19/2023] [Indexed: 11/19/2023]
Abstract
It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses only. For this purpose, a simple-structure factor model is often preferred as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents' ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which prompts under-confidence in the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit after further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.
Affiliation(s)
- Yang Liu
  - Department of Human Development and Quantitative Methodology, University of Maryland, 3304R Benjamin Bldg, 3942 Campus Dr, College Park, MD, 20742, USA
- Weimeng Wang
  - Department of Human Development and Quantitative Methodology, University of Maryland, 3304R Benjamin Bldg, 3942 Campus Dr, College Park, MD, 20742, USA
11
Hazelton ML. Shrinkage estimators of the spatial relative risk function. Stat Med 2023; 42:4556-4569. [PMID: 37599209 DOI: 10.1002/sim.9875] [Received: 01/30/2023] [Revised: 06/26/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]
Abstract
The spatial relative risk function describes differences in the geographical distribution of two types of points, such as locations of cases and controls in an epidemiological study. It is defined as the ratio of the two underlying densities. Estimation of spatial relative risk is typically done using kernel estimates of these densities, but this procedure is often challenging in practice because of the high degree of spatial inhomogeneity in the distributions. This makes it difficult to obtain estimates of the relative risk that are stable in areas of sparse data while retaining necessary detail elsewhere, and consequently difficult to distinguish true risk hotspots from stochastic bumps in the risk function. We study shrinkage estimators of the spatial relative risk function to address these problems. In particular, we propose a new lasso-type estimator that shrinks a standard kernel estimator of the log-relative risk function towards zero. The shrinkage tuning parameter can be adjusted to help quantify the degree of evidence for the existence of risk hotspots, or selected to optimize a cross-validation criterion. The performance of the lasso estimator is encouraging both on a simulation study and on real-world examples.
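The quantities named in this abstract can be written out as follows. The notation is assumed for illustration and is not taken from the paper: f and g are the case and control densities, K_h is a kernel with bandwidth h, and x_1, ..., x_{n_1} are the case locations. The last line shows the generic soft-thresholding operator, included only to indicate how a lasso-type penalty pulls small values to exactly zero; the paper's estimator may take a different form.

```latex
% Relative risk and log-relative risk
r(x) = \frac{f(x)}{g(x)}, \qquad \rho(x) = \log f(x) - \log g(x)

% Kernel estimate of the case density (the control density is analogous)
\hat{f}_h(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} K_h(x - x_i), \qquad
\hat{\rho}(x) = \log \hat{f}_h(x) - \log \hat{g}_h(x)

% Generic soft-thresholding with tuning parameter \lambda \ge 0
S_\lambda(z) = \operatorname{sign}(z)\,\max\bigl(|z| - \lambda,\, 0\bigr)
```

Shrinking the log-relative risk towards zero corresponds to shrinking the relative risk towards 1, that is, towards no difference between the case and control distributions.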
Affiliation(s)
- Martin L Hazelton
  - Department of Mathematics and Statistics, University of Otago, Dunedin, New Zealand
12
Sun MW, Tibshirani R. Confidence intervals for the Cox model test error from cross-validation. Stat Med 2023; 42:4532-4541. [PMID: 37580906 PMCID: PMC10734684 DOI: 10.1002/sim.9873] [Received: 01/29/2022] [Revised: 06/25/2023] [Accepted: 07/29/2023] [Indexed: 08/16/2023]
Abstract
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error using estimates from CV may have coverage below nominal levels. This phenomenon occurs because each sample is used in both the training and testing procedures during CV and as a result, the CV estimates of the errors become correlated. Without accounting for this correlation, the estimate of the variance is smaller than it should be. One way to mitigate this issue is by estimating the mean squared error of the prediction error instead using nested CV. This approach has been shown to achieve superior coverage compared to intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
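The standard interval that the abstract describes as prone to undercoverage can be sketched as below. This is an illustration of the naive construction only, assuming per-sample held-out losses pooled across folds. It does not implement the nested CV correction, and the loss values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def naive_cv_interval(errors, z=1.96):
    """Return (estimate, lower, upper), treating per-sample CV errors as
    independent. Ignoring their correlation makes the interval too narrow."""
    n = len(errors)
    estimate = mean(errors)
    se = stdev(errors) / sqrt(n)
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical held-out losses pooled over all CV folds.
errors = [0.2, 0.4, 0.1, 0.5, 0.3, 0.3, 0.2, 0.4]
estimate, lower, upper = naive_cv_interval(errors)
```

Because every observation is used for both training and testing across folds, the errors are correlated and the standard error above is too small, which is the problem the nested CV approach is meant to address.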
Affiliation(s)
- Min Woo Sun
  - Department of Biomedical Data Science, Stanford University, CA, United States
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, CA, United States
  - Department of Statistics, Stanford University, CA, United States
13
Vamvourellis K, Kalogeropoulos K, Moustaki I. Assessment of generalised Bayesian structural equation models for continuous and binary data. Br J Math Stat Psychol 2023; 76:559-584. [PMID: 37401608 DOI: 10.1111/bmsp.12314] [Received: 03/01/2023] [Accepted: 04/17/2023] [Indexed: 07/05/2023]
Abstract
The paper proposes a novel model assessment paradigm aiming to address shortcomings of posterior predictive p-values, which provide the default metric of fit for Bayesian structural equation modelling (BSEM). The model framework presented in the paper focuses on the approximate zero approach (Psychological Methods, 17, 2012, 313), which involves formulating certain parameters (such as factor loadings) to be approximately zero through the use of informative priors, instead of explicitly setting them to zero. The introduced model assessment procedure monitors the out-of-sample predictive performance of the fitted model, and together with a list of guidelines we provide, one can investigate whether the hypothesised model is supported by the data. We incorporate scoring rules and cross-validation to supplement existing model assessment metrics for BSEM. The proposed tools can be applied to models for both continuous and binary data. The modelling of categorical and non-normally distributed continuous data is facilitated with the introduction of an item-individual random effect. We study the performance of the proposed methodology via simulation experiments as well as real data on the 'Big-5' personality scale and the Fagerstrom test for nicotine dependence.
Collapse
Affiliation(s)
| | | | - Irini Moustaki
- Department of Statistics, London School of Economics, London, UK
|
14
|
Jin Y, Kattan MW. Methodologic Issues Specific to Prediction Model Development and Evaluation. Chest 2023; 164:1281-1289. [PMID: 37414333 DOI: 10.1016/j.chest.2023.06.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 07/08/2023]
Abstract
Developing and evaluating statistical prediction models is challenging, and many pitfalls can arise. This article identifies what the authors believe are some common methodologic concerns that may be encountered. We describe each problem and make suggestions regarding how to address them. The hope is that this article will result in higher-quality publications of statistical prediction models.
Affiliation(s)
- Yuxuan Jin
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
| | - Michael W Kattan
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
|
15
|
Su SY. Synthesized Age-Period-Cohort Prediction Method: Application to Lung Cancer Mortality in Taiwan. Am J Epidemiol 2023; 192:1712-1719. [PMID: 37218606 DOI: 10.1093/aje/kwad120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 04/18/2023] [Accepted: 05/16/2023] [Indexed: 05/24/2023]
Abstract
Age-period-cohort analysis involves 3 temporal factors: age (the length of time from birth to diagnosis), period (the calendar time of diagnosis), and cohort (the calendar time of birth). The application of age-period-cohort analysis in disease forecasting can help researchers and health authorities anticipate future disease burden. In this study, a synthesized age-period-cohort prediction method was proposed based on 4 assumptions: 1) no single model can dominate as the most accurate prediction model in all forecasting scenarios; 2) historical trends will not continue indefinitely; 3) a model with the most accurate forecast for the training data will also be appropriate for forecasting future data; and 4) a model dominated by the stochastic temporal change will be the best-selected model with the robust forecasting. An ensemble of age-period-cohort prediction models was constructed, and Monte Carlo cross-validation was performed to evaluate forecasting accuracy of these models. Data on lung cancer mortality from 1996 to 2015 in Taiwan were used and projected to the year 2035 to illustrate the method. The actual lung cancer mortality rates from 2016 to 2020 were then used to verify the forecasting accuracy.
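Monte Carlo cross-validation, as named in the abstract above, repeatedly splits the data at random into training and test parts and averages the test error over the splits. The following is a minimal Python sketch of that resampling idea only, not the author's age-period-cohort procedure: the two candidate models (`fit_mean`, `fit_line`), the helper `monte_carlo_cv`, and the toy data are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate 2: simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def monte_carlo_cv(xs, ys, fit, n_splits=200, test_frac=0.3, seed=0):
    """Average test MSE over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(test_frac * n)))
    errors = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repetition
        test, train = idx[:n_test], idx[n_test:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((ys[i] - model(xs[i])) ** 2 for i in test))
    return mean(errors)

# Toy data (assumed): a noisy linear trend.
data_rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 0.3) for x in xs]

scores = {
    "mean": monte_carlo_cv(xs, ys, fit_mean),
    "line": monte_carlo_cv(xs, ys, fit_line),
}
best = min(scores, key=scores.get)  # candidate with the lowest average test error
```

For forecasting applications the splits would normally respect time order; the unordered random splits here only illustrate how the repeated resampling ranks candidate models.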
|
16
|
Hiramitsu T, Hasegawa Y, Futamura K, Okada M, Matsuoka Y, Goto N, Ichimori T, Narumi S, Takeda A, Kobayashi T, Uchida K, Watarai Y. Prediction models for the recipients' ideal perioperative estimated glomerular filtration rates for predicting graft survival after adult living-donor kidney transplantation. Front Med (Lausanne) 2023; 10:1187777. [PMID: 37720509 PMCID: PMC10501755 DOI: 10.3389/fmed.2023.1187777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 08/22/2023] [Indexed: 09/19/2023]
Abstract
Introduction The impact of the perioperative estimated glomerular filtration rate (eGFR) on graft survival in kidney transplant recipients is yet to be evaluated. In this study, we developed prediction models for the ideal perioperative eGFRs in recipients. Methods We evaluated the impact of perioperative predicted ideal and actual eGFRs on graft survival by including 1,174 consecutive adult patients who underwent living-donor kidney transplantation (LDKT) between January 2008 and December 2020. Prediction models for the ideal perioperative eGFR were developed for 676 recipients who were randomly assigned to the training and validation sets (ratio: 7:3). The prediction models for the ideal best eGFR within 3 weeks and those at 1, 2, and 3 weeks after LDKT in 474 recipients were developed using 10-fold validation and stepwise multiple regression model analyses. The developed prediction models were validated in 202 recipients. Finally, the impact of perioperative predicted ideal eGFRs/actual eGFRs on graft survival was investigated using Fine-Gray regression analysis. Results The correlation coefficients of the predicted ideal best eGFR within 3 weeks and the predicted ideal eGFRs at 1, 2, and 3 weeks after LDKT were 0.651, 0.600, 0.598, and 0.617, respectively. Multivariate analyses for graft loss demonstrated significant differences in the predicted ideal best eGFR/actual best eGFR within 3 weeks and the predicted ideal eGFRs/actual eGFRs at 1, 2, and 3 weeks after LDKT. Discussion The predicted ideal best eGFR/actual best eGFR within 3 weeks and the predicted ideal eGFRs/actual eGFRs at 1, 2, and 3 weeks after LDKT were independent prognostic factors for graft loss. Therefore, the perioperative predicted ideal eGFR/actual eGFR may be useful for predicting graft survival after adult LDKT.
Affiliation(s)
- Takahisa Hiramitsu
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Yuki Hasegawa
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Kenta Futamura
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Manabu Okada
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Yutaka Matsuoka
- Department of Renal Transplant Surgery, Masuko Memorial Hospital, Nagoya, Japan
| | - Norihiko Goto
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Toshihiro Ichimori
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Shunji Narumi
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Asami Takeda
- Department of Nephrology, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
| | - Takaaki Kobayashi
- Department of Renal Transplant Surgery, Aichi Medical University School of Medicine, Nagakute, Japan
| | - Kazuharu Uchida
- Department of Renal Transplant Surgery, Masuko Memorial Hospital, Nagoya, Japan
| | - Yoshihiko Watarai
- Department of Transplant and Endocrine Surgery, Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital, Nagoya, Japan
|
17
|
Dousti Mousavi N, Aldirawi H, Yang J. Categorical Data Analysis for High-Dimensional Sparse Gene Expression Data. BioTech (Basel) 2023; 12:52. [PMID: 37606439 PMCID: PMC10443356 DOI: 10.3390/biotech12030052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/15/2023] [Accepted: 07/24/2023] [Indexed: 08/23/2023]
Abstract
Categorical data analysis becomes challenging when high-dimensional sparse covariates are involved, which is often the case for omics data. We introduce a statistical procedure based on multinomial logistic regression analysis for such scenarios, including variable screening, model selection, order selection for response categories, and variable selection. We perform our procedure on high-dimensional gene expression data with 801 patients, 2426 genes, and five types of cancerous tumors. As a result, we recommend three finalized models: one with 74 genes achieves extremely low cross-entropy loss and zero predictive error rate based on a five-fold cross-validation; and two other models with 31 and 4 genes, respectively, are recommended for prognostic multi-gene signatures.
Affiliation(s)
- Niloufar Dousti Mousavi
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Hani Aldirawi
- Department of Mathematics, California State University—San Bernardino, San Bernardino, CA 92407, USA
| | - Jie Yang
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
|
18
|
Daba SD, Kiszonas AM, McGee RJ. Selecting High-Performing and Stable Pea Genotypes in Multi-Environmental Trial (MET): Applying AMMI, GGE-Biplot, and BLUP Procedures. Plants (Basel) 2023; 12:2343. [PMID: 37375968 DOI: 10.3390/plants12122343] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/03/2023] [Revised: 05/30/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023]
Abstract
A large amount of data on various traits is accumulated over the course of a breeding program and can be used to optimize various aspects of the crop improvement pipeline. We leveraged data from advanced yield trials (AYT) of three classes of peas (green, yellow, and winter peas) collected over ten years (2012-2021) to analyze and test key aspects fundamental to pea breeding. Six balanced datasets were used to test the predictive success of the BLUP and AMMI family models. Predictive assessment using cross-validation indicated that BLUP offered better predictive accuracy as compared to any AMMI family model. However, BLUP may not always identify the best genotype that performs well across environments. AMMI and GGE, two statistical tools used to exploit GE, could fill this gap and aid in understanding how genotypes perform across environments. AMMI's yield by environmental IPCA1, WAASB by yield plot, and GGE biplot were shown to be useful in identifying genotypes for specific or broad adaptability. When compared to the most favorable environment, we observed a yield reduction of 80-87% in the most unfavorable environment. The seed yield variability across environments was caused in part by weather variability. Hotter conditions in June and July as well as low precipitation in May and June affected seed yield negatively. In conclusion, the findings of this study are useful to breeders in the variety selection process and growers in pea production.
Affiliation(s)
- Sintayehu D Daba
- USDA-ARS Western Wheat & Pulse Quality Laboratory, Pullman, WA 99164, USA
| | - Alecia M Kiszonas
- USDA-ARS Western Wheat & Pulse Quality Laboratory, Pullman, WA 99164, USA
| | - Rebecca J McGee
- USDA-ARS Grain Legume Genetics and Physiology Research Unit, Pullman, WA 99164, USA
|
19
|
Alem O, Hughes KJ, Buard I, Cheung TP, Maydew T, Griesshammer A, Holloway K, Park A, Lechuga V, Coolidge C, Gerginov M, Quigg E, Seames A, Kronberg E, Teale P, Knappe S. An integrated full-head OPM-MEG system based on 128 zero-field sensors. Front Neurosci 2023; 17:1190310. [PMID: 37389367 PMCID: PMC10303922 DOI: 10.3389/fnins.2023.1190310] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/20/2023] [Accepted: 05/24/2023] [Indexed: 07/01/2023]
Abstract
Compact optically-pumped magnetometers (OPMs) are now commercially available with noise floors reaching 10 fT/√Hz. However, to be used effectively for magnetoencephalography (MEG), dense arrays of these sensors are required to operate as an integrated turn-key system. In this study, we present the HEDscan, a 128-sensor OPM MEG system by FieldLine Medical, and evaluate its sensor performance with regard to bandwidth, linearity, and crosstalk. We report results from cross-validation studies with conventional cryogenic MEG, the Magnes 3600 WH Biomagnetometer by 4-D Neuroimaging. Our results show high signal amplitudes captured by the OPM-MEG system during a standard auditory paradigm, where short tones at 1000 Hz were presented to the left ear of six healthy adult volunteers. We validate these findings through an event-related beamformer analysis, which is in line with existing literature results.
Affiliation(s)
- Orang Alem
- FieldLine Medical, Boulder, CO, United States
- Paul M. Rady Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO, United States
- FieldLine Industries, Boulder, CO, United States
| | - K. Jeramy Hughes
- FieldLine Medical, Boulder, CO, United States
- Paul M. Rady Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO, United States
- FieldLine Industries, Boulder, CO, United States
| | - Isabelle Buard
- Anschutz Medical Campus, University of Colorado Denver, Denver, CO, United States
| | - Teresa P. Cheung
- FieldLine Medical, Boulder, CO, United States
- School of Engineering, Simon Fraser University, Burnaby, BC, Canada
- Surrey Memorial Hospital, Fraser Health Authority, Surrey, BC, Canada
| | | | | | | | - Aaron Park
- FieldLine Medical, Boulder, CO, United States
| | | | | | - Marja Gerginov
- Paul M. Rady Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO, United States
| | - Erik Quigg
- Paul M. Rady Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO, United States
| | - Alexander Seames
- Anschutz Medical Campus, University of Colorado Denver, Denver, CO, United States
| | - Eugene Kronberg
- Anschutz Medical Campus, University of Colorado Denver, Denver, CO, United States
| | - Peter Teale
- Anschutz Medical Campus, University of Colorado Denver, Denver, CO, United States
| | - Svenja Knappe
- FieldLine Medical, Boulder, CO, United States
- Paul M. Rady Department of Mechanical Engineering, University of Colorado Boulder, Boulder, CO, United States
- FieldLine Industries, Boulder, CO, United States
|
20
|
Ballout N, Etievant L, Viallon V. On the use of cross-validation for the calibration of the adaptive lasso. Biom J 2023; 65:e2200047. [PMID: 36960476 DOI: 10.1002/bimj.202200047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 11/10/2022] [Accepted: 12/30/2022] [Indexed: 03/25/2023]
Abstract
Cross-validation is the standard method for hyperparameter tuning, or calibration, of machine learning algorithms. The adaptive lasso is a popular class of penalized approaches based on weighted L1-norm penalties, with weights derived from an initial estimate of the model parameter. Although it violates the paramount principle of cross-validation, according to which no information from the hold-out test set should be used when constructing the model on the training set, a "naive" cross-validation scheme is often implemented for the calibration of the adaptive lasso. The unsuitability of this naive cross-validation scheme in this context has not been well documented in the literature. In this work, we recall why the naive scheme is theoretically unsuitable and how proper cross-validation should be implemented in this particular context. Using both synthetic and real-world examples and considering several versions of the adaptive lasso, we illustrate the flaws of the naive scheme in practice. In particular, we show that it can lead to the selection of adaptive lasso estimates that perform substantially worse than those selected via a proper scheme in terms of both support recovery and prediction error. In other words, our results show that the theoretical unsuitability of the naive scheme translates into suboptimality in practice, and call for abandoning it.
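The distinction drawn in this abstract can be sketched in a few lines of Python. In the sketch below, a proper scheme recomputes the adaptive weights from the training part of each fold, whereas a naive scheme (one plausible reading of the abstract) computes them once from the full sample, so hold-out information leaks into every fold. Everything here is an assumption made for illustration rather than the authors' implementation: the marginal least-squares initial estimate, the coordinate-descent solver, the function names, the lambda grid, and the toy data.

```python
import random
from statistics import mean

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def adaptive_weights(X, y, eps=1e-6):
    """Weights 1/|b0_j| from marginal least-squares slopes (assumed initial estimate)."""
    weights = []
    for j in range(len(X[0])):
        sxy = sum(row[j] * yi for row, yi in zip(X, y))
        sxx = sum(row[j] ** 2 for row in X)
        b0 = sxy / sxx if sxx else 0.0
        weights.append(1.0 / (abs(b0) + eps))
    return weights

def weighted_lasso(X, y, lam, w, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_adaptive_lasso(X, y, lams, k=5, seed=0, weights_inside_fold=True):
    """Pick lambda by K-fold CV; weights_inside_fold=False mimics the naive scheme."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    w_full = adaptive_weights(X, y)  # uses every observation: naive scheme only
    cv_err = {lam: [] for lam in lams}
    for test in folds:
        held = set(test)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        # Proper scheme: the initial estimate never sees the hold-out fold.
        w = adaptive_weights(X_tr, y_tr) if weights_inside_fold else w_full
        for lam in lams:
            beta = weighted_lasso(X_tr, y_tr, lam, w)
            cv_err[lam].append(mean(
                (y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in test))
    return min(lams, key=lambda lam: mean(cv_err[lam]))

# Toy data (assumed): two active predictors out of five.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [2.0 * r[0] - 1.5 * r[1] + data_rng.gauss(0, 0.5) for r in X]
lams = [0.001, 0.01, 0.1, 1.0]

lam_proper = cv_adaptive_lasso(X, y, lams, weights_inside_fold=True)
lam_naive = cv_adaptive_lasso(X, y, lams, weights_inside_fold=False)
```

The only difference between the two calls is where `adaptive_weights` is evaluated; on small toy data the two schemes may well select the same lambda, so the sketch shows the mechanics rather than the size of the effect reported in the paper.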
Affiliation(s)
- Nadim Ballout
- Univ Lyon, Univ Eiffel, IFSTTAR, Univ Lyon 1, UMRESTTE, Bron, France
| | - Lola Etievant
- Univ Lyon, Univ Eiffel, IFSTTAR, Univ Lyon 1, UMRESTTE, Bron, France
- Institut Camille Jordan, Université Claude Bernard Lyon 1, Lyon, France
| | - Vivian Viallon
- Nutrition and Metabolism Branch, International Agency for Research on Cancer (IARC-WHO), Lyon, France
|
21
|
Zhao B, Ivanova A, Fine J. Inference on subgroups identified based on a heterogeneous treatment effect in a post hoc analysis of a clinical trial. Clin Trials 2023:17407745231173055. [PMID: 37170632 DOI: 10.1177/17407745231173055] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/13/2023]
Abstract
Due to the many benefits of understanding treatment effect heterogeneity in a clinical trial, an exploratory post hoc subgroup analysis is often performed to find subpopulations of patients with conditional average treatment effect that suggests better treatment efficacy than in the overall population. A naive re-substitution approach uses all available data to identify a subgroup and then proceeds with estimation and inference using the same data set. This approach generally leads to an overly optimistic estimate of conditional average treatment effect. In this article, in a post hoc analysis, we estimate the target optimal subgroup through maximizing a utility function, from candidates systematically identified with a penalized regression. We then compare two resampling-based bias-correction methods, cross-validation and debiasing bootstrap, for obtaining approximately unbiased estimates and valid inference of conditional average treatment effect in the identified subgroup, with either an empirical or an augmented estimator. Our results show that both the cross-validation and the debiasing bootstrap methods reduce the re-substitution bias effectively. The cross-validation method appears to have less biased point estimates, smaller standard error estimates, but poorer coverages than the debiasing bootstrap method when using the empirical estimator and the sample size is moderate. Using the augmented estimator in the debiasing bootstrap method leads to less biased point estimates but poorer coverages. We conclude that bias correction should be a part of every exploratory post hoc subgroup analysis to eliminate re-substitution bias and to obtain a proper confidence interval for the estimated conditional average treatment effect in the selected subgroup.
Affiliation(s)
- Beibo Zhao
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Anastasia Ivanova
- Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jason Fine
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
|
22
|
Difabachew YF, Frisch M, Langstroff AL, Stahl A, Wittkop B, Snowdon RJ, Koch M, Kirchhoff M, Cselényi L, Wolf M, Förster J, Weber S, Okoye UJ, Zenke-Philippi C. Genomic prediction with haplotype blocks in wheat. Front Plant Sci 2023; 14:1168547. [PMID: 37229104 PMCID: PMC10203549 DOI: 10.3389/fpls.2023.1168547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 04/17/2023] [Indexed: 05/27/2023]
Abstract
Haplotype blocks might carry additional information compared to single SNPs and have therefore been suggested for use as independent variables in genomic prediction. Studies in different species resulted in more accurate predictions than with single SNPs in some traits but not in others. In addition, it remains unclear how the blocks should be built to obtain the greatest prediction accuracies. Our objective was to compare the results of genomic prediction with different types of haplotype blocks to prediction with single SNPs in 11 traits in winter wheat. We built haplotype blocks from marker data from 361 winter wheat lines based on linkage disequilibrium, fixed SNP numbers, fixed lengths in cM and with the R package HaploBlocker. We used these blocks together with data from single-year field trials in a cross-validation study for predictions with RR-BLUP, an alternative method (RMLA) that allows for heterogeneous marker variances, and GBLUP performed with the software GVCHAP. The greatest prediction accuracies for resistance scores for B. graminis, P. triticina, and F. graminearum were obtained with LD-based haplotype blocks while blocks with fixed marker numbers and fixed lengths in cM resulted in the greatest prediction accuracies for plant height. Prediction accuracies of haplotype blocks built with HaploBlocker were greater than those of the other methods for protein concentration and resistance scores for S. tritici, B. graminis, and P. striiformis. We hypothesize that the trait-dependence is caused by properties of the haplotype blocks that have overlapping and contrasting effects on the prediction accuracy. While they might be able to capture local epistatic effects and to detect ancestral relationships better than single SNPs, prediction accuracy might be reduced by unfavorable characteristics of the design matrices in the models that are due to their multi-allelic nature.
Affiliation(s)
| | - Matthias Frisch
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
| | - Anna Luise Langstroff
- Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
| | - Andreas Stahl
- Institute for Resistance Research and Stress Tolerance, Julius Kühn Institute, Quedlinburg, Germany
| | - Benjamin Wittkop
- Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
| | - Rod J. Snowdon
- Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
| | | | | | - László Cselényi
- Department of Cereal Breeding, W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany
| | - Markus Wolf
- German Seed Alliance GmbH, Holtsee, Germany
- Saaten-Union Biotec GmbH, Leopoldshöhe, Germany
| | | | - Sven Weber
- Institute of Agronomy and Plant Breeding I, Justus Liebig University, Gießen, Germany
| | - Uche Joshua Okoye
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
| | - Carola Zenke-Philippi
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
|
23
|
Liu Z, Lan P, Liu T, Liu X, Liu T. m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier. Int J Mol Sci 2023; 24:7878. [PMID: 37175594 PMCID: PMC10177809 DOI: 10.3390/ijms24097878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 04/20/2023] [Accepted: 04/24/2023] [Indexed: 05/15/2023]
Abstract
As one of the most important post-transcriptional modifications, m6Am plays a fairly important role in conferring mRNA stability and in the progression of cancers. The accurate identification of the m6Am sites is critical for explaining its biological significance and developing its application in the medical field. However, conventional experimental approaches are time-consuming and expensive, making them unsuitable for the large-scale identification of the m6Am sites. To address this challenge, we exploit a CatBoost-based method, m6Aminer, to identify the m6Am sites on mRNA. For feature extraction, nine different feature-encoding schemes (pseudo electron-ion interaction potential, hash decimal conversion method, dinucleotide binary encoding, nucleotide chemical properties, pseudo k-tuple composition, dinucleotide numerical mapping, K monomeric units, series correlation pseudo trinucleotide composition, and K-spaced nucleotide pair frequency) were utilized to form the initial feature space. To obtain the optimized feature subset, the ExtraTreesClassifier algorithm was adopted to perform feature importance ranking, and the top 300 features were selected as the optimal feature subset. With different performance assessment methods, 10-fold cross-validation and independent test, m6Aminer achieved average AUC of 0.913 and 0.754, demonstrating a competitive performance with the state-of-the-art models m6AmPred (0.905 and 0.735) and DLm6Am (0.897 and 0.730). The prediction model developed in this study can be used to identify the m6Am sites in the whole transcriptome, laying a foundation for the functional research of m6Am.
Affiliation(s)
- Ze Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
| | - Pengfei Lan
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
| | - Ting Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Department of Mechanical Engineering, Faculty of Engineering, The University of Hong Kong, Hong Kong 999077, China
| | - Xudong Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
| | - Tao Liu
- College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China
- Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Xianyang 712100, China
|
24
|
Bakheet S, Alsubai S, El-Nagar A, Alqahtani A. A Multi-Feature Fusion Framework for Automatic Skin Cancer Diagnostics. Diagnostics (Basel) 2023; 13:1474. [PMID: 37189574 DOI: 10.3390/diagnostics13081474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 04/07/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023]
Abstract
Malignant melanoma is the most invasive skin cancer and is currently regarded as one of the deadliest disorders; however, it can be cured more successfully if detected and treated early. Recently, CAD (computer-aided diagnosis) systems have emerged as a powerful alternative tool for the automatic detection and categorization of skin lesions, such as malignant melanoma or benign nevus, in given dermoscopy images. In this paper, we propose an integrated CAD framework for rapid and accurate melanoma detection in dermoscopy images. Initially, an input dermoscopy image is pre-processed by using a median filter and bottom-hat filtering for noise reduction, artifact removal, and, thus, enhancing the image quality. After this, each skin lesion is described by an effective skin lesion descriptor with high discrimination and descriptiveness capabilities, which is constructed by calculating the HOG (Histogram of Oriented Gradient) and LBP (Local Binary Patterns) and their extensions. After feature selection, the lesion descriptors are fed into three supervised machine learning classification models, namely SVM (Support Vector Machine), kNN (k-Nearest Neighbors), and GAB (Gentle AdaBoost), to diagnostically classify melanocytic skin lesions into one of two diagnostic categories, melanoma or nevus. Experimental results achieved using 10-fold cross-validation on the publicly available MED-NODEE dermoscopy image dataset demonstrate that the proposed CAD framework performs either competitively or superiorly to several state-of-the-art methods with stronger training settings in relation to various diagnostic metrics, such as accuracy (94%), specificity (92%), and sensitivity (100%).
Affiliation(s)
- Samy Bakheet
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
- Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany
| | - Shtwai Alsubai
- College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al Kharj 11942, Saudi Arabia
| | - Aml El-Nagar
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
| | - Abdullah Alqahtani
- College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al Kharj 11942, Saudi Arabia
|
25
|
Abstract
Configurational comparative methods (CCMs) and logic regression methods (LRMs) are two families of exploratory methods that employ very different techniques to analyze data generated by causal structures featuring conjunctural causation and equifinality. Aiming for the same by different means carries a substantive synergy potential, which, however, remains untapped so far because representatives of the two frameworks know little of each other. The purpose of this article is to change that. We first level the field for readers from both backgrounds by providing brief introductions to the basic ideas behind CCMs and LRMs. Then, we carve out the strengths and weaknesses of the two method families by benchmarking their performance when applied to binary data under a variety of different discovery contexts. It turns out that CCMs and LRMs have complementary strengths and weaknesses. This creates various promising avenues for cross-validation.
|
26
|
González-Brignardello MP, Sánchez-Elvira Paniagua Á. Dimensional Structure of MAPS-15: Validation of the Multidimensional Academic Procrastination Scale. Int J Environ Res Public Health 2023; 20:3201. [PMID: 36833895 PMCID: PMC9965915 DOI: 10.3390/ijerph20043201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 06/18/2023]
Abstract
Academic procrastination is a complex behavior that hampers the cyclical process of self-regulation in learning, impeding the flow of actions necessary to achieve the goals and sub-goals that students have set out to attain. It has a high frequency of occurrence and has been linked to lessened student performance and a decrease in psychological and physical well-being. The objective of this study is to analyze the psychometric characteristics of a new academic procrastination scale MAPS-15 (Multidimensional Academic Procrastination Scale) applicable in self-regulated learning environments through a cross-validation study (exploratory factor analysis and confirmatory factor analysis). The sample consisted of 1289 students from a distance/online university, with a wide age range and sociocultural variability. The students completed self-reported online questionnaires on two dates: during the university access and adaptation phase and before the first period of compulsory exams. One-, two- and three-factor structures were tested as well as a second-order structure. The results support a three-dimensional structure of MAPS-15: core procrastination, a pure dimension of procrastinating behavior and difficulty in carrying out the action; poor time management, a dimension related to time organization and perceived control over time; and work disconnection, a dimension conceptually related to lack of persistence, and work interruptions.
Affiliation(s)
- Marcela Paz González-Brignardello
- Department of Personality Psychology, Psychological Assessment and Treatment, Faculty of Psychology, Universidad Nacional de Educación a Distancia (UNED), 28040 Madrid, Spain
27
Hong F, Tian L, Devanarayan V. Improving the Robustness of Variable Selection and Predictive Performance of Regularized Generalized Linear Models and Cox Proportional Hazard Models. Mathematics (Basel) 2023; 11:557. [PMID: 37990696 PMCID: PMC10660556 DOI: 10.3390/math11030557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
High-dimensional data applications often entail the use of various statistical and machine-learning algorithms to identify an optimal signature based on biomarkers and other patient characteristics that predicts the desired clinical outcome in biomedical research. Both the composition and predictive performance of such biomarker signatures are critical in various biomedical research applications. In the presence of a large number of features, however, a conventional regression analysis approach fails to yield a good prediction model. A widely used remedy is to introduce regularization in fitting the relevant regression model. In particular, an $L_1$ penalty on the regression coefficients is extremely useful, and very efficient numerical algorithms have been developed for fitting such models with different types of responses. This $L_1$-based regularization tends to generate a parsimonious prediction model with promising prediction performance, i.e., feature selection is achieved along with construction of the prediction model. The variable selection, and hence the composition of the signature, as well as the prediction performance of the model depend on the choice of the penalty parameter used in the $L_1$ regularization. The penalty parameter is often chosen by K-fold cross-validation. However, such an algorithm tends to be unstable and may yield very different choices of the penalty parameter across multiple runs on the same dataset. In addition, the predictive performance estimates from the internal cross-validation procedure in this algorithm tend to be inflated. In this paper, we propose a Monte Carlo approach to improve the robustness of regularization parameter selection, along with an additional cross-validation wrapper for objectively evaluating the predictive performance of the final model. We demonstrate the improvements via simulations and illustrate the application via a real dataset.
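The instability described in this abstract comes from the random fold assignment: a single K-fold run can favor a different penalty each time. The sketch below illustrates one general way to dampen that, by averaging the cross-validated error over several random partitions before picking the penalty. It is not the authors' algorithm. The coordinate-descent lasso, the squared-error loss, and all function names (`lasso_cd`, `monte_carlo_lambda`, `make_toy_data`) are assumptions made for illustration only.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_error(X, y, lam, folds):
    """Mean squared prediction error over the held-out folds."""
    total, count = 0.0, 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
            count += 1
    return total / count


def monte_carlo_lambda(X, y, lambdas, k=5, repeats=20, seed=0):
    """Average K-fold CV error over `repeats` random partitions, then pick the best penalty."""
    rng = random.Random(seed)
    mean_err = [0.0] * len(lambdas)
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for m, lam in enumerate(lambdas):
            mean_err[m] += cv_error(X, y, lam, folds) / repeats
    best = min(range(len(lambdas)), key=lambda m: mean_err[m])
    return lambdas[best], mean_err


def make_toy_data(seed=1, n=40, p=4):
    """Hypothetical data: two informative features, the rest pure noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y
```

Calling `monte_carlo_lambda(X, y, [0.01, 0.1, 10.0])` on the toy data returns the selected penalty together with the averaged error for each candidate. Any evaluation of the final model would still need an outer cross-validation loop, as the abstract notes.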
Affiliation(s)
- Feng Hong
- Takeda Pharmaceuticals, Cambridge, MA 02139, USA
- Lu Tian
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Viswanath Devanarayan
- Eisai Inc., Nutley, NJ 07110, USA
- Department of Mathematics, Statistics, and Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA
28
da Costa RF, Silva AM, Masset KVDSB, Cesário TDM, Cabral BGDAT, Ferrari G, Dantas PMS. Corrigendum: Development and cross-validation of a predictive equation for fat-free mass in Brazilian adolescents by bioelectrical impedance. Front Nutr 2023; 9:1128979. [PMID: 36712543 PMCID: PMC9880769 DOI: 10.3389/fnut.2022.1128979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
[This corrects the article DOI: 10.3389/fnut.2022.820736.].
Affiliation(s)
- Roberto Fernandes da Costa
- Physical Education Department, Health Sciences Center, Federal University of Rio Grande do Norte, Natal, Brazil. Correspondence: Roberto Fernandes da Costa
- Analiza M. Silva
- Exercise and Health Laboratory, CIPER, Faculdade Motricidade Humana, Universidade de Lisboa, Lisbon, Portugal
- Tatianny de Macêdo Cesário
- Physical Education Department, Health Sciences Center, Federal University of Rio Grande do Norte, Natal, Brazil
- Gerson Ferrari
- Escuela de Ciencias de la Actividad Física, el Deporte y la Salud, Universidad de Santiago de Chile (USACH), Santiago, Chile; Grupo de Estudio en Educación, Laboratorio de Rendimiento Humano, Actividad Física y Salud (GEEAFyS), Universidad Católica del Maule, Talca, Chile
- Paulo Moreira Silva Dantas
- Physical Education Department, Health Sciences Center, Federal University of Rio Grande do Norte, Natal, Brazil
29
Yoneda K, Amari S, Mikami M, Uchida K, Yokoi A, Okawada M, Furukawa T, Toyoshima K, Inamura N, Okazaki T, Yamoto M, Masumoto K, Terui K, Okuyama H, Hayakawa M, Taguchi T, Usui N, Isayama T. Development of mortality prediction models for infants with isolated, left-sided congenital diaphragmatic hernia before and after birth. Pediatr Pulmonol 2023; 58:152-160. [PMID: 36174997 DOI: 10.1002/ppul.26172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 08/29/2022] [Accepted: 09/27/2022] [Indexed: 01/11/2023]
Abstract
BACKGROUND Mortality prediction of congenital diaphragmatic hernia (CDH) is essential for developing treatment strategies, including fetal therapy. Several researchers have reported prognostic factors for this rare but life-threatening condition; however, the optimal combination of prognostic factors remains to be elucidated. OBJECTIVES This study aimed to develop the most discriminative prenatal and postnatal models to predict the mortality of infants with an isolated left-sided CDH. METHODS This multi-institutional retrospective cohort study included infants with CDH born at 15 tertiary hospitals of the Japanese CDH Study Group between 2011 and 2016. We developed multivariable logistic models with every possible combination of predictors and identified models with the highest cross-validated area under the receiver operating characteristic curve (AUC) for prenatal and postnatal predictions. RESULTS Among 302 eligible infants, 44 died before discharge. The prenatal mortality prediction model was based on the observed/expected lung area to head circumference ratio (O/E LHR), liver herniation, and stomach herniation (AUC, 0.830). The postnatal mortality prediction model was based on O/E LHR, liver herniation, and the lowest oxygenation index (AUC, 0.944). CONCLUSION Our models can facilitate the prenatal and postnatal mortality prediction of infants with isolated left-sided CDH.
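The model search described in the METHODS (fit a logistic model for every combination of predictors, then keep the one with the highest cross-validated AUC) can be sketched as follows. This is a hedged illustration and not the study's code. The gradient-descent logistic fit, the fold assignment by index, the pooling of out-of-fold predictions, and the names `fit_logistic`, `cv_auc`, and `best_subset` are all assumptions.

```python
import math
from itertools import combinations


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def linear_score(w, x):
    """Linear predictor w[0] + sum_j w[j+1] * x[j]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent on the logistic log-loss; w[0] is the intercept."""
    n, p = len(y), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, linear_score(w, xi)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def cv_auc(X, y, cols, k=5):
    """K-fold CV: pool the out-of-fold linear scores, then compute one AUC."""
    scores = [0.0] * len(y)
    for f in range(k):
        test = [i for i in range(len(y)) if i % k == f]
        train = [i for i in range(len(y)) if i % k != f]
        w = fit_logistic([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        for i in test:
            scores[i] = linear_score(w, [X[i][c] for c in cols])
    return auc(scores, y)


def best_subset(X, y, k=5):
    """Try every non-empty predictor combination; keep the highest cross-validated AUC."""
    p = len(X[0])
    best_cols, best_auc = None, -1.0
    for r in range(1, p + 1):
        for cols in combinations(range(p), r):
            score = cv_auc(X, y, cols, k)
            if score > best_auc:  # strict '>' keeps the smaller model on ties
                best_cols, best_auc = cols, score
    return best_cols, best_auc
```

The number of candidate models grows as 2^p - 1, so an exhaustive search of this kind is only practical for a small set of predictors.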
Affiliation(s)
- Kota Yoneda
- Division of Neonatology, Center for Maternal-Fetal, Neonatal and Reproductive Medicine, National Center for Child Health and Development, Setagaya-ku, Japan
- Shoichiro Amari
- Division of Neonatology, Center for Maternal-Fetal, Neonatal and Reproductive Medicine, National Center for Child Health and Development, Setagaya-ku, Japan
- Masashi Mikami
- Division of Biostatistics, Clinical Research Center, National Center for Child Health and Development, Setagaya-ku, Japan
- Keiichi Uchida
- Department of Gastrointestinal and Pediatric Surgery, Mie University Graduate School of Medicine, Tsu, Japan
- Akiko Yokoi
- Department of Pediatric Surgery, Kobe Children's Hospital, Kobe, Japan
- Manabu Okawada
- Department of Pediatric General and Urogenital Surgery, Juntendo University School of Medicine, Tokyo, Japan
- Taizo Furukawa
- Department of Pediatric Surgery, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, Kyoto, Japan
- Katsuaki Toyoshima
- Department of Neonatology, Kanagawa Children's Medical Center, Yokohama, Japan
- Noboru Inamura
- Department of Pediatrics, Kindai University, Faculty of Medicine, Osaka-Sayama, Japan
- Tadaharu Okazaki
- Department of Pediatric Surgery, Juntendo University Urayasu Hospital, Urayasu, Japan
- Masaya Yamoto
- Department of Pediatric Surgery, Shizuoka Children's Hospital, Shizuoka, Japan
- Kouji Masumoto
- Department of Pediatric Surgery, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Keita Terui
- Department of Pediatric Surgery, Graduate School of Medicine, Chiba University, Chiba, Japan
- Hiroomi Okuyama
- Department of Pediatric Surgery, Osaka University Graduate School of Medicine, Suita, Japan
- Masahiro Hayakawa
- Center for Maternal-Neonatal Care, Nagoya University Hospital, Nagoya, Japan
- Tomoaki Taguchi
- Department of Pediatric Surgery, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan; Fukuoka College of Health Sciences, Fukuoka, Japan
- Noriaki Usui
- Department of Pediatric Surgery, Osaka Women's and Children's Hospital, Izumi, Japan
- Tetsuya Isayama
- Division of Neonatology, Center for Maternal-Fetal, Neonatal and Reproductive Medicine, National Center for Child Health and Development, Setagaya-ku, Japan
30
Angarita Barajas BK, Cantet RJC, Steibel JP, Schrauf MF, Forneris NS. Heritability estimates and predictive ability for pig meat quality traits using identity-by-state and identity-by-descent relationships in an F2 population. J Anim Breed Genet 2023; 140:13-27. [PMID: 36300585 DOI: 10.1111/jbg.12742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 10/05/2022] [Indexed: 12/13/2022]
Abstract
Genomic relationships can be computed with dense genome-wide genotypes through different methods, either based on identity-by-state (IBS) or identity-by-descent (IBD). The latter has been shown to increase the accuracy of both estimated relationships and predicted breeding values. However, it is not clear whether an IBD approach would achieve greater heritability ($h^2$) and predictive ability ($\hat{r}_{y,\hat{y}}$) than its IBS counterpart for data with low-depth pedigrees. Here, we compare both approaches in terms of the estimates of $h^2$ and $\hat{r}_{y,\hat{y}}$, using data on meat quality and carcass traits recorded in experimental crossbred pigs, with a pedigree constrained to only three generations. Three animal models were fitted, which differed in the relationship matrix: an IBS model ($G_{IBS}$), an IBD (defined within the known pedigree) model ($G_{IBD}$), and a pedigree model ($A_{22}$). In 9 of 20 traits, the estimates of $\sigma_u^2$ and $h^2$ were 1.2 to 2.9 times greater with the $G_{IBS}$ and $G_{IBD}$ models than with $A_{22}$, whereas for all traits both parameters were similar between the genomic models. The $\hat{r}_{y,\hat{y}}$ of the genomic models was higher than that of $A_{22}$. A slight increase in $\hat{r}_{y,\hat{y}}$ was found with $G_{IBS}$ when compared to $G_{IBD}$, most likely due to the former recovering sizeable relationships among founder F0 animals.
Affiliation(s)
- Rodolfo J C Cantet
- Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina; Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, Michigan, USA; Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan, USA
- Matias F Schrauf
- Departamento de Métodos Cuantitativos y Sistemas de Información, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina; Animal Breeding & Genomics, Wageningen Livestock Research, Wageningen University & Research, Wageningen, The Netherlands
- Natalia S Forneris
- Instituto de Investigaciones en Producción Animal (INPA-CONICET-UBA), Buenos Aires, Argentina; Departamento de Producción Animal, Facultad de Agronomía, Universidad de Buenos Aires, Buenos Aires, Argentina
31
Rudolph KE, Díaz I. When the Ends do not Justify the Means: Learning Who is Predicted to Have Harmful Indirect Effects. J R Stat Soc Ser A Stat Soc 2022; 185:S573-S589. [PMID: 37397280 PMCID: PMC10312488 DOI: 10.1111/rssa.12951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
There is a growing literature on finding rules by which to assign treatment based on an individual's characteristics such that a desired outcome under the intervention is maximized. A related goal entails identifying a subpopulation of individuals predicted to have a harmful indirect effect (the effect of treatment on an outcome through mediators), perhaps even in the presence of a predicted beneficial total treatment effect. In some cases, the implications of a likely harmful indirect effect may outweigh an anticipated beneficial total treatment effect, and would motivate further discussion of whether to treat identified individuals. We build on the mediation and optimal treatment rule literatures to propose a method of identifying a subgroup for which the treatment effect through the mediator is expected to be harmful. Our approach is nonparametric, incorporates post-treatment confounders of the mediator-outcome relationship, and does not make restrictions on the distribution of baseline covariates, mediating variables, or outcomes. We apply the proposed approach to identify a subgroup of boys in the MTO housing voucher experiment who are predicted to have a harmful indirect effect of housing voucher receipt on subsequent psychiatric disorder incidence through aspects of their school and neighborhood environments.
Affiliation(s)
- Kara E Rudolph
- Department of Epidemiology, Mailman School of Public Health, Columbia University
- Iván Díaz
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine
32
Li Y, Xu C, Han J, An Z, Wang D, Ma H, Liu C. MHAU-Net: Skin Lesion Segmentation Based on Multi-Scale Hybrid Residual Attention Network. Sensors (Basel) 2022; 22:8701. [PMID: 36433298 PMCID: PMC9695536 DOI: 10.3390/s22228701] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/01/2022] [Revised: 11/03/2022] [Accepted: 11/07/2022] [Indexed: 06/16/2023]
Abstract
Melanoma is a main factor that leads to skin cancer, and early diagnosis and treatment can significantly reduce the mortality of patients. Skin lesion boundary segmentation is a key to accurately localizing a lesion in dermoscopic images. However, the irregular shape and size of the lesions and the blurred boundary of the lesions pose significant challenges for researchers. In recent years, pixel-level semantic segmentation strategies based on convolutional neural networks have been widely used, but many methods still suffer from the inaccurate segmentation of fuzzy boundaries. In this paper, we proposed a multi-scale hybrid attentional convolutional neural network (MHAU-Net) for the precise localization and segmentation of skin lesions. MHAU-Net has four main components: multi-scale resolution input, hybrid residual attention (HRA), dilated convolution, and atrous spatial pyramid pooling. Multi-scale resolution inputs provide richer visual information, and HRA solves the problem of blurred boundaries and enhances the segmentation results. The Dice, mIoU, average specificity, and sensitivity on the ISIC2018 task 1 validation set were 93.69%, 90.02%, 92.7% and 93.9%, respectively. The segmentation metrics are significantly better than the latest DCSAU-Net, UNeXt, and U-Net, and excellent segmentation results are achieved on different datasets. We performed model robustness validations on the Kvasir-SEG dataset with an overall sensitivity and average specificity of 95.91% and 96.28%, respectively.
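The segmentation metrics quoted in this abstract (Dice, IoU, sensitivity, specificity) all derive from the pixel-level confusion counts between a predicted mask and the ground-truth mask. The following is a minimal sketch for one binary mask pair, written under a few assumptions. The function name `segmentation_metrics` is hypothetical, masks are plain nested lists of 0/1, and a zero denominator is reported as 0.0 (conventions differ on this). The paper's mIoU is a mean taken over classes or images, which this single-mask IoU does not reproduce.

```python
def _ratio(num, den):
    """Safe division; returning 0.0 for an empty denominator is a choice made for this sketch."""
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for one pair of binary masks given as nested lists of 0/1."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # lesion pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as lesion
            elif t:
                fn += 1  # lesion pixel missed
            else:
                tn += 1  # background correctly predicted
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }
```

Dice and IoU are monotonically related for a single mask (Dice = 2·IoU / (1 + IoU)), so they rank individual predictions identically, although their averages over a dataset can differ.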
Affiliation(s)
- Yingjie Li
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Chao Xu
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Jubao Han
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Ziheng An
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Deyu Wang
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Haichao Ma
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
- Chuanxu Liu
- School of Integrated Circuits, Anhui University, Hefei 230601, China
- Anhui Engineering Laboratory of Agro-Ecological Big Data, Hefei 230601, China
33
Meyer JA, DeChenne S, Foerder CA, Hengel SM. Bioanalysis of tucatinib and metabolite, and a five-way cross-validation to support clinical pharmacokinetic analysis. Bioanalysis 2022; 14:1443-52. [PMID: 36651218 DOI: 10.4155/bio-2022-0199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Tucatinib, a tyrosine kinase inhibitor of HER2, is approved in multiple regions for metastatic breast cancer and is being evaluated in metastatic colorectal and gastric cancers. During clinical development, quantification of tucatinib plasma concentrations for pharmacokinetic analysis was performed using MS/MS analysis by three laboratories using five different methods. Cross-validation was required to confirm data across laboratories were comparable. A five-way cross-validation procedure was developed where bioanalysis performed by one laboratory and method was used as a 'base' against which the other methods were validated. This cross-validation method provides an alternative to multiple head-to-head comparisons between two methods, and enabled combination of data from multiple tucatinib clinical trials for a single population pharmacokinetic analysis.
34
Zhou Z, Cheng Q. Measuring Online Social Support: Development and Validation of a Short Form for Chinese Adolescents. Int J Environ Res Public Health 2022; 19:14058. [PMID: 36360936 PMCID: PMC9656139 DOI: 10.3390/ijerph192114058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/29/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 05/28/2023]
Abstract
Supportive interactions on social media have great potential to benefit adolescents' development. However, there is no instrument to measure online social support (OSS) in China. The study aimed to develop and validate a Chinese short version of the Online Social Support Scale (OSSS). The original scale was translated into Chinese through multiple forward and backward translation protocols. The calibration sample (N = 262) was used to select items and test the reliability, validity, and internal structure of the short form. The cross-validation sample (N = 267) was then used to assess measurement invariance by multigroup confirmatory factor analysis and examine criterion validity based on its relationships with life satisfaction, depression, and time on social media. The 20-item Chinese short version of OSSS (OSSS-CS) includes four factors: esteem/emotional support, social companionship, informational support, and instrumental support. Our results suggest that the OSSS-CS has high internal consistency, construct validity, and criterion validity. Furthermore, evidence of partial cross-validity demonstrated invariance of the variance-covariance matrices, factor structure, factor loadings, and factor variance across independent samples. The results also revealed that the original OSSS could be replicated across cultures. Finally, the short form developed in the study can be used as a reliable and valid measure of online social support among the Chinese adolescent population.
35
Saleem MH, Potgieter J, Arif KM. A weight optimization-based transfer learning approach for plant disease detection of New Zealand vegetables. Front Plant Sci 2022; 13:1008079. [PMID: 36388538 PMCID: PMC9641257 DOI: 10.3389/fpls.2022.1008079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Accepted: 09/22/2022] [Indexed: 06/16/2023]
Abstract
Deep learning (DL) is an effective approach to identifying plant diseases. Among several DL-based techniques, transfer learning (TL) produces significant results in terms of improved accuracy. However, the usefulness of TL has not yet been explored using weights optimized from agricultural datasets. Furthermore, the detection of plant diseases in different organs of various vegetables has not yet been performed using a trained/optimized DL model. Moreover, the presence/detection of multiple diseases in vegetable organs has not yet been investigated. To address these research gaps, a new dataset named NZDLPlantDisease-v2 has been collected for New Zealand vegetables. The dataset includes 28 healthy and defective organs of beans, broccoli, cabbage, cauliflower, kumara, peas, potato, and tomato. This paper presents a transfer learning method that optimizes weights obtained through agricultural datasets for better outcomes in plant disease identification. First, several DL architectures are compared to obtain the best-suited model, and then, data augmentation techniques are applied. The Faster Region-based Convolutional Neural Network (RCNN) Inception ResNet-v2 attained the highest mean average precision (mAP) compared to the other DL models including different versions of Faster RCNN, Single-Shot Multibox Detector (SSD), Region-based Fully Convolutional Networks (RFCN), RetinaNet, and EfficientDet. Next, weight optimization is performed on datasets including PlantVillage, NZDLPlantDisease-v1, and DeepWeeds using image resizers, interpolators, initializers, batch normalization, and DL optimizers. Updated/optimized weights are then used to retrain the Faster RCNN Inception ResNet-v2 model on the proposed dataset. Finally, the results are compared with the model trained/optimized using a large dataset, such as Common Objects in Context (COCO). The final mAP improves by 9.25% and is found to be 91.33%. Moreover, the robustness of the methodology is demonstrated by testing the final model on an external dataset and using the stratified k-fold cross-validation method.
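Stratified k-fold cross-validation, mentioned at the end of this abstract, splits the data so that each fold preserves the class proportions of the whole set. The sketch below shows the basic idea for single-label samples. It is an assumption-laden illustration, not the paper's pipeline: the function name `stratified_k_fold` is hypothetical, and stratifying object-detection images that carry several labels would need a rule the abstract does not describe.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of sample indices with class proportions preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

With 10 "healthy" and 5 "diseased" samples and k = 5, every fold receives two healthy samples and one diseased sample, so each held-out fold has the same 2:1 ratio as the full set.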
Affiliation(s)
- Muhammad Hammad Saleem
- Department of Mechanical and Electrical Engineering, School of Food and Advanced Technology, Massey University, Auckland, New Zealand
- Johan Potgieter
- Massey AgriFood Digital Lab, Massey University, Palmerston North, New Zealand
- Khalid Mahmood Arif
- Department of Mechanical and Electrical Engineering, School of Food and Advanced Technology, Massey University, Auckland, New Zealand
36
Muthudoss P, Tewari I, Chi RLR, Young KJ, Ann EYC, Hui DNS, Khai OY, Allada R, Rao M, Shahane S, Das S, Babla I, Mhetre S, Paudel A. Machine Learning-Enabled NIR Spectroscopy in Assessing Powder Blend Uniformity: Clear-Up Disparities and Biases Induced by Physical Artefacts. AAPS PharmSciTech 2022; 23:277. [PMID: 36229571 DOI: 10.1208/s12249-022-02403-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/03/2022] [Accepted: 08/22/2022] [Indexed: 11/30/2022]
Abstract
NIR spectroscopy is a non-destructive characterization tool for blend uniformity (BU) assessment. However, NIR spectra of powder blends often contain overlapping physical and chemical information of the samples. Deconvoluting the information related to chemical properties from that associated with the physical effects is one of the major objectives of this work. We pursue this aim in two ways. First, we identify various sources of variability that might affect the BU results. Second, we leverage sophisticated machine learning-based data analytics processes. To accomplish the aforementioned objectives, calibration samples of amlodipine as an active pharmaceutical ingredient (API) with concentrations ranging between 67 and 133% w/w (dose ~ 3.6% w/w), in powder blends containing excipients, were prepared using a gravimetric approach and assessed using NIR spectroscopic analysis, followed by HPLC measurements. The bias in NIR results was investigated by employing data quality metrics (DQM) and bias-variance decomposition (BVD). To overcome the bias, clustered regression (non-parametric and linear) was applied. We assessed the model's performance by employing hold-out and k-fold internal cross-validation (CV). NIR-based blend homogeneity with a low mean absolute error and an interval estimate of 0.674 (mean) ± 0.218 (standard deviation) w/w was established. Additionally, bootstrapping-based CV was leveraged as part of the NIR method lifecycle management and demonstrated a mean absolute error (MAE) of BU ± 3.5% w/w and BU ± 1.5% w/w for model generalizability and model transferability, respectively. A workflow integrating machine learning with NIR spectral analysis was established and implemented. Impact of various data learning approaches on NIR spectral data.
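Bootstrapping-based validation of the kind mentioned in this abstract typically refits the model on resampled data and scores it on the samples left out of each resample (the out-of-bag samples). The sketch below reports the mean and standard deviation of the out-of-bag MAE. It is only an illustration of that general idea. The one-variable least-squares line stands in for the paper's clustered regression, and the names `fit_line` and `bootstrap_oob_mae` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; None if x has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_oob_mae(xs, ys, n_boot=200, seed=0):
    """Mean and standard deviation of the out-of-bag MAE over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    maes = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        oob = sorted(set(range(n)) - set(draw))      # samples never drawn this round
        model = fit_line([xs[i] for i in draw], [ys[i] for i in draw])
        if model is None or not oob:
            continue
        slope, intercept = model
        errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in oob]
        maes.append(sum(errors) / len(errors))
    if len(maes) < 2:
        raise ValueError("not enough usable bootstrap rounds")
    mean = sum(maes) / len(maes)
    sd = (sum((m - mean) ** 2 for m in maes) / (len(maes) - 1)) ** 0.5
    return mean, sd
```

Here `xs` would play the role of a reference measurement and `ys` the model's target; the returned pair has the same "mean ± standard deviation" shape as the interval estimate quoted in the abstract, although the numbers are not comparable.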
Affiliation(s)
- Prakash Muthudoss
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia; A2Z4.0 Research and Analytics Private Limited, Old No:810, New No:62, CTH Road, Behind Lenskart, Thirumullaivoil, Chennai, Tamilnadu, India
- Ishan Tewari
- The Machine Learning Company, Beed, Maharashtra, India; Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
- Rayce Lim Rui Chi
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Kwok Jia Young
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Eddy Yii Chung Ann
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Doreen Ng Sean Hui
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Ooi Yee Khai
- Perkin Elmer Sdn Bhd, L2, 2-01, Wisma Academy, Jalan 19/1, Seksyen 19, 46300, Petaling Jaya, Selangor, Malaysia
- Ravikiran Allada
- Novugen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Manohar Rao
- PerkinElmer (India) Private Limited, Vayudooth Chambers, 12th floor, Trinity Circle, Mahatma Gandhi Rd, Bengaluru, Karnataka, 560001, India
- Samir Das
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Irfan Babla
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Sandeep Mhetre
- Oncogen Pharma (Malaysia), Sdn Bhd, 3, Jalan Jururancang U1/21, Hicom-glenmarie Industrial Park, 40150, Shah Alam, Selangor, Malaysia
- Amrit Paudel
- Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, 8010, Graz, Austria; Institute of Process and Particle Engineering, Graz University of Technology, Inffeldgasse 13/3, 8010, Graz, Austria
37
Boileau P, Hejazi NS, van der Laan MJ, Dudoit S. Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions. J Comput Graph Stat 2022; 32:601-612. [PMID: 37273839 PMCID: PMC10237052 DOI: 10.1080/10618600.2022.2110883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2021] [Accepted: 07/28/2022] [Indexed: 10/15/2022]
Abstract
The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional regimes do not admit such a convenience. Thus, a variety of estimators have been derived to overcome the shortcomings of the canonical estimator in such settings. Yet, selecting an optimal estimator from among the plethora available remains an open challenge. Using the framework of cross-validated loss-based estimation, we develop the theoretical underpinnings of just such an estimator selection procedure. We propose a general class of loss functions for covariance matrix estimation and establish accompanying finite-sample risk bounds and conditions for the asymptotic optimality of the cross-validation selector. In numerical experiments, we demonstrate the optimality of our proposed selector in moderate sample sizes and across diverse data-generating processes. The practical benefits of our procedure are highlighted in a dimension reduction application to single-cell transcriptome sequencing data.
Affiliation(s)
- Philippe Boileau
- Graduate Group in Biostatistics and Center for Computational Biology, UC Berkeley
- Nima S. Hejazi
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine
- Mark J. van der Laan
- Division of Biostatistics, Department of Statistics, and Center for Computational Biology, UC Berkeley
- Sandrine Dudoit
- Department of Statistics, Division of Biostatistics, and Center for Computational Biology, UC Berkeley
38
Prevodnik K, Trkman M, Grošelj D, Bartol J, Petrovčič A. An Assessment of the Structural Validity and Measurement Invariance of the Web-Use Skills Scale for Aging Internet Users. Cyberpsychol Behav Soc Netw 2022; 25:657-665. [PMID: 36130141 DOI: 10.1089/cyber.2022.0023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Research on digital inequality has found that aging adults are often at risk of digital exclusion. Understanding the validity of survey measures assessing Internet skills in this population is critical to providing the high-quality data needed for effective digital inclusion policy interventions. This cross-validation study examines the structural validity and measurement invariance (across age, gender, and education groups) of the Web-Use Skills scale (WUS), which is commonly used as a proxy measure of Internet skills. We tested the 14-item version of the WUS. The scale was translated into the Slovenian language and pretested with older Internet users. Data were collected from two independent samples of Internet users aged 50+ years (N1 = 259 and N2 = 256) drawn from an online opt-in panel in Slovenia. The examination of structural validity confirmed that the WUS adequately reflects the one-factor structure of the web-use skills construct, although in a shorter six-item form. Moreover, the analysis confirmed strict measurement invariance between the two samples and, at least, scalar invariance between age, gender, and education groups. The results support the applicability of WUS in cross-group comparisons of Internet skills in the population of aging Internet users and point to several opportunities for future work.
Affiliation(s)
- Katja Prevodnik
  - Centre for Social Informatics, Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia
- Marina Trkman
  - Centre for Social Informatics, Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia
- Darja Grošelj
  - Centre for Social Informatics, Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia
- Jošt Bartol
  - Centre for Social Informatics, Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia
  - Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia
- Andraž Petrovčič
  - Centre for Social Informatics, Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia

39
Morzywołek P, Steen J, Van Biesen W, Decruyenaere J, Vansteelandt S. On estimation and cross-validation of dynamic treatment regimes with competing risks. Stat Med 2022; 41:5258-5275. [PMID: 36055675 DOI: 10.1002/sim.9568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Revised: 06/09/2022] [Accepted: 07/11/2022] [Indexed: 11/12/2022]
Abstract
The optimal moment to start renal replacement therapy in a patient with acute kidney injury (AKI) remains a challenging problem in intensive care nephrology. Multiple randomized controlled trials have tried to answer this question, but these contrast only a limited number of treatment initiation strategies. In view of this, we use routinely collected observational data from the Ghent University Hospital intensive care units (ICUs) to investigate different prespecified timing strategies for renal replacement therapy initiation based on time-updated levels of serum potassium, pH, and fluid balance in critically ill patients with AKI with the aim to minimize 30-day ICU mortality. For this purpose, we apply statistical techniques for evaluating the impact of specific dynamic treatment regimes in the presence of ICU discharge as a competing event. We discuss two approaches, a nonparametric one - using an inverse probability weighted Aalen-Johansen estimator - and a semiparametric one - using dynamic-regime marginal structural models. Furthermore, we suggest an easy to implement cross-validation technique to assess the out-of-sample performance of the optimal dynamic treatment regime. Our work illustrates the potential of data-driven medical decision support based on routinely collected observational data.
Affiliation(s)
- Paweł Morzywołek
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Johan Steen
  - Department of Internal Medicine and Pediatrics, Ghent University, Ghent, Belgium
  - Renal Division, Ghent University Hospital, Ghent, Belgium
  - Department of Intensive Care Medicine, Ghent University Hospital, Ghent, Belgium
- Wim Van Biesen
  - Department of Internal Medicine and Pediatrics, Ghent University, Ghent, Belgium
  - Renal Division, Ghent University Hospital, Ghent, Belgium
- Johan Decruyenaere
  - Department of Internal Medicine and Pediatrics, Ghent University, Ghent, Belgium
  - Department of Intensive Care Medicine, Ghent University Hospital, Ghent, Belgium
- Stijn Vansteelandt
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
  - Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK

40
Jauhiainen S, Kauppi JP, Krosshaug T, Bahr R, Bartsch J, Äyrämö S. Predicting ACL Injury Using Machine Learning on Data From an Extensive Screening Test Battery of 880 Female Elite Athletes. Am J Sports Med 2022; 50:2917-2924. [PMID: 35984748 PMCID: PMC9442771 DOI: 10.1177/03635465221112095] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/31/2023]
Abstract
BACKGROUND Injury risk prediction is an emerging field in which more research is needed to recognize the best practices for accurate injury risk assessment. Important issues related to predictive machine learning need to be considered, for example, to avoid overinterpreting the observed prediction performance. PURPOSE To carefully investigate the predictive potential of multiple predictive machine learning methods on a large set of risk factor data for anterior cruciate ligament (ACL) injury; the proposed approach takes into account the effect of chance and random variations in prediction performance. STUDY DESIGN Case-control study; Level of evidence, 3. METHODS The authors used 3-dimensional motion analysis and physical data collected from 791 female elite handball and soccer players. Four common classifiers were used to predict ACL injuries (n = 60). Area under the receiver operating characteristic curve (AUC-ROC) averaged across 100 cross-validation runs (mean AUC-ROC) was used as a performance metric. Results were confirmed with repeated permutation tests (paired Wilcoxon signed-rank-test; P < .05). Additionally, the effect of the most common class imbalance handling techniques was evaluated. RESULTS For the best classifier (linear support vector machine), the mean AUC-ROC was 0.63. Regardless of the classifier, the results were significantly better than chance, confirming the predictive ability of the data and methods used. AUC-ROC values varied substantially across repetitions and methods (0.51-0.69). Class imbalance handling did not improve the results. CONCLUSION The authors' approach and data showed statistically significant predictive ability, indicating that there exists information in this prospective data set that may be valuable for understanding injury causation. However, the predictive ability remained low from the perspective of clinical assessment, suggesting that included variables cannot be used for ACL prediction in practice.
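The check that prediction performance exceeds chance can be sketched with a rank-based AUC-ROC and a label-permutation test. This is a simplified stand-in, not the study's exact procedure (which compared repeated cross-validation runs with a paired Wilcoxon signed-rank test); the function names and defaults are assumptions.

```python
import random

def auc_roc(scores, labels):
    """AUC-ROC as the probability that a positive case outscores a negative one.

    Ties count as half a win. Labels are 1 (injured) or 0 (uninjured).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Share of label shufflings whose AUC-ROC is at least the observed value."""
    rng = random.Random(seed)
    observed = auc_roc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and outcomes
        if auc_roc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

With only 60 injuries among 791 athletes, a single cross-validation split can give a misleadingly high or low AUC-ROC, which is why the abstract averages over 100 runs before testing against chance.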
Affiliation(s)
- Susanne Jauhiainen
  - Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland
  - Correspondence: Susanne Jauhiainen, MSc, Faculty of Information Technology, University of Jyväskylä, PO Box 35, FI-40014, Jyväskylä, Finland
- Jukka-Pekka Kauppi
  - Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland
- Tron Krosshaug
  - Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
- Roald Bahr
  - Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
- Julia Bartsch
  - Oslo Sports Trauma Research Center, Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
- Sami Äyrämö
  - Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland

41
Lai CL, Lu HK, Huang AC, Chu LP, Chuang HY, Hsieh KC. Bioimpedance analysis combined with sagittal abdominal diameter for abdominal subcutaneous fat measurement. Front Nutr 2022; 9:952929. [PMID: 36034888 PMCID: PMC9399717 DOI: 10.3389/fnut.2022.952929] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 07/25/2022] [Indexed: 11/13/2022] Open
Abstract
Abdominal subcutaneous fat tissue (ASFT) is an independent predictor of mortality. This prospective observational study aimed to establish a rapid, safe, and convenient estimation equation for abdominal subcutaneous fat area (SFA) using bioimpedance analysis (BIA) combined with sagittal abdominal diameter (SAD). A total of 520 adult subjects were recruited and randomly divided into a modeling group (MG; 2/3, n = 346) and a validation group (VG; 1/3, n = 174). Each subject's abdomen was scanned using computed tomography to obtain the target variable (SFA_CT). Predictor variables for all subjects included the bioimpedance index (h²/Z), the anthropometric parameters height (h), weight (W), waist circumference (WC), hip circumference (HC), and SAD, along with age and sex (male = 1, female = 0). The estimation equation SFA_BIA+SAD was established for the MG using stepwise multiple regression analysis, and cross-validation was performed using the VG to evaluate its performance. Stepwise multiple regression on the MG yielded SFA_BIA+SAD = 49.89 + 1.09 SAD - 29.90 Sex + 4.71 W - 3.63 h²/Z - 1.50 h (r = 0.92, SEE = 28.10 cm², n = 346, p < 0.001). Mean differences in SFA_BIA+SAD relative to SFA_CT were -1.21 ± 21.53, 2.85 ± 27.16, and -0.98 ± 36.6 cm² for eutrophic, overweight, and obese subjects, respectively. Because the sample did not span many different settings, the external validity of the study is not fully established. Application of BIA combined with SAD and anthropometric parameters achieves fast, accurate, and convenient SFA measurement. Results of this study provide a simple, reliable, and practical measurement that can be widely used in epidemiological studies and in measuring individual SFA.
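The estimation equation reported in this abstract can be written as a small function. The coefficients are taken from the abstract; the function and argument names are hypothetical, and the abstract does not state measurement units, so inputs must use the units of the original paper.

```python
def sfa_bia_sad(sad, sex, weight, height, impedance):
    """Estimate abdominal subcutaneous fat area (cm^2) from the reported equation.

    sex is coded male = 1, female = 0, as in the abstract. Units of the other
    inputs are NOT given in the abstract and must match the original paper.
    """
    bioimpedance_index = height ** 2 / impedance  # the h^2/Z predictor
    return (49.89
            + 1.09 * sad
            - 29.90 * sex
            + 4.71 * weight
            - 3.63 * bioimpedance_index
            - 1.50 * height)
```

Holding the other predictors fixed, the equation estimates an SFA 29.90 cm² lower for males than for females.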
Affiliation(s)
- Chung-Liang Lai
  - Ministry of Health and Welfare, Department of Physical Medicine and Rehabilitation, Puzi Hospital, Chiayi, Taiwan
  - Department of Occupational Therapy, Asia University, Taichung, Taiwan
- Hsueh-Kuan Lu
  - General Education Center, National Taiwan University of Sport, Taichung, Taiwan
- Ai-Chun Huang
  - Department of Oral Hygiene, Tzu-Hui Institute of Technology, Pingtung, Taiwan
- Lee-Ping Chu
  - Department of Orthopedics, China Medical University Hospital, Taichung, Taiwan
- Hsiang-Yuan Chuang
  - Ministry of Health and Welfare, Department of Physical Medicine and Rehabilitation, Taichung Hospital, Taichung, Taiwan
- Kuen-Chang Hsieh
  - Department of Research and Development, Starbia Meditek Co., Ltd., Taichung, Taiwan
  - Big Data Center, National Chung-Hsing University, Taichung, Taiwan

42
Abstract
The paper considers parameter estimation in count data models using penalized likelihood methods. The motivating data consists of multiple independent count variables with a moderate sample size per variable. The data were collected during the assessment of oral reading fluency (ORF) in school-aged children. A sample of fourth-grade students were given one of ten available passages to read with these differing in length and difficulty. The observed number of words read incorrectly (WRI) is used to measure ORF. Three models are considered for WRI scores, namely the binomial, the zero-inflated binomial, and the beta-binomial. We aim to efficiently estimate passage difficulty, a quantity expressed as a function of the underlying model parameters. Two types of penalty functions are considered for penalized likelihood with respective goals of shrinking parameter estimates closer to zero or closer to one another. A simulation study evaluates the efficacy of the shrinkage estimates using Mean Square Error (MSE) as metric. Big reductions in MSE relative to unpenalized maximum likelihood are observed. The paper concludes with an analysis of the motivating ORF data.
Affiliation(s)
- Minh Thu Bui
  - Department of Mathematics, Texas Christian University, Fort Worth, TX, USA
- Cornelis J. Potgieter
  - Department of Mathematics, Texas Christian University, Fort Worth, TX, USA
  - Department of Statistics, University of Johannesburg, Johannesburg, South Africa
- Akihito Kamata
  - Simmons School of Education, Southern Methodist University, Dallas, TX, USA

43
Király B, Hangya B. Navigating the Statistical Minefield of Model Selection and Clustering in Neuroscience. eNeuro 2022; 9:ENEURO. [PMID: 35835556 DOI: 10.1523/ENEURO.0066-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 06/16/2022] [Accepted: 06/22/2022] [Indexed: 11/21/2022] Open
Abstract
Model selection is often implicit: when performing an ANOVA, one assumes that the normal distribution is a good model of the data; fitting a tuning curve implies that an additive and a multiplicative scaler describe the behavior of the neuron; even calculating an average implicitly assumes that the data were sampled from a distribution that has a finite first statistical moment: the mean. Model selection may be explicit, when the aim is to test whether one model provides a better description of the data than a competing one. As a special case, clustering algorithms identify groups with similar properties within the data. They are widely used, from spike sorting to cell type identification to gene expression analysis. We discuss model selection and clustering techniques from a statistician’s point of view, revealing the assumptions behind, and the logic that governs, the various approaches. We also showcase important neuroscience applications and provide suggestions on how neuroscientists could put model selection algorithms to best use, as well as what mistakes should be avoided.

44
Branescu M, Swift S, Tucker A. A Comparison of Convolutional Neural Networks and Traditional Feature-Based Classification Applied to Leukaemia Image Analysis. Stud Health Technol Inform 2022; 295:545-550. [PMID: 35773932 DOI: 10.3233/shti220786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The accuracy of smear test image classification is a fundamental aspect in differentiating the type of leukaemia and determining the right treatment to improve the patient's chances of survival and recovery. Image classification has lately become a very effective tool in detecting and analysing the right type of leukaemia, as each type of the disease looks different when evaluated under a microscope. This paper evaluates and compares the efficiency and performance of feature extraction techniques (colour descriptors and Haralick texture descriptors) and a CNN (Convolutional Neural Network) built and trained using the TensorFlow packages for classifying leukaemia images. Extracting texture and colour features from a given set of leukaemia images through computation was successful in detecting the type of disease, and the results analysed with Weka classifiers gave the highest accuracy, 93.58%. TensorFlow tested with cross-validation proved efficient in training and customising the system, but the median accuracy was 56% and was not greatly improved by addressing the class imbalance in the data set with SMOTE. Further studies will investigate increasing the number of images by using segmentation and image manipulation/augmentation techniques and increasing the accuracy of the CNN through the addition of the investigated traditional features.
Affiliation(s)
- Marinela Branescu
  - The Department of Computer Science, Brunel University, West London, United Kingdom
- Stephen Swift
  - The Department of Computer Science, Brunel University, West London, United Kingdom
- Allan Tucker
  - The Department of Computer Science, Brunel University, West London, United Kingdom

45
Lafit G, Meers K, Ceulemans E. A Systematic Study into the Factors that Affect the Predictive Accuracy of Multilevel VAR(1) Models. Psychometrika 2022; 87:432-476. [PMID: 34724142 DOI: 10.1007/s11336-021-09803-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2020] [Revised: 07/13/2021] [Accepted: 08/02/2021] [Indexed: 06/13/2023]
Abstract
The use of multilevel VAR(1) models to unravel within-individual process dynamics is gaining momentum in psychological research. These models accommodate the structure of intensive longitudinal datasets in which repeated measurements are nested within individuals. They estimate within-individual auto- and cross-regressive relationships while incorporating and using information about the distributions of these effects across individuals. An important quality feature of the obtained estimates pertains to how well they generalize to unseen data. Bulteel and colleagues (Psychol Methods 23(4):740-756, 2018a) showed that this feature can be assessed through a cross-validation approach, yielding a predictive accuracy measure. In this article, we follow up on their results by performing three simulation studies that allow us to systematically study five factors that likely affect the predictive accuracy of multilevel VAR(1) models: (i) the number of measurement occasions per person, (ii) the number of persons, (iii) the number of variables, (iv) the contemporaneous collinearity between the variables, and (v) the distributional shape of the individual differences in the VAR(1) parameters (i.e., normal versus multimodal distributions). Simulation results show that pooling information across individuals and using multilevel techniques prevent overfitting. Also, we show that when variables are expected to show strong contemporaneous correlations, performing multilevel VAR(1) in a reduced variable space can be useful. Furthermore, results reveal that multilevel VAR(1) models with random effects have better predictive performance than person-specific VAR(1) models when the sample includes groups of individuals that share similar dynamics.
Affiliation(s)
- Ginette Lafit
  - Research Group of Quantitative Psychology and Individual Differences, KU Leuven - University of Leuven, Leuven, Belgium
- Kristof Meers
  - Research Group of Quantitative Psychology and Individual Differences, KU Leuven - University of Leuven, Leuven, Belgium
- Eva Ceulemans
  - Research Group of Quantitative Psychology and Individual Differences, KU Leuven - University of Leuven, Leuven, Belgium

46
Chatterjee M, Roy K. Application of cross-validation strategies to avoid overestimation of performance of 2D-QSAR models for the prediction of aquatic toxicity of chemical mixtures. SAR QSAR Environ Res 2022; 33:463-484. [PMID: 35638563 DOI: 10.1080/1062936x.2022.2081255] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/31/2022] [Accepted: 05/19/2022] [Indexed: 06/15/2023]
Abstract
The quantitative structure-activity relationship (QSAR) modelling of mixtures is not as simple as that for individual chemicals, and it needs additional care to avoid overestimation of the performance. In this research, we have developed a 2D-QSAR model using only 2D interpretable and reproducible descriptors to predict the aquatic toxicity of mixtures of polar and non-polar narcotic substances present in the environment. Partial least squares (PLS) regression has been used to model the response variable (log 1/EC50 against Photobacterium phosphoreum) and the structural features of 84 binary mixtures of polar and nonpolar narcotic toxicants complying with the Organization of Economic Co-operation and Development (OECD) protocols. The model was cross-validated by mixtures-out and compounds-out cross-validation to nullify the developmental bias. The reliability of prediction of the model has been judged by the Prediction Reliability Indicator (PRI) tool using a newly designed set. The new model is robust, reproducible, extremely predictive, easily interpretable, and can be used for reliable prediction of aquatic toxicity of any untested chemical mixtures within the applicability domain. We have additionally used a machine learning-based chemical read-across algorithm in this study to improve the quality of predictions for the toxicity of the mixtures with the modelled descriptors.
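The two splitting schemes named in this abstract can be sketched as follows. The abstract does not define them, so the sketch assumes the usual reading: "mixtures-out" keeps every data point of the same mixture in one fold, and "compounds-out" holds out every mixture that contains a held-out compound. Function names and the toy compound labels are hypothetical.

```python
def mixtures_out_folds(mixtures, k):
    """Assign each data point a fold so that identical mixtures share a fold.

    Each mixture is a tuple of component names; component order is ignored.
    """
    unique = sorted({frozenset(m) for m in mixtures}, key=sorted)
    fold_of = {mix: position % k for position, mix in enumerate(unique)}
    return [fold_of[frozenset(m)] for m in mixtures]

def compounds_out_split(mixtures, held_out_compounds):
    """Split indices so that no training mixture contains a held-out compound."""
    held = set(held_out_compounds)
    train = [i for i, m in enumerate(mixtures) if not (set(m) & held)]
    test = [i for i, m in enumerate(mixtures) if set(m) & held]
    return train, test

# Toy binary mixtures; the last row repeats the first mixture at another ratio.
mixtures = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "A")]
print(mixtures_out_folds(mixtures, 2))       # [0, 1, 0, 1, 0]
print(compounds_out_split(mixtures, ["D"]))  # ([0, 1, 2, 4], [3])
```

Splitting by random data point instead would let the same mixture, or the same compound, appear on both sides of the split, which is the source of the overestimated performance the title refers to.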
Affiliation(s)
- M Chatterjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- K Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India

47
Teshigawara-Tanabe H, Hagihara M, Aoki J, Koyama S, Takahashi H, Nakajima Y, Kunimoto H, Tachibana T, Miyazaki T, Matsumoto K, Tanaka M, Yamazaki E, Fujisawa S, Kanamori H, Taguri M, Nakajima H. Clinical risk factors for patients with myelodysplastic syndromes undergoing allogeneic hematopoietic stem cell transplantation. Hematology 2022; 27:620-628. [PMID: 35621915 DOI: 10.1080/16078454.2022.2052601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
Objectives: Allogeneic hematopoietic stem cell transplantation (allo-HCT) is the only curative treatment for myelodysplastic syndromes (MDS), although predicting post-transplant outcomes remains inconclusive. This study evaluated patients who underwent allo-HCT for MDS to identify prognostic factors and develop a clinical risk model. Methods: We evaluated 55 patients between June 2000 and March 2015 to identify prognostic factors and develop a model for three-year overall survival (OS) and event-free survival (EFS). Cox regression analysis was performed on four factors associated with poor OS in the univariate analysis: age ≥55 years; Hematopoietic Cell Transplant-Comorbidity Index >2; intermediate or worse cytogenetic status based on the revised International Prognostic Scoring System; and unrelated donor status. A clinical risk model was constructed using the sum of the regression coefficients and evaluated using receiver operating characteristic analysis and five-fold cross-validation. Results: Patient median age was 51 (range: 30-67) years. Median follow-up was 45.8 (range: 1.27-193) months; the three-year OS and EFS rates were 61.8% and 56.4%, respectively. The areas under the curves (AUCs) for OS and EFS were 0.738 and 0.778, respectively, and the average AUCs across 50 repetitions of five-fold cross-validation were 0.711 and 0.723 for three-year OS and EFS, respectively. Conclusion: A four-clinical-risk-factor model that could effectively predict post-transplantation outcomes and help decision-making in MDS treatment was developed.
Affiliation(s)
- Haruka Teshigawara-Tanabe
  - Department of Stem Cell and Immune Regulation, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Maki Hagihara
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Jun Aoki
  - Department of Hematology, Yokohama City University Medical Center, Yokohama, Japan
- Satoshi Koyama
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Hiroyuki Takahashi
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Yuki Nakajima
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Hiroyoshi Kunimoto
  - Department of Stem Cell and Immune Regulation, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Takuya Miyazaki
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Kenji Matsumoto
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan
- Masatsugu Tanaka
  - Department of Hematology, Kanagawa Cancer Center, Yokohama, Japan
- Etsuko Yamazaki
  - Department of Laboratory Medicine, Yokohama City University Hospital, Yokohama, Japan
- Shin Fujisawa
  - Department of Hematology, Yokohama City University Medical Center, Yokohama, Japan
- Heiwa Kanamori
  - Department of Hematology, Kanagawa Cancer Center, Yokohama, Japan
- Masataka Taguri
  - Department of Biostatistics, Yokohama City University School of Medicine, Yokohama, Japan
- Hideaki Nakajima
  - Department of Stem Cell and Immune Regulation, Yokohama City University Graduate School of Medicine, Yokohama, Japan
  - Department of Hematology and Clinical Immunology, Yokohama City University School of Medicine, Yokohama, Japan

48
Brant SB, Hobæk Haff I. The fraud loss for selecting the model complexity in fraud detection. J Appl Stat 2022; 50:2209-2227. [PMID: 37434626 PMCID: PMC10332194 DOI: 10.1080/02664763.2022.2070137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Accepted: 04/20/2022] [Indexed: 10/18/2022]
Abstract
Statistical fraud detection consists in making a system that automatically selects a subset of all cases (insurance claims, financial transactions, etc.) that are the most interesting for further investigation. The reason why such a system is needed is that the total number of cases typically is much higher than one realistically could investigate manually and that fraud tends to be quite rare. Further, the investigator is typically limited to controlling a restricted number k of cases, due to limited resources. The most efficient manner of allocating these resources is then to try selecting the k cases with the highest probability of being fraudulent. The prediction model used for this purpose must normally be regularised to avoid overfitting and consequently bad prediction performance. A loss function, denoted the fraud loss, is proposed for selecting the model complexity via a tuning parameter. A simulation study is performed to find the optimal settings for validation. Further, the performance of the proposed procedure is compared to the most relevant competing procedure, based on the area under the receiver operating characteristic curve (AUC), in a set of simulations, as well as on a credit card default dataset. Choosing the complexity of the model by the fraud loss resulted in either comparable or better results in terms of the fraud loss than choosing it according to the AUC.
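The idea of tuning a model by how well its k top-ranked cases capture fraud can be sketched as below. The abstract does not give the formula for the fraud loss, so the sketch uses a stand-in: the fraction of the k highest-scoring cases that are not fraudulent. This definition, the function names, and the dictionary layout are assumptions for illustration, not the authors' loss.

```python
def top_k_miss_rate(probabilities, is_fraud, k):
    """Stand-in loss: share of the k highest-probability cases that are not fraud."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    caught = sum(is_fraud[i] for i in ranked[:k])
    return 1.0 - caught / k

def select_tuning_parameter(predictions_by_param, is_fraud, k):
    """Pick the tuning parameter whose validation predictions minimise the stand-in loss.

    predictions_by_param maps each candidate tuning parameter to the predicted
    fraud probabilities it produced on the same validation cases.
    """
    return min(predictions_by_param,
               key=lambda p: top_k_miss_rate(predictions_by_param[p], is_fraud, k))
```

Unlike the AUC, which scores the whole ranking, a top-k criterion of this kind only rewards getting fraudulent cases into the k slots the investigator can actually check.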

49
McGrath E, Mahony N, Fleming N, Benavoli A, Donne B. Prediction of Functional Threshold Power from Graded Exercise Test Data in Highly-Trained Individuals. Int J Exerc Sci 2022; 15:747-759. [PMID: 35992499 PMCID: PMC9365101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The purpose of the current investigation was to derive an equation that could predict Functional Threshold Power (FTP) from Graded Exercise Test (GxT) data. The FTP test has been demonstrated to represent the highest cycling power output that can be maintained in a quasi-steady state for 60 min. Previous investigations to determine a comparable marker derived from a Graded Exercise Test have had limited success to date. Consequently, the current study aimed to predict FTP from GxT data to provide an additional index of cycling performance. FTP has been reported to provide an insight not provided by a GxT and, in addition, does not require a formal exercise testing facility. The study design facilitated a deliberate and transparent sequence of statistical decisions, resolved in part from the perspective of exercise physiology. Seventy triathletes (male n = 50, female n = 20) completed cycling GxT and FTP tests in sequential order. Collected data (power output, blood lactate indices, VO2peak, body mass) were analysed using stepwise regression to identify the key parameters for predicting FTP, and the model was confirmed using Leave One Out (LOO) cross-validation. Because some parameters that are likely to be transiently highly correlated were deliberately included on the basis of a physiological argument, the model's function is limited to predicting FTP. This investigation concluded that the model (FTP = -6.62 + 0.32 FBLC-4 + 0.42 BM + 0.46 Pmax) was the prediction model of choice.
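The prediction model reported in this abstract can be applied as a one-line function. The coefficients come from the abstract; the function and argument names are hypothetical, and the units (watts for both power values, kilograms for body mass) are an assumption, since the abstract does not state them. The athlete values in the example are invented for illustration.

```python
def predict_ftp(fblc4_power, body_mass, p_max):
    """Predict Functional Threshold Power from graded exercise test data.

    fblc4_power: the abstract's FBLC-4 term (assumed to be in watts)
    body_mass:   the abstract's BM term (assumed to be in kilograms)
    p_max:       the abstract's Pmax term (assumed to be in watts)
    """
    return -6.62 + 0.32 * fblc4_power + 0.42 * body_mass + 0.46 * p_max

# Hypothetical athlete: FBLC-4 = 250, BM = 70, Pmax = 350.
print(f"{predict_ftp(250, 70, 350):.2f}")  # 263.78
```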
Affiliation(s)
- Eanna McGrath
  - Human Performance Laboratory, Disciplines of Anatomy and Physiology, School of Medicine, Trinity College Dublin, IRL
- Nick Mahony
  - Human Performance Laboratory, Disciplines of Anatomy and Physiology, School of Medicine, Trinity College Dublin, IRL
- Neil Fleming
  - Human Performance Laboratory, Disciplines of Anatomy and Physiology, School of Medicine, Trinity College Dublin, IRL
- Alessio Benavoli
  - School of Computer Science and Statistics, Trinity College Dublin, IRL
- Bernard Donne
  - Human Performance Laboratory, Disciplines of Anatomy and Physiology, School of Medicine, Trinity College Dublin, IRL

50
Ecker A, Jenny M, Müller WC, Praprotnik K. How and why party position estimates from manifestos, expert, and party elite surveys diverge: A comparative analysis of the 'left-right' and the 'European integration' dimensions. Party Politics 2022; 28:528-540. [PMID: 35493065 PMCID: PMC9036146 DOI: 10.1177/1354068821990298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2019] [Accepted: 01/06/2021] [Indexed: 05/09/2023]
Abstract
This paper examines the validity of three approaches to estimate party positions on the general left-right and EU dimensions. We newly introduce party elite data from the comprehensive IntUne survey and cross-validate it with existing expert survey and manifesto data. The general left-right estimates generated by elites and experts show a higher congruence than those derived from party manifestos; neither measure clearly materializes as more valid regarding EU positions. We identify which factors explain diverging estimates. For instance, disagreement among experts has greater impact than their mere number. The substantial centrist bias of the manifesto estimates persists even when alternative documents are used to substitute manifestos. Low response rates among elites have no systematic detrimental effect on the validity of party position estimates.
Affiliation(s)
- Wolfgang C Müller
  - Wolfgang C Müller, Universität Wien, Rooseveltplatz 3, Vienna, 1090, Austria