1
Similarity-Based Predictive Models: Sensitivity Analysis and a Biological Application with Multi-Attributes. BIOLOGY 2023; 12:959. [PMID: 37508389 PMCID: PMC10376039 DOI: 10.3390/biology12070959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 06/20/2023] [Accepted: 06/27/2023] [Indexed: 07/30/2023]
Abstract
Predictive models based on empirical similarity are instrumental in biology and data science; their premise is to measure how similar one observation is to the others in the same dataset. Biological datasets often include categorical data. When empirical similarity-based predictive models are used, there are two strategies for handling categorical covariates. The first strategy retains categorical covariates in their original form, applying distance measures and assigning a weight to each covariate. In contrast, the second strategy creates binary variables, representing each level of a variable separately, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations and applies the results to a biological context. We use a linear regression model as a reference point and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. Sensitivity is evaluated using the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy outperforms the second in handling categorical variables and is more parsimonious because it uses fewer parameters.
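The two strategies can be illustrated with a minimal sketch of a leave-one-out, similarity-weighted predictor. The strategy 1 distance used here (a weighted squared difference for numeric covariates plus a weighted 0/1 mismatch for each categorical covariate), the fixed weights, the forms exp(-d) and 1/(1 + d) for the exponential and fractional inverse similarity functions, and the toy data are all assumptions for illustration; in the study the weights are estimated, and its exact definitions may differ.

```python
import numpy as np

def exp_similarity(d):
    """Exponential similarity function (assumed form): s = exp(-d)."""
    return np.exp(-d)

def frac_inverse_similarity(d):
    """Fractional inverse similarity function (assumed form): s = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)

def mixed_distance(xi, xj, num_idx, cat_idx, w):
    """Strategy 1 (assumed form): weighted squared differences for numeric
    covariates plus a weighted 0/1 mismatch for each categorical covariate."""
    d2 = sum(w[k] * (xi[k] - xj[k]) ** 2 for k in num_idx)
    d2 += sum(w[k] * float(xi[k] != xj[k]) for k in cat_idx)
    return np.sqrt(d2)

def one_hot_distance(zi, zj):
    """Strategy 2: unweighted Euclidean distance on one-hot-encoded covariates."""
    return np.linalg.norm(np.asarray(zi) - np.asarray(zj))

def similarity_predict(X, y, dist, sim):
    """Leave-one-out prediction: y_hat[i] is the similarity-weighted mean of y[j], j != i."""
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        s = np.array([sim(dist(X[i], X[j])) for j in others])
        y_hat[i] = s @ y[others] / s.sum()
    return y_hat

def coefficient_of_variation(estimates):
    """Relative variability of repeated estimates: sample SD / |mean|."""
    est = np.asarray(estimates, dtype=float)
    return est.std(ddof=1) / abs(est.mean())

# Hypothetical toy data: one numeric and one categorical covariate (levels "a", "b")
X_raw = [(1.0, "a"), (1.2, "a"), (3.0, "b"), (3.1, "b")]
X_hot = np.array([[1.0, 1, 0], [1.2, 1, 0], [3.0, 0, 1], [3.1, 0, 1]])
y = np.array([10.0, 11.0, 20.0, 21.0])

d1 = lambda a, b: mixed_distance(a, b, num_idx=[0], cat_idx=[1], w=[1.0, 2.0])
pred1 = similarity_predict(X_raw, y, d1, exp_similarity)
pred2 = similarity_predict(X_hot, y, one_hot_distance, frac_inverse_similarity)
print(pred1, pred2)
```

Strategy 1 keeps one weight per original covariate, whereas strategy 2 expands each categorical covariate into one column per level. In a simulation study, the coefficient of variation would be applied to the parameter estimates collected across replicates.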
2
Structure of bivariate Rayleigh proportional hazard rate model with its associated copula applied on COVID-19 data. QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL 2022; 38:3451-3469. [PMID: 37123988 PMCID: PMC10128045 DOI: 10.1002/qre.3143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2020] [Revised: 04/09/2022] [Accepted: 05/10/2022] [Indexed: 05/03/2023]
Abstract
Visualizing the fatality of coronavirus is a challenging problem worldwide. In this paper, a new construction based on the proportional hazard rate model with Rayleigh marginals is introduced and applied to a COVID-19 data set. The statistical and reliability characteristics of the bivariate Rayleigh proportional hazard (BRPH) distribution are derived. The copula dependence structure and its properties are studied. Point estimates of the marginal and dependence parameters are obtained via maximum likelihood, the method of moments, and the inference functions for margins (IFM) method. A simulation study is carried out to examine the performance of the parameter estimates. Finally, an application to COVID-19 data, which models COVID-19 fatality, is used to compare the BRPH model with other constructed bivariate models. Based on goodness-of-fit criteria, the BRPH model fits better than the competing bivariate models, which reflects its flexibility and applicability to modeling COVID-19 fatality.
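In the IFM method, the marginal parameters are estimated first, and the dependence parameter is then estimated with the margins held fixed. The sketch below covers only that first step for a single Rayleigh margin with distribution function F(x) = 1 - exp(-x²/(2σ²)), comparing the maximum likelihood estimator σ̂² = Σx²/(2n) with the method-of-moments estimator based on E[X] = σ√(π/2). The BRPH joint model and the copula step are not reproduced, and the sample is simulated.

```python
import math
import random

def rayleigh_mle(x):
    """ML estimate of the Rayleigh scale: sigma_hat^2 = sum(x_i^2) / (2n)."""
    return math.sqrt(sum(v * v for v in x) / (2 * len(x)))

def rayleigh_mom(x):
    """Method-of-moments estimate from E[X] = sigma * sqrt(pi / 2)."""
    return (sum(x) / len(x)) / math.sqrt(math.pi / 2)

def rayleigh_sample(sigma, n, rng):
    """Inverse-CDF sampling from F(x) = 1 - exp(-x^2 / (2 sigma^2))."""
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]

rng = random.Random(1)
x = rayleigh_sample(2.0, 5000, rng)  # simulated margin with true sigma = 2
print(rayleigh_mle(x), rayleigh_mom(x))
```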
3
Bayesian and Frequentist Inferences on a Type I Half-Logistic Odd Weibull Generator with Applications in Engineering. ENTROPY (BASEL, SWITZERLAND) 2021; 23:446. [PMID: 33920069 PMCID: PMC8069396 DOI: 10.3390/e23040446] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/08/2021] [Revised: 03/17/2021] [Accepted: 03/25/2021] [Indexed: 11/16/2022]
Abstract
In this article, we propose a new generalization of the odd Weibull-G family by combining two notable families of distributions. We derive various mathematical properties of the proposed family, including the quantile function, skewness, kurtosis, moments, incomplete moments, mean deviation, Bonferroni and Lorenz curves, probability weighted moments, moments of the (reversed) residual lifetime, entropy, and order statistics. Two special parametric models of the general class are then outlined. Depending on the parameter values, the hazard rate function of the sub-models can be increasing, decreasing, unimodal, or bathtub-shaped. The sub-models of the introduced family are also capable of modelling both symmetric and skewed data. Parameter estimation for the special models is discussed using several methods, namely maximum likelihood, simple least squares, weighted least squares, Cramér-von Mises, and Bayesian estimation. Under the Bayesian framework, we use informative and non-informative priors to obtain Bayes estimates of the unknown parameters under the squared error and generalized entropy loss functions. An extensive Monte Carlo simulation is conducted to assess the effectiveness of these estimation techniques. The applicability of two sub-models of the proposed family is illustrated using two real data sets.
4
Abstract
Background: Designing trials to reduce treatment duration is important in several therapeutic areas, including tuberculosis and bacterial infections. We recently proposed a new randomised trial design to overcome some of the limitations of standard two-arm non-inferiority trials. This DURATIONS design involves randomising patients to a number of duration arms and modelling the so-called ‘duration-response curve’. This article investigates the operating characteristics (type-1 and type-2 errors) of different statistical methods of drawing inference from the estimated curve.
Methods: Our first estimation target is the shortest duration non-inferior to the control (maximum) duration within a specific risk difference margin. We compare different methods of estimating this quantity, including using model confidence bands, the delta method and bootstrap. We then explore the generalisability of results to estimation targets which focus on absolute event rates, risk ratio and gradient of the curve.
Results: We show through simulations that, in most scenarios and for most of the estimation targets, using the bootstrap to estimate variability around the target duration leads to good results for DURATIONS design-appropriate quantities analogous to power and type-1 error. Using model confidence bands is not recommended, while the delta method leads to inflated type-1 error in some scenarios, particularly when the optimal duration is very close to one of the randomised durations.
Conclusions: Using the bootstrap to estimate the optimal duration in a DURATIONS design has good operating characteristics in a wide range of scenarios and can be used with confidence by researchers wishing to design a DURATIONS trial to reduce treatment duration. Uncertainty around several different targets can be estimated with this bootstrap approach.
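The bootstrap approach for the first estimation target can be sketched as follows, assuming binary cure outcomes, hypothetical arm sizes and cure counts, and a piecewise-linear interpolation of the arm-level cure rates in place of the fitted duration-response model that a DURATIONS analysis would use. Each bootstrap replicate resamples patients within each arm, re-estimates the curve, and finds the shortest duration whose cure rate is within the risk-difference margin of the longest (control) duration.

```python
import numpy as np

def min_noninferior_duration(durations, cure_rates, margin, grid_step=0.1):
    """Shortest duration whose interpolated cure rate lies within `margin`
    (risk difference) of the cure rate at the longest, control duration."""
    grid = np.arange(durations[0], durations[-1] + grid_step / 2, grid_step)
    curve = np.interp(grid, durations, cure_rates)  # assumed: piecewise-linear curve
    ok = curve >= cure_rates[-1] - margin
    return grid[np.argmax(ok)]  # first True; the control duration always qualifies

def bootstrap_duration(durations, n_per_arm, cures, margin, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the target duration."""
    rng = np.random.default_rng(seed)
    durations = np.asarray(durations, dtype=float)
    p_hat = np.asarray(cures, dtype=float) / n_per_arm
    point = min_noninferior_duration(durations, p_hat, margin)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resampling binary outcomes with replacement within an arm is
        # equivalent to drawing a binomial count at the arm's observed rate.
        p_star = rng.binomial(n_per_arm, p_hat) / n_per_arm
        boot[b] = min_noninferior_duration(durations, p_star, margin)
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return point, (lo, hi)

# Hypothetical trial: 7 duration arms (weeks), 100 patients per arm
durations = [8, 10, 12, 14, 16, 18, 20]
cures = [70, 78, 84, 88, 90, 91, 92]
point, (lo, hi) = bootstrap_duration(durations, 100, cures, margin=0.05)
print(f"shortest non-inferior duration ~ {point:.1f} weeks, 95% CI {lo:.1f}-{hi:.1f}")
```

A real analysis would replace `np.interp` with the fitted regression curve; the resampling loop itself would stay the same.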
5
Application of Least Squares with Conditional Equations Method for Railway Track Inventory Using GNSS Observations. SENSORS 2020; 20:s20174948. [PMID: 32882914 PMCID: PMC7506772 DOI: 10.3390/s20174948] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/06/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 11/29/2022]
Abstract
Satellite geodetic networks are commonly used in surveying tasks, but they can also be used in mobile surveys. Mobile satellite surveys can be used for track inventory, diagnostics, and design. Combining modern technological solutions with research methods adapted from other fields of science offers an opportunity to obtain highly accurate results for railway track inventory. This article presents the results of work carried out using a mobile surveying platform on which Global Navigation Satellite System (GNSS) receivers were mounted. The satellite observations obtained were adjusted using one of the methods known from classical land surveying. The records obtained during the surveying campaign on a railway track section at the 246th km were subjected to adjustment. This article describes the surveying campaign needed to obtain the measurement data, gives a theoretical description of the method used to adjust the observations, and presents their visualisation.
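Least squares with condition equations (the condition method of adjustment) finds corrections v to the observations l that minimise vᵀPv subject to linear conditions B(l + v) = c. Introducing Lagrange multipliers k gives k = (BP⁻¹Bᵀ)⁻¹w and v = -P⁻¹Bᵀk, where w = Bl - c is the misclosure. The sketch below applies this to a hypothetical triangle whose measured angles must sum to 180°; the condition equations actually formed for the railway track observations are not given in the abstract.

```python
import numpy as np

def condition_adjustment(B, l, c, P):
    """Least squares adjustment with condition equations B @ (l + v) = c.

    Minimising v' P v subject to B v + w = 0, with misclosure w = B l - c,
    gives k = (B P^-1 B')^-1 w and v = -P^-1 B' k."""
    Q = np.linalg.inv(P)          # cofactor matrix of the observations
    w = B @ l - c                 # misclosure vector
    N = B @ Q @ B.T               # normal matrix of the condition equations
    k = np.linalg.solve(N, w)     # correlates (Lagrange multipliers)
    v = -Q @ B.T @ k              # corrections to the observations
    return l + v, v

# Hypothetical example: three measured triangle angles (degrees) must sum to 180
B = np.array([[1.0, 1.0, 1.0]])
c = np.array([180.0])
l = np.array([59.99, 60.02, 60.02])
P = np.eye(3)                     # equal weights
adjusted, v = condition_adjustment(B, l, c, P)
print(adjusted, v)
```

With equal weights the 0.03° misclosure is split evenly; unequal weights in `P` push larger corrections onto the less precise observations.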
6
Saddle-Reset for Robust Parameter Estimation and Identifiability Analysis of Nonlinear Mixed Effects Models. AAPS JOURNAL 2020; 22:90. [PMID: 32617704 PMCID: PMC7373158 DOI: 10.1208/s12248-020-00471-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2020] [Accepted: 06/09/2020] [Indexed: 11/30/2022]
Abstract
Parameter estimation of a nonlinear model based on maximizing the likelihood using gradient-based numerical optimization methods can often fail due to premature termination of the optimization algorithm. One reason for such failure is that these numerical optimization methods cannot distinguish between a minimum, a maximum, and a saddle point; hence, the parameters found by these optimization algorithms may lie at any of these three types of stationary point on the likelihood surface. We have found that, when maximizing the likelihood for nonlinear mixed effects models used in pharmaceutical development, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization algorithm often terminates at saddle points, and we propose an algorithm, saddle-reset, based on the second partial derivative test, to avoid termination at saddle points. In this algorithm, we use the approximated Hessian matrix at the point where BFGS terminates, perturb the point in the direction of the eigenvector associated with the lowest eigenvalue, and restart the BFGS algorithm. We have implemented this algorithm in industry-standard software for nonlinear mixed effects modeling (NONMEM, version 7.4 and up) and show that it can be used to avoid termination of parameter estimation at saddle points, as well as to unveil practical parameter non-identifiability. We demonstrate this using four published pharmacometric models and two models specifically designed to be practically non-identifiable.
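The saddle-reset idea can be sketched with SciPy's BFGS on a toy objective; minimising a function here plays the role of minimising the negative log-likelihood. After BFGS stops, the second partial derivative test is applied: if the smallest Hessian eigenvalue is negative, the point is shifted along the corresponding eigenvector and BFGS is restarted. The toy function f(x, y) = x² + y⁴/4 - y²/2 (a saddle at the origin, minima at (0, ±1)), the step size, and the use of an analytic Hessian instead of the approximated one are assumptions; this is not the NONMEM implementation. Starting from (1, 0), the y-component of the gradient is zero along the whole BFGS path, so plain BFGS converges to the saddle.

```python
import numpy as np
from scipy.optimize import minimize

def saddle_reset(fun, grad, hess, x0, step=0.5, max_resets=5, eig_tol=-1e-8):
    """Run BFGS; while the end point fails the second-derivative test,
    perturb along the eigenvector of the lowest Hessian eigenvalue and restart."""
    res = minimize(fun, np.asarray(x0, dtype=float), jac=grad, method="BFGS")
    for _ in range(max_resets):
        eigvals, eigvecs = np.linalg.eigh(hess(res.x))  # eigenvalues in ascending order
        if eigvals[0] >= eig_tol:
            break  # no (numerically) negative curvature: accept the local minimum
        x_new = res.x + step * eigvecs[:, 0]  # move along the negative-curvature direction
        res = minimize(fun, x_new, jac=grad, method="BFGS")
    return res

# Toy objective with a saddle at (0, 0) and minima at (0, 1) and (0, -1)
def f(p):
    x, y = p
    return x**2 + y**4 / 4 - y**2 / 2

def grad_f(p):
    x, y = p
    return np.array([2 * x, y**3 - y])

def hess_f(p):
    x, y = p
    return np.array([[2.0, 0.0], [0.0, 3 * y**2 - 1.0]])

plain = minimize(f, np.array([1.0, 0.0]), jac=grad_f, method="BFGS")  # stops at the saddle
reset = saddle_reset(f, grad_f, hess_f, [1.0, 0.0])                 # escapes to a minimum
print(plain.x, plain.fun)
print(reset.x, reset.fun)
```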
7
Abstract
We examined the effect of the estimation method (maximum likelihood (ML), unweighted least squares (ULS), or diagonally weighted least squares (DWLS)) on three population SEM (structural equation modeling) fit indices: the root mean square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root mean square residual (SRMR). We considered different types and levels of misspecification in factor analysis models: misspecified dimensionality, omitted cross-loadings, and ignored residual correlations. Estimation methods had substantial impacts on the RMSEA and CFI, such that different cutoff values need to be employed for different estimators. In contrast, the SRMR is robust to the method used to estimate the model parameters, so the same criterion can be applied at the population level when using the SRMR to evaluate model fit, regardless of the estimation method.
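The population definitions below (common textbook forms; the study's exact formulas may differ) show why the estimator matters for RMSEA and CFI: both are functions of the minimised discrepancy F, whose value depends on the estimator (the ML and ULS discrepancies are shown; DWLS is omitted), whereas SRMR is computed from the residual matrix S - Σ̂ alone. The matrices, the df of 1, and the use of the ML discrepancy for the CFI example are hypothetical.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    logdet_sigma = np.linalg.slogdet(Sigma)[1]
    logdet_s = np.linalg.slogdet(S)[1]
    return logdet_sigma - logdet_s + np.trace(S @ np.linalg.inv(Sigma)) - p

def f_uls(S, Sigma):
    """ULS discrepancy: 0.5 * tr[(S - Sigma)^2]."""
    R = S - Sigma
    return 0.5 * np.trace(R @ R)

def rmsea_pop(F0, df):
    """Population RMSEA from the minimised discrepancy F0 and model df."""
    return np.sqrt(max(F0, 0.0) / df)

def cfi_pop(F_model, F_baseline):
    """Population CFI: 1 - F_model / F_baseline."""
    return 1.0 - F_model / F_baseline

def srmr(S, Sigma):
    """Root mean square of standardized residuals over the p(p+1)/2
    non-duplicated elements (lower triangle including the diagonal)."""
    d = np.sqrt(np.diag(S))
    resid = (S - Sigma) / np.outer(d, d)
    rows, cols = np.tril_indices(S.shape[0])
    return np.sqrt(np.mean(resid[rows, cols] ** 2))

# Hypothetical population correlation matrix S and model-implied matrix Sigma_hat
S = np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]])
Sigma_hat = np.array([[1.0, 0.45, 0.4], [0.45, 1.0, 0.35], [0.4, 0.35, 1.0]])
baseline = np.diag(np.diag(S))    # independence (baseline) model
F_m, F_b = f_ml(S, Sigma_hat), f_ml(S, baseline)
print("F_ML:", F_m, "F_ULS:", f_uls(S, Sigma_hat))
print("RMSEA (df = 1 assumed):", rmsea_pop(F_m, 1), "CFI:", cfi_pop(F_m, F_b))
print("SRMR:", srmr(S, Sigma_hat))
```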
8
One-stage individual participant data meta-analysis models for continuous and binary outcomes: Comparison of treatment coding options and estimation methods. Stat Med 2020; 39:2536-2555. [PMID: 32394498 DOI: 10.1002/sim.8555] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/20/2019] [Revised: 12/09/2019] [Accepted: 04/03/2020] [Indexed: 01/22/2023]
Abstract
A one-stage individual participant data (IPD) meta-analysis synthesizes IPD from multiple studies using a general or generalized linear mixed model. This produces summary results (eg, about treatment effect) in a single step, whilst accounting for clustering of participants within studies (via a stratified study intercept, or random study intercepts) and between-study heterogeneity (via random treatment effects). We use simulation to evaluate the performance of restricted maximum likelihood (REML) and maximum likelihood (ML) estimation of one-stage IPD meta-analysis models for synthesizing randomized trials with continuous or binary outcomes. Three key findings are identified. First, for ML or REML estimation of stratified intercept or random intercepts models, a t-distribution based approach generally improves coverage of confidence intervals for the summary treatment effect, compared with a z-based approach. Second, when using ML estimation of a one-stage model with a stratified intercept, the treatment variable should be coded using "study-specific centering" (ie, 1/0 minus the study-specific proportion of participants in the treatment group), as this reduces the bias in the between-study variance estimate (compared with 1/0 and other coding options). Third, REML estimation reduces downward bias in between-study variance estimates compared with ML estimation, and does not depend on the treatment variable coding; for binary outcomes, this requires REML estimation of the pseudo-likelihood, although this may not be stable in some situations (eg, when data are sparse). Two applied examples are used to illustrate the findings.
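Study-specific centering of the treatment variable can be sketched as below: each participant's 1/0 treatment code has the proportion of treated participants in their own study subtracted. The study labels and treatment indicators are hypothetical, and fitting the one-stage mixed model with the recoded variable is not shown.

```python
import numpy as np

def study_specific_centering(study, treat):
    """Recode a 1/0 treatment indicator as treat minus the proportion of
    participants allocated to treatment within the same study."""
    study = np.asarray(study)
    treat = np.asarray(treat, dtype=float)
    centered = np.empty_like(treat)
    for s in np.unique(study):
        in_study = study == s
        centered[in_study] = treat[in_study] - treat[in_study].mean()
    return centered

# Hypothetical IPD from two trials
study = [1, 1, 1, 1, 2, 2, 2]
treat = [1, 0, 1, 1, 1, 0, 0]
print(study_specific_centering(study, treat))
```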
9
Abstract
Aims: Statistics on drug-related deaths (DRD) provide crucial information on the drug situation. The European Monitoring Centre for Drugs and Drug Addiction (EMCDDA) has published a specification for extracting DRD from national mortality registers for use in international comparisons. However, surprisingly little is known about the accuracy of DRD statistics derived from national mortality registers. This study assesses the accuracy of Swedish data derived from national mortality registers by comparing them with other data sources. Methods: We compared five Swedish datasets. Three were derived from national mortality registers, two according to a Swedish specification and one according to the EMCDDA specification. A fourth dataset was based on toxicological analyses. We used a fifth dataset, an inventory of DRD in Stockholm, to assess the completeness and coverage of the Swedish datasets. Results: All datasets were extracted from high-quality registers but still did not capture all DRD, and both the numbers and the demographic characteristics varied considerably. However, the time trends were consistent across the selections. In international comparisons, data completeness and investigation procedures may have an even greater impact on the reported numbers. Conclusions: Basing international comparisons on numbers or rates of DRD gives misleading results, but comparing trends is still meaningful.
10
Estimates of the actual relationship between half-sibs in a pig population. J Anim Breed Genet 2016; 134:109-118. [PMID: 27670252 DOI: 10.1111/jbg.12236] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/22/2016] [Accepted: 08/04/2016] [Indexed: 12/26/2022]
Abstract
Genomic relationships based on markers capture the actual, rather than the expected (pedigree-based), proportion of the genome shared identical by descent (IBD). Several methods exist to estimate genomic relationships. In this research, we compare four such methods by examining the empirical distribution of the estimated relationships across 6704 pairs of half-sibs from a cross-bred pig population. The first method, based on multiple-marker linkage analysis, displayed a mean and standard deviation (SD) in close agreement with the expected values and was robust to changes in the minor allele frequency (MAF). A single-marker method that accounts for linkage disequilibrium (LD) and inbreeding came second, showing more sensitivity to changes in the MAF. Another single-marker method that considers neither inbreeding nor LD showed the smallest empirical SD and was the most sensitive to changes in MAF. VanRaden's method displayed a higher mean and SD and was not sensitive to changes in MAF. Therefore, the method based on multiple-marker linkage analysis and the single-marker method that considers LD and inbreeding performed closest to the theoretical values and were consistent with the estimates reported in the literature for human half-sibs.
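Of the four methods, VanRaden's first method has a compact closed form, G = ZZᵀ / (2Σp(1 - p)), where Z is the genotype matrix (coded 0/1/2) centred by twice the allele frequencies; for half-sibs, whose expected additive relationship is 0.25, the off-diagonal elements of G are the estimates. The sketch below assumes allele frequencies estimated from the sample itself and a simple MAF filter to mimic changing MAF thresholds; the genotypes are simulated, and the linkage-based and LD-aware methods are not reproduced.

```python
import numpy as np

def vanraden_grm(M, maf_min=0.0):
    """VanRaden's first method: G = Z Z' / (2 * sum p(1 - p)).

    M: genotypes coded 0/1/2 (individuals x markers). Allele frequencies are
    estimated from the sample (an assumption; base-population frequencies are
    sometimes used instead). Markers with MAF at or below `maf_min` are
    dropped, which always removes monomorphic markers."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    keep = np.minimum(p, 1.0 - p) > maf_min
    M, p = M[:, keep], p[keep]
    Z = M - 2.0 * p                      # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Simulated genotypes for 10 individuals at 500 markers
rng = np.random.default_rng(0)
M = rng.binomial(2, 0.3, size=(10, 500))
G = vanraden_grm(M)
G_filtered = vanraden_grm(M, maf_min=0.05)
print(np.round(G[:3, :3], 3))
```

Because the centring uses sample frequencies, every row of G sums to zero, so the scale of G depends on the reference frequencies chosen.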
11
Serum Creatinine Back-Estimation in Cardiac Surgery Patients: Misclassification of AKI Using Existing Formulae and a Data-Driven Model. Clin J Am Soc Nephrol 2016; 11:395-404. [PMID: 26801479 DOI: 10.2215/cjn.03560315] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 03/30/2015] [Accepted: 12/01/2015] [Indexed: 12/21/2022]
Abstract
BACKGROUND AND OBJECTIVES Knowledge of baseline serum creatinine (bSCr) is mandatory for diagnosing and staging AKI. Because values are often missing, bSCr is estimated by back-calculation using several equations designed for the estimation of GFR, assuming a "true" GFR of 75 ml/min per 1.73 m². Using a data set from a large cardiac surgery cohort, we tested the appropriateness of this approach and compared estimated and measured bSCr. Moreover, we designed a novel data-driven model (estimated serum creatinine [eSCr]) for estimating bSCr. Finally, we analyzed the extent of AKI and mortality rate misclassification. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS Data for 8024 patients (2833 women) treated in our cardiac surgery center from 1997 to 2008 were included. Measured and estimated bSCr were plotted against age for men and women. Patients were classified into AKI stages as defined by the Kidney Disease: Improving Global Outcomes (KDIGO) group. Results were compared with data from another cardiac surgery center in Zurich, Switzerland. RESULTS Compared with the measured bSCr values in both centers, the Modification of Diet in Renal Disease and the Chronic Kidney Disease Epidemiology Collaboration formulae yield higher estimated bSCr values in younger patients but lower values in older patients. The Pittsburgh Linear Three Variables formula correctly describes the increase in bSCr with age; however, it underestimates the overall bSCr level, which lies in the range of the 25% quantile of the measured values. Our eSCr model estimated the measured bSCr best. AKI stage 1 classification using all formulae, including our eSCr model, was incorrect in 53%-80% of patients in Vienna and in 74%-91% in Zurich; AKI severity (according to KDIGO stages) and mortality were also overestimated. The mortality rate was higher among patients falsely classified into higher KDIGO stages by estimated bSCr.
CONCLUSIONS bSCr values back-estimated using currently available eGFR formulae are inaccurate and cannot correctly classify AKI stages. Our eSCr model improves the prediction of AKI, but still to an inadequate extent.
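The back-calculation can be sketched with the four-variable MDRD equation solved for serum creatinine at an assumed GFR of 75 ml/min per 1.73 m². The coefficients are those of the IDMS-traceable MDRD equation (175 × SCr^-1.154 × age^-0.203, × 0.742 if female, × 1.212 if Black); whether the study used this or the original 186 version is not stated in the abstract. KDIGO staging is reduced to creatinine criteria only (1.5×, 2× and 3× baseline, an absolute rise of at least 0.3 mg/dl, or SCr of at least 4.0 mg/dl), ignoring time windows, urine output and renal replacement therapy. The eSCr model is not reproduced, and the patient values are hypothetical.

```python
def mdrd_egfr(scr, age, female=False, black=False):
    """4-variable MDRD eGFR (IDMS-traceable form), ml/min per 1.73 m²; scr in mg/dl."""
    egfr = 175.0 * scr ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def back_estimate_bscr(age, female=False, black=False, assumed_gfr=75.0):
    """Solve the MDRD equation for serum creatinine at an assumed 'true' GFR."""
    k = 175.0 * age ** -0.203 * (0.742 if female else 1.0) * (1.212 if black else 1.0)
    return (assumed_gfr / k) ** (-1.0 / 1.154)

def kdigo_stage(baseline, current):
    """Simplified creatinine-only KDIGO stage (no time windows, urine output or RRT)."""
    ratio = current / baseline
    if ratio >= 3.0 or current >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5 or current - baseline >= 0.3:
        return 1
    return 0

# Hypothetical patient: 70-year-old woman, measured baseline 1.0 mg/dl, postoperative 1.2 mg/dl
est = back_estimate_bscr(70, female=True)
print(round(est, 2), kdigo_stage(1.0, 1.2), kdigo_stage(est, 1.2))
```

With the measured baseline this patient does not meet the creatinine criteria for AKI, whereas the lower back-estimated baseline places her in stage 1, the kind of misclassification the study quantifies. Because age enters with a negative exponent, the back-estimate is higher for younger patients, in line with the results reported above.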
12
Fundamental discrepancies in abortion estimates and abortion-related mortality: A reevaluation of recent studies in Mexico with special reference to the International Classification of Diseases. Int J Womens Health 2012; 4:613-23. [PMID: 23271925 PMCID: PMC3526871 DOI: 10.2147/ijwh.s38063] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
Abstract
In countries where induced abortion is legally restricted, as in most of Latin America, evaluating statistics on induced abortion and abortion-related mortality is challenging. The present article reexamines recent reports estimating the number of induced abortions and abortion-related mortality in Mexico, with special reference to the International Classification of Diseases (ICD). We found significant overestimation of abortion figures in the Federal District of Mexico (up to 10-fold), where elective abortion has been legal since 2007. Significant overestimation of maternal and abortion-related mortality over the last 20 years across Mexico as a whole (up to 35%) was also found. Such overestimation is most likely due to the use of incomplete in-hospital records and subjective opinion surveys for induced abortion figures, to the inclusion of causes of death unrelated to induced abortion, and to flawed denominators of live births. Contrary to previous publications, we found important progress in maternal health, reflected by a 30.6% decrease in overall maternal mortality from 1990 to 2010. The use of specific ICD codes revealed that the mortality ratio associated with induced abortion decreased by 22.9% between 2002 and 2008 (from 1.48 to 1.14 deaths per 100,000 live births). Currently, approximately 98% of maternal deaths in Mexico are related to causes other than induced abortion, such as hemorrhage, hypertension and eclampsia, indirect causes, and other pathological conditions. Therefore, only marginal or null effects on overall maternal mortality rates would be expected from changes in the legal status of abortion. Rather, maternal health in Mexico would greatly benefit from increased access to emergency and specialized obstetric care. Finally, more reliable methodologies for assessing abortion-related deaths are clearly required.
13
Targeted maximum likelihood estimation for dynamic treatment regimes in sequentially randomized controlled trials. Int J Biostat 2012; 8:Article 14. [PMID: 22740582 PMCID: PMC6084784 DOI: 10.1515/1557-4679.1406] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest for many conditions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist, but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, including time-to-event, binary, and continuous outcomes. Here we develop and implement TMLE in the SRCT setting. As in the earlier settings, the TMLE procedure is targeted toward a pre-specified parameter of the distribution of the observed data and thereby achieves important bias reduction in estimating that parameter. Like the so-called Augmented Inverse Probability of Censoring Weighted (A-IPCW) estimator, TMLE is double robust and locally efficient. We report simulation results corresponding to two data-generating distributions from a longitudinal data structure.
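The core TMLE steps can be illustrated in the simplest point-treatment setting (one binary treatment, a binary outcome, and the average treatment effect) rather than the multi-time-point SRCT setting developed in the paper. The sketch fits an initial outcome regression and a propensity score, then fluctuates the initial fit along the "clever covariate" H = A/g(W) - (1 - A)/(1 - g(W)) with a one-parameter logistic regression that uses logit Q as an offset. The plain parametric logistic models, the propensity bound of 0.025, and the simulated data are assumptions; practical TMLE implementations usually rely on flexible learners for the initial fits.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(offset + X @ beta)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def tmle_ate(W, A, Y, g_bound=0.025):
    """TMLE of the average treatment effect E[Y(1)] - E[Y(0)] for a point treatment."""
    n = len(Y)
    one = np.ones(n)
    # Step 1: initial outcome regression Q(A, W) = P(Y = 1 | A, W)
    beta_q = fit_logistic(np.column_stack([one, A, W]), Y)
    QA = expit(np.column_stack([one, A, W]) @ beta_q)
    Q1 = expit(np.column_stack([one, one, W]) @ beta_q)
    Q0 = expit(np.column_stack([one, 0.0 * one, W]) @ beta_q)
    # Step 2: propensity score g(W) = P(A = 1 | W), bounded away from 0 and 1
    XG = np.column_stack([one, W])
    g = np.clip(expit(XG @ fit_logistic(XG, A)), g_bound, 1.0 - g_bound)
    # Step 3: fluctuate logit(Q) along the clever covariate H with one parameter eps
    H = A / g - (1.0 - A) / (1.0 - g)
    eps = fit_logistic(H[:, None], Y, offset=logit(QA))[0]
    # Step 4: updated (targeted) predictions and plug-in estimate
    Q1_star = expit(logit(Q1) + eps / g)
    Q0_star = expit(logit(Q0) - eps / (1.0 - g))
    QA_star = expit(logit(QA) + eps * H)
    return np.mean(Q1_star - Q0_star), H, QA_star

# Simulated data (assumed data-generating process)
rng = np.random.default_rng(42)
n = 2000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W)).astype(float)
Y = rng.binomial(1, expit(-0.5 + A + W)).astype(float)
ate, H, QA_star = tmle_ate(W, A, Y)
print(f"TMLE estimate of the ATE: {ate:.3f}")
```

Because eps is fitted by maximum likelihood, the updated fit solves the score equation mean(H(Y - Q*)) = 0; this is the step that targets the plug-in estimate toward the chosen parameter.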
14
Methods for Measuring and Estimating Methane Emission from Ruminants. Animals (Basel) 2012; 2:160-83. [PMID: 26486915 PMCID: PMC4494326 DOI: 10.3390/ani2020160] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 02/09/2012] [Revised: 03/08/2012] [Accepted: 04/02/2012] [Indexed: 11/29/2022]
Abstract
This paper is a brief introduction to the different methods used to quantify enteric methane emission from ruminants. A thorough knowledge of the advantages and disadvantages of these methods is very important for planning experiments, understanding and interpreting experimental results, and comparing them with other studies. The aim of the paper is to describe the principles, advantages, and disadvantages of the different methods used to quantify enteric methane emission from ruminants. The best-known methods (chambers/respiration chambers, the SF₆ technique, and the in vitro gas production technique) and the newer CO₂ methods are described. Model estimations, which are used to calculate national budgets and single-cow enteric emission from intake and diet composition, are also discussed. Other methods under development, such as the micrometeorological technique, the combined feeder and CH₄ analyzer, and proxy methods, are briefly mentioned. The method of choice for estimating enteric methane emission depends on the aim, equipment, knowledge, time, and money available, but the interpretation of results obtained with a given method can be improved if knowledge of its advantages and disadvantages is used when planning experiments.
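The SF₆ tracer technique is commonly written as a ratio calculation: the methane emission rate equals the known SF₆ release rate multiplied by the ratio of background-corrected CH₄ and SF₆ concentrations, with a molar-mass correction when the release rate is in mass units and the concentrations are volumetric mixing ratios. The sketch below assumes both concentrations are given in the same unit (ppm); the release rate and concentrations are hypothetical.

```python
# Molar masses in g/mol
MW_CH4 = 16.04
MW_SF6 = 146.06

def sf6_ch4_emission(q_sf6, ch4_sample, ch4_bg, sf6_sample, sf6_bg):
    """CH4 emission rate from the SF6 tracer ratio.

    q_sf6: known SF6 release rate of the permeation tube (mass per time).
    Concentrations are volumetric mixing ratios in the SAME unit (e.g. ppm),
    so the ratio of excess concentrations is a molar ratio; multiplying by
    MW_CH4 / MW_SF6 converts it to a mass ratio. The result has the units of q_sf6."""
    molar_ratio = (ch4_sample - ch4_bg) / (sf6_sample - sf6_bg)
    return q_sf6 * molar_ratio * (MW_CH4 / MW_SF6)

# Hypothetical values: 3 mg/day SF6 release; CH4 in ppm; SF6 of 205 and 5 ppt written in ppm
emission_mg_per_day = sf6_ch4_emission(3.0, 61.9, 1.9, 205e-6, 5e-6)
print(f"{emission_mg_per_day / 1000:.1f} g CH4/day")
```

Mixing units (for example CH₄ in ppm and SF₆ in ppt) is an easy source of a factor-of-10⁶ error, which is why the function requires a single unit for both gases.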