1
Li Y, Shelton BJ, St Clair W, Weiss HL, Villano JL, Stromberg AJ, Wang C, Chen L. Weighted mean difference statistics for paired data in the presence of missing values. Stat Methods Med Res 2023; 32:2033-2048. [PMID: 37647221 DOI: 10.1177/09622802231192947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Missing data is a common issue in many biomedical studies. Under a paired design, some subjects may have missing values in either one or both of the conditions due to loss of follow-up, insufficient biological samples, etc. Such partially paired data complicate statistical comparison of the distribution of the variable of interest between the two conditions. In this article, we propose a general class of test statistics based on the difference in weighted sample means without imposing any distributional or model assumption. An optimal weight is derived from this class of tests. Simulation studies show that our proposed test with the optimal weight performs well and outperforms existing methods in practical situations. Two cancer biomarker studies are provided for illustration.
Affiliation(s)
- Yuntong Li
  - Regeneron Pharmaceuticals, Basking Ridge, NJ, USA
- Brent J Shelton
  - Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- William St Clair
  - Department of Radiation Medicine, University of Kentucky, Lexington, KY, USA
- Heidi L Weiss
  - Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- John L Villano
  - Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
- Chi Wang
  - Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
  - Department of Statistics, University of Kentucky, Lexington, KY, USA
- Li Chen
  - Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY, USA
  - Department of Internal Medicine, University of Kentucky, Lexington, KY, USA
2
Yamaguchi H, Kitani M, Murakami H. Robust testing procedures for scale differences in paired data. J Stat Comput Simul 2023. [DOI: 10.1080/00949655.2022.2163645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Affiliation(s)
- Hikaru Yamaguchi
  - Department of Applied Mathematics, Graduate School of Science, Tokyo University of Science, Tokyo, Japan
- Masato Kitani
  - Department of Applied Mathematics, Tokyo University of Science, Tokyo, Japan
- Hidetoshi Murakami
  - Department of Applied Mathematics, Tokyo University of Science, Tokyo, Japan
3
Harrar SW, Cui Y. Nonparametric methods for clustered data in pre-post intervention design. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
4
Estimation and Testing of Wilcoxon–Mann–Whitney Effects in Factorial Clustered Data Designs. Symmetry (Basel) 2022. [DOI: 10.3390/sym14020244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Clustered data arise frequently in many practical applications whenever units are repeatedly observed under a certain condition. One typical example for clustered data are animal experiments, where several animals share the same cage and should not be assumed to be completely independent. Standard methods for the analysis of such data are Linear Mixed Models and Generalized Estimating Equations—however, checking their assumptions is not easy, especially in scenarios with small sample sizes, highly skewed, count, and ordinal or binary data. In such situations, Wilcoxon–Mann–Whitney type effects are suitable alternatives to mean-based or other distributional approaches. Hence, no specific data distribution, symmetric or asymmetric, is required. Within this work, we will present different estimation techniques of such effects in clustered factorial designs and discuss quadratic- and multiple contrast type-testing procedures for hypotheses formulated in terms of Wilcoxon–Mann–Whitney effects. Additionally, the framework allows for the occurrence of missing data: estimation and testing hypotheses are based on all-available data instead of complete-cases. An extensive simulation study investigates the precision of the estimators and the behavior of the test procedures in terms of their type-I error control. One real world dataset exemplifies the applicability of the newly proposed procedures.
5
Rubarth K, Pauly M, Konietschke F. Ranking procedures for repeated measures designs with missing data: Estimation, testing and asymptotic theory. Stat Methods Med Res 2021; 31:105-118. [PMID: 34841991 PMCID: PMC8721540 DOI: 10.1177/09622802211046389] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
We develop purely nonparametric methods for the analysis of repeated measures designs with missing values. Hypotheses are formulated in terms of purely nonparametric treatment effects. In particular, data can have different shapes even under the null hypothesis and therefore, a solution to the nonparametric Behrens-Fisher problem in repeated measures designs will be presented. Moreover, global testing and multiple contrast test procedures as well as simultaneous confidence intervals for the treatment effects of interest will be developed. All methods can be applied for the analysis of metric, discrete, ordinal, and even binary data in a unified way. Extensive simulation studies indicate a satisfactory control of the nominal type-I error rate, even for small sample sizes and a high amount of missing data (up to 30%). We apply the newly developed methodology to a real data set, demonstrating its application and interpretation.
Affiliation(s)
- Kerstin Rubarth
  - Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, Germany
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, Germany
- Markus Pauly
  - Department of Statistics, TU Dortmund University, Dortmund, Germany
- Frank Konietschke
  - Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, Germany
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, Germany
6
Qi Q, Yan L, Tian L. Analyzing partially paired data: when can the unpaired portion(s) be safely ignored? J Appl Stat 2020; 49:1402-1420. [DOI: 10.1080/02664763.2020.1864813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Qianya Qi
  - Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
- Li Yan
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Lili Tian
  - Department of Biostatistics, University at Buffalo, Buffalo, NY, USA
7
Cui Y, Konietschke F, Harrar SW. The nonparametric Behrens-Fisher problem in partially complete clustered data. Biom J 2020; 63:148-167. [PMID: 33058259 DOI: 10.1002/bimj.201900310] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/13/2019] [Revised: 05/04/2020] [Accepted: 05/10/2020] [Indexed: 11/05/2022]
Abstract
In randomized trials or observational studies involving clustered units, the assumption of independence within clusters is not practical. Existing parametric or semiparametric methods assume specific dependence structures within a cluster. Furthermore, parametric model assumptions may not even be realistic when data are measured in a nonmetric scale as commonly happens, for example, in quality-of-life outcomes. In this paper, nonparametric effect-size measures for clustered data that allow meaningful and interpretable probabilistic comparisons of treatments or intervention programs will be introduced. The dependence among observations within a cluster can be arbitrary. Point estimators along with their asymptotic properties for computing confidence intervals and performing hypothesis test will be discussed. Small sample approximations that retain some of the optimal asymptotic behaviors will be presented. In our setup, some clusters may involve observations coming from both intervention groups (referred to as complete clusters), while others may contain observations from one group only (referred to as incomplete clusters). In deriving the asymptotic theories, we do not impose any relation in the rate of divergence of the numbers of complete and incomplete clusters. Simulations show favorable performance of the methods for arbitrary combinations of complete and incomplete clusters. The developed nonparametric methods are illustrated using data from a randomized trial of indoor wood smoke reduction to improve asthma symptoms and a cluster-randomized trial for smoking cessation.
Affiliation(s)
- Yue Cui
  - Department of Mathematics, Missouri State University, Springfield, MO, USA
- Frank Konietschke
  - Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
- Solomon W Harrar
  - Dr. Bing Zhang Department of Statistics, University of Kentucky, Lexington, KY, USA
8
|
|
9
Gaigall D. Testing marginal homogeneity of a continuous bivariate distribution with possibly incomplete paired data. Metrika 2019. [DOI: 10.1007/s00184-019-00742-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
10
|
|
11
Amro L, Konietschke F, Pauly M. Multiplication-combination tests for incomplete paired data. Stat Med 2019; 38:3243-3255. [DOI: 10.1002/sim.8178] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/19/2018] [Revised: 02/27/2019] [Accepted: 04/04/2019] [Indexed: 11/06/2022]
Affiliation(s)
- Lubna Amro
  - Institute of Statistics, Ulm University, Ulm, Germany
- Frank Konietschke
  - Institute of Biometry and Clinical Epidemiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
- Markus Pauly
  - Institute of Statistics, Ulm University, Ulm, Germany
12
Fong Y, Huang Y, Lemos MP, McElrath MJ. Rank-based two-sample tests for paired data with missing values. Biostatistics 2019; 19:281-294. [PMID: 28968816 DOI: 10.1093/biostatistics/kxx039] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/05/2017] [Accepted: 06/18/2017] [Indexed: 11/13/2022]
Abstract
Two-sample location problem is one of the most encountered problems in statistical practice. The two most commonly studied subtypes of two-sample location problem involve observations from two populations that are either independent or completely paired, but a third subtype can oftentimes occur in practice when some observations are paired and some are not. Partially paired two-sample problems, also known as paired two-sample problems with missing data, often arise in biomedical fields when it is difficult for some invasive procedures to collect data from an individual at both conditions we are interested in comparing. Existing rank-based two-sample comparison procedures for partially paired data, however, do not make efficient use of all available data. In order to improve the power of testing procedures for this problem, we propose several new rank-based test statistics and study their asymptotic distributions and, when necessary, exact variances. Through extensive numerical studies, we show that the best overall power come from the proposed tests based on weighted linear combinations of the test statistics comparing paired data and the test statistics comparing independent data, using weights inversely proportional to their variances. We illustrate the proposed methods with a real data example from HIV research for prevention.
Affiliation(s)
- Youyi Fong
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 98109, USA
- Ying Huang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 98109, USA
- Maria P Lemos
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 98109, USA
- M Juliana McElrath
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 98109, USA
13
Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
14
Fuchs N, Pölz W, Bathke AC. Confidence intervals for population means of partially paired observations. Stat Pap (Berl) 2015. [DOI: 10.1007/s00362-015-0686-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
15
|
|
16
Mittlböck M, Edler L, LeBlanc M, Niland J, Zwinderman K. Second Issue for Computational Statistics for Clinical Research. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2012.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]