1
Ren P, Liu G, Pu X. Generalized fiducial methods for testing the homogeneity of a three-sample problem with a mixture structure. J Appl Stat 2021; 50:1094-1114. [PMID: 37009592 PMCID: PMC10062230 DOI: 10.1080/02664763.2021.2017414]
Abstract
Recently, the likelihood ratio (LR) test was proposed to test the homogeneity of a three-sample model with a mixture structure. Because of the mixture structure, the null limiting distribution of the LR test has a complicated supremum form, which makes p-values difficult to determine. In addition, the LR test does not control the type-I error well with small to moderate sample sizes. In this paper, we propose seven generalized fiducial methods to test the homogeneity of the three-sample model. Simulation studies show that our methods control the type-I error significantly better than the LR test with small to moderate sample sizes, while having comparable power in most cases. A halibut data example is used to illustrate the proposed methods.
Affiliation(s)
- Pengcheng Ren
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, People's Republic of China
- Guanfu Liu
- School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, People's Republic of China
- Xiaolong Pu
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, People's Republic of China
2
Affiliation(s)
- Jiahui Yu
- Department of Mathematics and Statistics, Boston University, Boston, MA
- Anna Liu
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA
- Yuedong Wang
- Department of Statistics and Applied Probability, University of California, Santa Barbara, CA
3
Zhang X, Tan Z. Semi‐supervised logistic learning based on exponential tilt mixture models. Stat (Int Stat Inst) 2020. [DOI: 10.1002/sta4.312]
Affiliation(s)
- Xinwei Zhang
- Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Zhiqiang Tan
- Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
4
Cai Y, Huang J, Ning J, Lee MLT, Rosner B, Chen Y. Two-sample test for correlated data under outcome-dependent sampling with an application to self-reported weight loss data. Stat Med 2019; 38:4999-5009. [PMID: 31489699 PMCID: PMC6800790 DOI: 10.1002/sim.8346] [Received: 09/14/2017] [Revised: 07/07/2019] [Accepted: 07/17/2019]
Abstract
Standard two-sample tests such as the t-test and the Wilcoxon rank sum test may produce incorrect type I errors when applied to longitudinal or clustered data. Recently proposed two-sample tests for clustered data often require assumptions on the correlation structure and/or a noninformative cluster size. In this paper, based on a novel pseudolikelihood for correlated data, we propose a score test that requires neither knowledge of the correlation structure nor the assumption that data are missing at random. The proposed score test can capture differences in both the mean and the variance between two groups simultaneously. We use projection theory to derive the limiting distribution of the test statistic, whose covariance matrix can be estimated empirically. We conduct simulation studies to evaluate the proposed test and compare it with existing methods. To illustrate the usefulness of the proposed test, we use it to compare self-reported weight loss data from a friends' referral group with data from an Internet self-joining group.
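The opening claim of this abstract, that a standard t-test can give incorrect type I errors on clustered data, can be checked with a small simulation. This is a minimal sketch of the problem only, not the authors' pseudolikelihood score test. It assumes a hypothetical random-intercept model with normal errors, equal cluster sizes, and no true group difference; the function names and the settings (20 clusters of size 10 per group, intraclass correlation 0.3) are illustrative choices, not values from the paper.

```python
import math
import random


def sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def clustered_group(rng, n_clusters, cluster_size, icc):
    """One group drawn from a hypothetical random-intercept model.

    Observation j in cluster i is u_i + e_ij with Var(u_i) = icc and
    Var(e_ij) = 1 - icc, so every observation has variance 1 and the
    intraclass correlation is `icc`. All groups have mean 0, so H0 holds.
    """
    values = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, math.sqrt(icc))  # effect shared by the whole cluster
        values.extend(u + rng.gauss(0.0, math.sqrt(1.0 - icc))
                      for _ in range(cluster_size))
    return values


def naive_t(x, y):
    """Welch-type t statistic that (wrongly) treats all observations as independent."""
    se = math.sqrt(sample_variance(x) / len(x) + sample_variance(y) / len(y))
    return (sum(x) / len(x) - sum(y) / len(y)) / se


def type1_error(icc, reps=1000, n_clusters=20, cluster_size=10, seed=1):
    """Monte Carlo rejection rate of the naive test at |t| > 1.96 when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = clustered_group(rng, n_clusters, cluster_size, icc)
        y = clustered_group(rng, n_clusters, cluster_size, icc)
        if abs(naive_t(x, y)) > 1.96:
            rejections += 1
    return rejections / reps
```

Under these assumptions, `type1_error(0.0)` should land near the nominal 0.05, while `type1_error(0.3)` comes out several times larger, because the naive standard error ignores the within-cluster correlation and therefore understates the variability of each group mean.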
Affiliation(s)
- Yi Cai
- AT&T Services, Inc., Plano, TX 75247, USA
- Jing Huang
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jing Ning
- Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
- Mei-Ling Ting Lee
- Department of Epidemiology and Biostatistics, The University of Maryland School of Public Health, College Park, MD 20742, USA
- Bernard Rosner
- Department of Biostatistics, Harvard Medical School, MA 02115, USA
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
5
Sustainable Regulation of Information Sharing with Electronic Data Interchange by a Trust-Embedded Contract. Sustainability 2017. [DOI: 10.3390/su9060964]
6
Affiliation(s)
- Pengfei Li
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Yukun Liu
- School of Statistics, East China Normal University, Shanghai, China
- Jing Qin
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD
7
Hong C, Ning Y, Wang S, Wu H, Carroll RJ, Chen Y. PLMET: A Novel Pseudolikelihood-Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models. J Am Stat Assoc 2017; 112:1393-1404. [PMID: 29416190 PMCID: PMC5798902 DOI: 10.1080/01621459.2017.1280405] [Received: 12/01/2014] [Revised: 10/01/2016]
Abstract
Motivated by analyses of DNA methylation data, we propose a semiparametric mixture model, namely the generalized exponential tilt mixture model, to account for heterogeneity between differentially methylated and non-differentially methylated subjects in the cancer group, and to capture differences in higher-order moments (e.g., mean and variance) between subjects in the cancer and normal groups. A pairwise pseudolikelihood is constructed to eliminate the unknown nuisance function. To circumvent the boundary and non-identifiability problems that also arise in parametric mixture models, we modify the pseudolikelihood by adding a penalty function. In addition, a test with a simple asymptotic distribution has computational advantages over permutation-based tests for high-dimensional genetic or epigenetic data. We propose a pseudolikelihood-based expectation-maximization test and show that the proposed test statistic follows a simple chi-squared limiting distribution. Simulation studies show that the proposed test controls the Type I error well and has better power than several existing tests. In particular, the proposed test outperforms the commonly used tests under all simulation settings considered, especially when there are variance differences between the two groups. The proposed test is applied to a real data set to identify differentially methylated sites between ovarian cancer subjects and normal subjects.
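To make the model in this abstract concrete, the sketch below assumes the case-group density takes the generalized exponential tilt mixture form f(x) = (1 - λ) g(x) + λ g(x) exp(α + β1 x + β2 x²), where g is the control-group density. In the paper g is an unknown nuisance function; here it is taken to be standard normal purely for illustration, which is an assumption. Under that choice the tilted component is N(β1/(1 - 2β2), 1/(1 - 2β2)), so β1 shifts the mean and β2 changes the variance, and the normalizing constant has the closed form α = ½ log(1 - 2β2) - β1²/(2(1 - 2β2)). All function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    """Standard normal density, used here as an illustrative choice of g."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tilt_alpha(beta1, beta2):
    """alpha making g(x) * exp(alpha + beta1*x + beta2*x**2) integrate to 1
    when g is the standard normal density (requires beta2 < 1/2)."""
    if beta2 >= 0.5:
        raise ValueError("beta2 must be < 1/2 for the tilted density to be proper")
    s = 1.0 - 2.0 * beta2
    return 0.5 * math.log(s) - beta1 ** 2 / (2.0 * s)


def case_density(x, lam, beta1, beta2):
    """Hypothetical case-group density (1 - lam) g(x) + lam g(x) exp(alpha + beta1 x + beta2 x^2)."""
    alpha = tilt_alpha(beta1, beta2)
    g = std_normal_pdf(x)
    return (1.0 - lam) * g + lam * g * math.exp(alpha + beta1 * x + beta2 * x * x)


def moments(density, lo=-15.0, hi=15.0, n=30000):
    """Total mass, mean and variance of a 1-D density by the midpoint rule."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    w = [density(x) * h for x in xs]
    mass = sum(w)
    mean = sum(wi * x for wi, x in zip(w, xs)) / mass
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / mass
    return mass, mean, var
```

For example, with β1 = 1 and β2 = 0.25 the tilted component is N(2, 2), so `moments(lambda x: case_density(x, 1.0, 1.0, 0.25))` returns mass 1, mean 2 and variance 2 up to quadrature error. Homogeneity corresponds to λ = 0 or β1 = β2 = 0, and in either case `case_density` collapses to g; that the remaining parameters are then unidentified is the non-identifiability the abstract refers to.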
Affiliation(s)
- Chuan Hong
- Department of Biostatistics, Harvard University School of Public Health, Boston, MA 02115, USA
- Yang Ning
- Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10027, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Raymond J. Carroll
- Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA
- Yong Chen
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA
8
9
Ning Y, Chen Y. A Class of Pseudolikelihood Ratio Tests for Homogeneity in Exponential Tilt Mixture Models. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12119]
Affiliation(s)
- Yang Ning
- Department of Statistics and Actuarial Science, University of Waterloo
- Yong Chen
- Division of Biostatistics, The University of Texas School of Public Health
10
11
Chen Y, Ning Y, Hong C, Wang S. Semiparametric Tests for Identifying Differentially Methylated Loci With Case-Control Designs Using Illumina Arrays. Genet Epidemiol 2013; 38:42-50. [DOI: 10.1002/gepi.21774] [Received: 06/14/2013] [Revised: 09/13/2013] [Accepted: 10/17/2013]
Affiliation(s)
- Yong Chen
- Division of Biostatistics, School of Public Health, The University of Texas, Houston, Texas, USA
- Yang Ning
- Department of Statistics and Actuarial Science, University of Waterloo, Ontario, Canada
- Chuan Hong
- Division of Biostatistics, School of Public Health, The University of Texas, Houston, Texas, USA
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York, USA
12
Xu J, Zheng G, Yuan A. Case-Control Genome-wide Joint Association Study Using Semiparametric Empirical Model and Approximate Bayes Factor. J Stat Comput Simul 2013; 83:1191-1209. [PMID: 24532860 DOI: 10.1080/00949655.2011.654119]
Abstract
We propose a semiparametric approach for the analysis of case-control genome-wide association studies. Parametric components are used to model both the conditional distribution of the case status given the covariates and the distribution of genotype counts, whereas the distribution of the covariates is modeled nonparametrically. This yields direct, joint modeling of the case status, covariates and genotype counts, which gives a better understanding of the disease mechanism and more reliable conclusions. Side information, such as the disease prevalence, can be conveniently incorporated into the model via an empirical likelihood approach, leading to more efficient estimates and more powerful tests for detecting disease-associated SNPs. Profiling is used to eliminate a nuisance nonparametric component, and the resulting profile empirical likelihood estimates are shown to be consistent and asymptotically normal. For the hypothesis test on disease association, we apply the approximate Bayes factor (ABF), which is computationally simple and therefore well suited to genome-wide association studies, where hundreds of thousands to a million genetic markers are tested. We treat the approximate Bayes factor as a hybrid Bayes factor that replaces the full data with the maximum likelihood estimates of the parameters of interest in the full model, and we derive it under a general setting. Deviation from Hardy-Weinberg Equilibrium (HWE) is also taken into account, and the ABF for HWE computed from cases is shown to provide evidence of association between a disease and a genetic marker. Simulation studies and a real-data application illustrate the utility of the proposed methodology.
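The idea of replacing the full data with the maximum likelihood estimate can be illustrated with the common normal-approximation (Wakefield-style) approximate Bayes factor. This is a sketch under simplifying assumptions, not necessarily the exact ABF the authors derive: it assumes a scalar effect estimate β̂ (for example a log odds ratio) with asymptotic variance V, a N(0, W) prior on β under the alternative, and it returns the Bayes factor of the null (β = 0) over the alternative, so small values indicate evidence of association. The function names and the prior variance W used in the usage note are illustrative assumptions.

```python
import math


def approximate_bayes_factor(beta_hat, var_hat, prior_var):
    """Bayes factor of H0: beta = 0 versus H1: beta ~ N(0, prior_var),
    computed from the MLE beta_hat and its asymptotic variance var_hat only.

    Under H0, beta_hat ~ N(0, V); under H1, marginally beta_hat ~ N(0, V + W).
    The ratio of the two normal densities at beta_hat simplifies to
    sqrt((V + W) / V) * exp(-z^2 * W / (2 * (V + W))), with z^2 = beta_hat^2 / V.
    """
    v, w = var_hat, prior_var
    z2 = beta_hat ** 2 / v
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))


def normal_pdf(x, var):
    """N(0, var) density, used to check the closed form against the density ratio."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)
```

For instance, with β̂ = 0.3, standard error 0.05 (so z = 6) and W = 0.04, `approximate_bayes_factor(0.3, 0.05 ** 2, 0.04)` is far below 1, whereas an estimate of exactly zero gives sqrt((V + W)/V) > 1, which favours the null. Because each marker needs only the pair (β̂, V), the computation is trivially cheap across hundreds of thousands of SNPs, which is the computational simplicity the abstract emphasizes.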
Affiliation(s)
- Jinfeng Xu
- Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546
- Gang Zheng
- Office of Biostatistics Research, DPPS, National Heart, Lung and Blood Institute, 6701 Rockledge Drive, Bethesda, MD 20892, USA
- Ao Yuan
- National Human Genome Center, Howard University, 2216 Sixth Street N.W., Washington, DC 20059
13
Cheng J, Qin J, Zhang B. Semiparametric estimation and inference for distributional and general treatment effects. J R Stat Soc Series B Stat Methodol 2009. [DOI: 10.1111/j.1467-9868.2009.00715.x]