1
Liyanage JSS, Estepp JH, Srivastava K, Li Y, Mori M, Kang G. GMEPS: a fast and efficient likelihood approach for genome-wide mediation analysis under extreme phenotype sequencing. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0071. [PMID: 35266368 DOI: 10.1515/sagmb-2021-0071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Accepted: 02/17/2022] [Indexed: 11/15/2022]
Abstract
Owing to advantages such as cost savings and higher statistical power to detect associations between genetic variants and human disorders, extreme phenotype sequencing (EPS) is a rapidly emerging study design in epidemiological and clinical studies investigating how genetic variations associate with complex phenotypes. However, investigating the mediation effect of genetic variants on phenotypes is severely restricted under the EPS design because existing methods cannot adequately accommodate the non-random sampling of extreme tails that the design incurs. In this paper, we propose a likelihood approach (GMEPS) for testing the mediation effect of genetic variants through continuous and binary mediators on a continuous phenotype under the EPS design. Beyond the EPS design, it can also be used as a general mediation analysis procedure. Extensive simulations and two real data applications, a genome-wide association study of benign ethnic neutropenia under the EPS design and a candidate-gene study of neurocognitive performance in patients with sickle cell disease under a random sampling design, demonstrate the superiority of GMEPS over widely used mediation analysis procedures under the EPS design, while showing comparable performance under the general random sampling framework.
Affiliation(s)
- Janaka S S Liyanage
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
- Jeremie H Estepp
  - Departments of Global Pediatric Medicine and Hematology, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
- Kumar Srivastava
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
- Yun Li
  - Department of Biostatistics, Department of Genetics, Department of Computer Science, The University of North Carolina at Chapel Hill, Chapel Hill 27599, NC, USA
- Motomi Mori
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
- Guolian Kang
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis 38105, TN, USA
2
Tounkara F, Lefebvre G, Greenwood C, Oualkacha K. A flexible copula-based approach for the analysis of secondary phenotypes in ascertained samples. Stat Med 2020; 39:517-543. [PMID: 31868965 DOI: 10.1002/sim.8416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/06/2018] [Revised: 04/30/2019] [Accepted: 09/04/2019] [Indexed: 12/20/2022]
Abstract
Data collected for a genome-wide association study of a primary phenotype are often used for additional genome-wide association analyses of secondary phenotypes. However, when the primary and secondary traits are dependent, naïve analyses of secondary phenotypes may induce spurious associations in non-randomly ascertained samples. Retrospective likelihood-based methods have previously been proposed to correct for sampling biases arising in secondary trait association analyses. However, most of these methods were introduced to handle studies featuring a case-control design based on a binary primary phenotype. As such, they are not directly applicable to more complicated study designs such as multiple-trait studies, where the sampling mechanism also depends on the secondary phenotype, or extreme-trait studies, where individuals with extreme primary phenotype values are selected. Only a few prospective likelihood approaches have been proposed to accommodate these more complicated sampling mechanisms. These approaches assume a normal distribution for the secondary phenotype (or the latent secondary phenotype) and a bivariate normal distribution for the primary-secondary phenotype dependence. In this paper, we propose a unified copula-based approach to appropriately detect genetic variant/secondary phenotype association in the presence of selected samples. The primary phenotype is either binary or continuous, and the secondary phenotype is continuous although not necessarily normal. We use both prospective and retrospective likelihoods to account for the sampling mechanism and use a copula model to allow for potentially different dependence structures between the primary and secondary phenotypes. We demonstrate the effectiveness of our approach through simulation studies and by analyzing data from the Avon Longitudinal Study of Parents and Children cohort.
Affiliation(s)
- Fodé Tounkara
  - Lunenfeld-Tenenbaum Research Institute, Toronto, Canada
- Geneviève Lefebvre
  - Department of Mathematics, Université du Québec à Montréal, Montreal, Canada
- Celia Greenwood
  - Lady Davis Research Institute, Centre for Clinical Epidemiology, Jewish General Hospital, Montreal, Canada; Gerald Bronfman Department of Oncology, McGill University, Montreal, Canada; Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montreal, Canada; Department of Human Genetics, McGill University, Montreal, Canada
- Karim Oualkacha
  - Department of Mathematics, Université du Québec à Montréal, Montreal, Canada
3
Zhang H, Bi W, Cui Y, Chen H, Chen J, Zhao Y, Kang G. Extreme-value sampling design is cost-beneficial only with a valid statistical approach for exposure-secondary outcome association analyses. Stat Methods Med Res 2019; 29:466-480. [PMID: 30945605 DOI: 10.1177/0962280219839093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In epidemiological cohort studies, exposure data are collected in sub-studies based on a primary outcome (PO) of interest, as with the extreme-value sampling design (EVSD), to investigate their correlation. Secondary outcome (SO) data are also readily available, enabling researchers to assess the correlations between the exposure and the SOs. However, when the EVSD is used, the data for SOs are not representative samples of a general population; thus, many commonly used statistical methods, such as the generalized linear model (GLM), are not valid. A prospective likelihood method has been developed to associate SOs with single-nucleotide polymorphisms under an extreme phenotype sequencing design. In this paper, we describe the application of the prospective likelihood method (STEVSD) to exposure-SO association analysis under an EVSD. We undertook extensive simulations to assess the performance of the STEVSD method in associating binary and continuous exposures with SOs, comparing it to the simple GLM method that ignores the EVSD. To demonstrate the cost-benefit of the STEVSD method, we also mimicked the design of two new retrospective studies, as would be done in actual practice, based on the PO of interest, which was the same as the SO in the EVSD study. We then analyzed these data by using the GLM method and compared its power to that of the STEVSD method. We demonstrated the usefulness of the STEVSD method by applying it to a benign ethnic neutropenia dataset. Our results indicate that the STEVSD method controls type I error well, whereas the GLM method does not because it ignores the EVSD, and that the STEVSD method is cost-effective because it has statistical power similar to that of two new retrospective studies that require collecting new exposure data for selected individuals.
Affiliation(s)
- Hang Zhang
  - Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, PR China
- Wenjian Bi
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Honglei Chen
  - Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
- Jinbo Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Yanlong Zhao
  - Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, PR China
- Guolian Kang
  - Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, TN, USA