1
|
Waters MR, Inkman M, Jayachandran K, Kowalchuk RM, Robinson C, Schwarz JK, Swamidass SJ, Griffith OL, Szymanski JJ, Zhang J. GAiN: An integrative tool utilizing generative adversarial neural networks for augmented gene expression analysis. Patterns (N Y) 2024; 5:100910. [PMID: 38370125 PMCID: PMC10873154 DOI: 10.1016/j.patter.2023.100910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/23/2023] [Accepted: 12/07/2023] [Indexed: 02/20/2024]
Abstract
Big genomic data and artificial intelligence (AI) are ushering in an era of precision medicine, providing opportunities to study previously under-represented subtypes and rare diseases rather than categorize them as variances. However, clinical researchers face challenges in accessing such novel technologies as well as reliable methods to study small datasets or subcohorts with unique phenotypes. To address this need, we developed an integrative approach, GAiN, to capture patterns of gene expression from small datasets on the basis of an ensemble of generative adversarial networks (GANs) while leveraging big population data. Where conventional biostatistical methods fail, GAiN reliably discovers differentially expressed genes (DEGs) and enriched pathways between two cohorts with limited numbers of samples (n = 10) when benchmarked against a gold standard. GAiN is freely available at GitHub. Thus, GAiN may serve as a crucial tool for gene expression analysis in scenarios with limited samples, as in the context of rare diseases, under-represented populations, or limited investigator resources.
Affiliation(s)
- Michael R. Waters
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Matthew Inkman
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Kay Jayachandran
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
- Clifford Robinson
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Julie K. Schwarz
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63108, USA
- S. Joshua Swamidass
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, MO 63105, USA
  - Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63105, USA
- Obi L. Griffith
  - Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jeffrey J. Szymanski
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jin Zhang
  - Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO 63108, USA
  - Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Institute for Informatics (I), Washington University School of Medicine, St. Louis, MO 63110, USA
|
2
|
Abstract
Single-Case Experimental Designs (SCEDs) are increasingly recognized as a valuable alternative to group designs. Mediation analysis is useful in SCED contexts because it informs researchers about the underlying mechanism through which an intervention influences the outcome. However, methods for conducting mediation analysis in SCEDs have only recently been proposed. Furthermore, repeated measures of a target behavior present the challenges of autocorrelation and missing data. This paper aims to extend methods for estimating indirect effects in piecewise regression analysis in SCEDs by (1) evaluating three methods for modeling autocorrelation, namely, Newey-West (NW) estimation, feasible generalized least squares (FGLS) estimation, and explicit modeling of an autoregressive structure of order one (AR(1)) in the error terms, and (2) evaluating multiple imputation in the presence of data that are missing completely at random. FGLS and AR(1) outperformed NW and OLS estimation in terms of efficiency, Type I error rates, and coverage, while OLS was superior to the other methods in terms of power for larger samples. The performance of all methods was consistent across the 0% and 20% missing data conditions, whereas 50% missing data led to unsatisfactory power and biased estimates. In light of these findings, we provide recommendations for applied researchers.
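As an illustration of the FGLS idea named in this abstract, the sketch below applies a one-step Cochrane-Orcutt correction to a single-case series with a baseline and a treatment phase. It is a simplified assumption, not the authors' procedure: it models only a level change (no mediator and no slope change), and the function names and simulated data are hypothetical.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of an OLS regression of y on one predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def lag1_autocorrelation(resid):
    """Estimate the AR(1) coefficient from a residual series."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den

def cochrane_orcutt(x, y):
    """One-step FGLS: OLS, estimate rho, quasi-difference, refit."""
    b0, b1 = ols_simple(x, y)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    rho = lag1_autocorrelation(resid)
    # Quasi-differencing removes the AR(1) dependence (first observation is dropped).
    x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    a_star, slope = ols_simple(x_star, y_star)
    # The transformed intercept equals intercept * (1 - rho).
    return a_star / (1 - rho), slope, rho

# Hypothetical simulated series: 100 baseline and 100 treatment observations,
# level change of 3, AR(1) errors with coefficient 0.6.
rng = random.Random(42)
phase = [0] * 100 + [1] * 100
error, y = 0.0, []
for x_t in phase:
    error = 0.6 * error + rng.gauss(0, 1)
    y.append(2.0 + 3.0 * x_t + error)

intercept, level_change, rho_hat = cochrane_orcutt(phase, y)
print(intercept, level_change, rho_hat)
```

The quasi-differenced regression has approximately uncorrelated errors, which is why its standard errors are more trustworthy than those from plain OLS on autocorrelated data.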
Affiliation(s)
- Emma Somer
  - Department of Psychology, McGill University, Montreal, QC, Canada
- Christian Gische
  - Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
- Milica Miočević
  - Department of Psychology, McGill University, Montreal, QC, Canada
|
3
|
Wang B, Zheng Y, Fang D, Kamarianakis Y, Wilson JR. Split bootstrap hierarchical modeling of antibiotics abuse in China. Stat Med 2019; 38:2282-2291. [PMID: 30773666 DOI: 10.1002/sim.8118] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/31/2018] [Revised: 12/10/2018] [Accepted: 01/19/2019] [Indexed: 11/11/2022]
Abstract
In the 1990s, China experienced a high degree of antibiotics abuse, which resulted in increased drug resistance. As a result, the World Health Organization introduced a program for children under the age of 5 years who had an acute respiratory tract infection. We analyze the data pertaining to the treatment provided by doctors in several hospitals in China in order to understand the relationships in the data. The data are nested in a three-level hierarchical structure with small cluster sizes ranging from 2 to 10. While large sample theory provides a mechanism to construct confidence intervals and test hypotheses about regression coefficients, the estimation algorithms often fail to converge when they are applied to small cluster sizes. This paper presents the split bootstrap, a novel combination of the cluster bootstrap and primary unit splitting methods that can be used as an alternative when analyzing data pertaining to the abuse of antibiotics in China with small cluster sizes. The split bootstrap method provides accurate estimates with a minimal reduction in precision.
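The cluster bootstrap component mentioned in this abstract can be sketched as follows. This is only the generic cluster bootstrap for a pooled mean, shown under simplifying assumptions; it does not implement the primary unit splitting step or the three-level hierarchical model, and the function names and toy data are hypothetical.

```python
import random
import statistics

def cluster_bootstrap_means(clusters, n_boot=1000, seed=0):
    """Resample whole clusters with replacement; return bootstrap pooled means."""
    rng = random.Random(seed)
    k = len(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw k clusters with replacement, keeping each cluster intact so
        # that within-cluster dependence is preserved in every replicate.
        sample = [clusters[rng.randrange(k)] for _ in range(k)]
        pooled = [value for cluster in sample for value in cluster]
        estimates.append(statistics.fmean(pooled))
    return estimates

def percentile_interval(estimates, level=0.95):
    """Percentile interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * n)]
    upper = ordered[min(n - 1, int((1 + level) / 2 * n))]
    return lower, upper

# Hypothetical toy data: five small clusters with sizes between 2 and 4.
clusters = [
    [1.0, 1.2],
    [0.8, 0.9, 1.1],
    [2.0, 2.1, 1.9, 2.2],
    [1.5, 1.4],
    [0.5, 0.7, 0.6],
]
boot = cluster_bootstrap_means(clusters, n_boot=500, seed=1)
low, high = percentile_interval(boot)
print(low, high)
```

Resampling clusters rather than individual observations is the design choice that matters here: it keeps the dependence among observations in the same cluster intact within each replicate.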
Affiliation(s)
- Bei Wang
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, Arizona
- Yi Zheng
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, Arizona
  - Mary Lou Fulton Teachers College, Arizona State University, Tempe, Arizona
- Di Fang
  - Department of Agricultural Economics and Agribusiness, University of Arkansas, Fayetteville, Arkansas
- Yiannis Kamarianakis
  - School of Mathematical and Statistical Sciences, Arizona State University, Tempe, Arizona
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology-Hellas, Heraklion, Greece
- Jeffrey R Wilson
  - Department of Economics, Arizona State University, Tempe, Arizona
|
4
|
Abstract
It has been suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N ≤ 200. Bayesian methods with diffuse priors had power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N ≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results.
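One of the frequentist comparators named in this abstract, the percentile bootstrap confidence interval for the mediated effect, can be sketched for the single mediator model as below. The mediated effect is the product a·b, where a is the slope of M on X and b is the coefficient of M when Y is regressed on X and M. This is a minimal illustration with hypothetical function names and simulated data, not the code used in the paper.

```python
import random

def _cross(u, v):
    """Centered sum of cross products of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the single mediator model."""
    sxx, smm = _cross(x, x), _cross(m, m)
    sxm, sxy, smy = _cross(x, m), _cross(x, y), _cross(m, y)
    a = sxm / sxx                                          # M regressed on X
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)   # M's coefficient in Y ~ X + M
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for a*b, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lower, upper

# Hypothetical simulated data with a = 0.5 and b = 0.4 (true a*b = 0.2).
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(100)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

ci_low, ci_high = percentile_bootstrap_ci(x, m, y, n_boot=500, seed=1)
print(indirect_effect(x, m, y), ci_low, ci_high)
```

The mediated effect is declared significant when the interval excludes zero; the percentile method avoids assuming that the product a·b is normally distributed.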
Affiliation(s)
- Roy Levy
  - T. Denny Sanford School of Social & Family Dynamics, Arizona State University
|
5
|
Zhou JJ, Hu T, Qiao D, Cho MH, Zhou H. Boosting Gene Mapping Power and Efficiency with Efficient Exact Variance Component Tests of Single Nucleotide Polymorphism Sets. Genetics 2016; 204:921-31. [PMID: 27646141 DOI: 10.1534/genetics.116.190454] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/15/2016] [Accepted: 09/07/2016] [Indexed: 11/18/2022] Open
Abstract
Single nucleotide polymorphism (SNP) set tests have been a powerful method in analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) method tests a set of variants as random effects in the linear mixed model setting. Its P-value is calculated based on asymptotic theory that requires a large sample size. Therefore, it is known that SKAT is conservative and can lose power at small or moderate sample sizes. Given the current cost of sequencing technology, scales of NGS are still limited. In this report, we derive and implement computationally efficient, exact (nonasymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, ExactVCTest, that can achieve high power even when sample sizes are small. We perform simulation studies under various genetic scenarios. Our ExactVCTest (i.e., eScore, eLRT, eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and various sample sizes. We applied these tests to an exome sequencing study. Our findings replicate previous results and shed light on rare variant effects within genes. The software package is implemented in the open-source, high-performance technical computing language Julia, and is freely available at https://github.com/Tao-Hu/VarianceComponentTest.jl. Analysis of each trait in the exome sequencing data set with 399 individuals and 16,619 genes takes around 1 min on a desktop computer.
|