1
Yu C, Guo W, Song X, Cui H. Feature screening with latent responses. Biometrics 2022. [PMID: 35246841 DOI: 10.1111/biom.13658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Accepted: 02/25/2022] [Indexed: 11/30/2022]
Abstract
A novel feature screening method is proposed to examine the correlation between latent responses and potential predictors in ultrahigh dimensional data analysis. First, a confirmatory factor analysis (CFA) model is used to characterize latent responses through multiple observed variables. The expectation-maximization algorithm is employed to estimate the parameters in the CFA model. Second, R-Vector (RV) correlation is used to measure the dependence between the multivariate latent responses and covariates of interest. Third, a feature screening procedure is proposed on the basis of an unbiased estimator of the RV coefficient. The sure screening property of the proposed screening procedure is established under certain mild conditions. Monte Carlo simulations are conducted to assess the finite sample performance of the feature screening procedure. The proposed method is applied to an investigation of the relationship between psychological well-being and the human genome.
Affiliation(s)
- Congran Yu
  - School of Mathematical Sciences, Capital Normal University, Beijing, China
- Wenwen Guo
  - School of Mathematical Sciences, Capital Normal University, Beijing, China
- Xinyuan Song
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Hengjian Cui
  - School of Mathematical Sciences, Capital Normal University, Beijing, China
2
Choi J, Lu D, Beg MF, Graham J, McNeney B. The Contribution Plot: Decomposition and Graphical Display of the RV Coefficient, with Application to Genetic and Brain Imaging Biomarkers of Alzheimer's Disease. Hum Hered 2019; 84:59-72. [PMID: 31430752 PMCID: PMC9008771 DOI: 10.1159/000501334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/30/2018] [Accepted: 06/05/2019] [Indexed: 10/21/2023]
Abstract
BACKGROUND/AIMS Alzheimer's disease (AD) is a chronic neurodegenerative disease that causes memory loss and a decline in cognitive abilities. AD is the sixth leading cause of death in the USA, affecting an estimated 5 million Americans. To assess the association between multiple genetic variants and multiple measurements of structural changes in the brain, a recent study of AD used a multivariate measure of linear dependence, the RV coefficient. The authors decomposed the RV coefficient into contributions from individual variants and displayed these contributions graphically. METHODS We investigate the properties of such a "contribution plot" in terms of an underlying linear model, and discuss shrinkage estimation of the components of the plot when the correlation signal may be sparse. RESULTS The contribution plot is applied to simulated data and to genomic and brain imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). CONCLUSIONS The contribution plot with shrinkage estimation can reveal truly associated explanatory variables.
Affiliation(s)
- JinCheol Choi
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Donghuan Lu
  - School of Engineering Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Mirza Faisal Beg
  - School of Engineering Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Jinko Graham
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
- Brad McNeney
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
3
Yamaura Y, Blanchet FG, Higa M. Analyzing community structure subject to incomplete sampling: hierarchical community model vs. canonical ordinations. Ecology 2019; 100:e02759. [PMID: 31131887 DOI: 10.1002/ecy.2759] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/26/2019] [Accepted: 04/17/2019] [Indexed: 11/11/2022]
Abstract
Recently developing hierarchical community models (HCMs) accounting for incomplete sampling are promising approaches to understand community organization. However, pros and cons of incorporating incomplete sampling in the analysis and related design issues remain unknown. In this study, we compared HCM and canonical redundancy analysis (RDA) carried out with 10 different dissimilarity coefficients to evaluate how each approach restores true community abundance data sampled with imperfect detection. We conducted simulation experiments with varying numbers of sampling sites, visits, mean detectability and mean abundance. Performance of HCM was measured by estimates of "expected" (mean) abundance ($\hat{\lambda}_{ij}$) and realized abundance ($\hat{N}_{ij}$: direct estimate of site- and species-specific abundance). We also compared HCM and different types of RDA (normal, partial, and weighted), all performed with the same ten different dissimilarity coefficients, with unequal number of visits to sampling sites. In addition, we applied the models to a virtual survey carried out on the Barro Colorado Island tree plot data for which we know true community abundance. Simulation experiments showed that $\hat{N}_{ij}$ yielded by HCM best restored the underlying abundance of constituent species among 12 abundance estimates by HCM and RDA regardless if the sampling was equal or unequal. Mean abundance predominantly affected the performance of HCM and RDA while $\hat{\lambda}_{ij}$ yielded by HCM had comparable performance to percentage difference and Gower dissimilarity coefficients of RDA. Relative performance of RDA types depended on the combination of dissimilarity coefficients and the distribution of sampling effort. Best performance of $\hat{N}_{ij}$ followed by $\hat{\lambda}_{ij}$, percentage difference and Gower dissimilarity were also observed for the analysis of tree plot data, and graphical plots (triplots) based on $\hat{\lambda}_{ij}$ rather than $\hat{N}_{ij}$ clearly separated the effects of two environmental covariates on the abundance of constituent species. Under our conditions of model evaluation and the method, we concluded that, in terms of assessing the environmental dependence of abundance, HCMs and RDA can have comparable performance if we can choose appropriate dissimilarity coefficients for RDA. However, since HCMs provide straightforward biological interpretations of parameter estimates and flexibility of the analysis, HCMs would be useful in many situations as well as conventional canonical ordinations.
Affiliation(s)
- Yuichi Yamaura
  - Department of Forest Vegetation, Forestry and Forest Products Research Institute, 1 Matsunosato, Tsukuba, 305-8687, Japan
  - Fenner School of Environment and Society, Australian National University, Canberra, Australian Capital Territory, 2601, Australia
  - Shikoku Research Center, Forestry and Forest Products Research Institute, 2-915 Asakuranishi, Kochi, 780-8077, Japan
- F Guillaume Blanchet
  - Department of Mathematics and Statistics, McMaster University, Hamilton Hall, Room 218, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada
  - Département de biologie, Faculté des sciences, Université de Sherbrooke, 2500 Boulevard Université, Sherbrooke, Québec, J1K 2R1, Canada
- Motoki Higa
  - Faculty of Science and Technology, Kochi University, 2-5-1 Akebono-cho, Kochi, 780-8520, Japan
4
He W, Chung HY. Comparison between quantitative descriptive analysis and flash profile in profiling the sensory properties of commercial red sufu (Chinese fermented soybean curd). J Sci Food Agric 2019; 99:3024-3033. [PMID: 30488614 DOI: 10.1002/jsfa.9516] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/15/2018] [Revised: 10/25/2018] [Accepted: 11/25/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND Red sufu is a type of sufu produced by solid-state fermentation of soybean curd and coloration with red mold rice. The purposes of this study were: (i) to characterize commercial red sufu samples using the quantitative descriptive analysis (QDA) and flash profile (FP) by ten trained and ten untrained panelists, respectively; (ii) to compare the differences in panel performance, descriptive abilities and sensory maps between the two methodologies; and (iii) to compare the efficiency between QDA and FP using red sufu as the matrix. Techniques in multivariate analysis were utilized to explore the data. RESULTS Results from generalized procrustes analysis (GPA) showed that panel performance by QDA was more repeatable and reached higher homogeneity than that by FP. Despite the confidence ellipse results of the 12 red sufus being better discriminated by QDA, the RV coefficient was high (RV = 0.852) between the configurations of the two-dimensional model (F1 and F2) of the two methodologies, indicating that the two methods are similar and closely related. Overall, QDA provided more accurate and detailed information, while FP provided a similar sensory map on product location and descriptive results. CONCLUSION The FP technique appeared to be an efficient alternative approach to quickly evaluate sensory properties, including appearance, flavor, aroma and textural properties of an array of red sufu products. © 2018 Society of Chemical Industry.
Affiliation(s)
- Wenmeng He
  - Food & Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin N. T., Hong Kong SAR, China
- Hau Yin Chung
  - Food & Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin N. T., Hong Kong SAR, China
5
Guo B, Wu B. Reader reaction on the fast small-sample kernel independence test for microbiome community-level association analysis. Biometrics 2017; 74:1120-1124. [PMID: 29192963 DOI: 10.1111/biom.12823] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/01/2017] [Revised: 08/01/2017] [Accepted: 09/01/2017] [Indexed: 01/11/2023]
Abstract
Zhan et al. (2017) presented a kernel RV coefficient (KRV) test to evaluate the overall association between host gene expression and microbiome composition, and showed its competitive performance compared to existing methods. In this article, we clarify the close relation of KRV to the existing generalized RV (GRV) coefficient, and show that KRV and GRV have very similar performance. Although the KRV test could control the type I error rate well at 1% and 5% levels, we show that it could largely underestimate p-values at small significance levels leading to significantly inflated type I errors. As a partial remedy, we propose an alternative p-value calculation, which is efficient and more accurate than KRV p-value at small significance levels. We recommend that small KRV test p-values should always be accompanied and verified by the permutation p-value in practice. In addition, we analytically show that KRV can be written as a form of correlation coefficient, which can dramatically expedite its computation and make permutation p-value calculation more efficient.
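The permutation check recommended in this abstract can be illustrated with a minimal Python sketch. It uses the plain linear RV coefficient, not the KRV statistic or the authors' alternative p-value calculation. The function names `centered_gram` and `rv_permutation_pvalue`, the default of 999 permutations, and the `(1 + hits) / (1 + n_perm)` convention are assumptions made for this illustration.

```python
import random


def centered_gram(rows):
    """Column-center the data, then return the n x n matrix of row inner products."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    return [[sum(a * b for a, b in zip(r, s)) for s in centered] for r in centered]


def rv_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for the linear RV coefficient (illustrative sketch).

    x and y are lists of rows, one row per sample, in the same sample order.
    Permuting the samples of y leaves the RV denominator unchanged, so it is
    enough to compare the numerator tr(Sx * Sy_permuted) across permutations.
    """
    sx, sy = centered_gram(x), centered_gram(y)
    n = len(sx)
    identity = list(range(n))

    def numerator(perm):
        return sum(sx[i][j] * sy[perm[i]][perm[j]]
                   for i in range(n) for j in range(n))

    observed = numerator(identity)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if numerator(perm) >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (1 + hits) / (1 + n_perm)
```

With this convention the smallest attainable p-value is `1 / (1 + n_perm)`, so checking a very small analytic p-value requires a correspondingly large number of permutations.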
Affiliation(s)
- Bin Guo
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A
- Baolin Wu
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A
6
Zhan X, Plantinga A, Zhao N, Wu MC. A fast small-sample kernel independence test for microbiome community-level association analysis. Biometrics 2017; 73:1453-1463. [PMID: 28295177 DOI: 10.1111/biom.12684] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/01/2016] [Revised: 02/01/2017] [Accepted: 02/01/2017] [Indexed: 12/13/2022]
Abstract
To fully understand the role of microbiome in human health and diseases, researchers are increasingly interested in assessing the relationship between microbiome composition and host genomic data. The dimensionality of the data as well as complex relationships between microbiota and host genomics pose considerable challenges for analysis. In this article, we apply a kernel RV coefficient (KRV) test to evaluate the overall association between host gene expression and microbiome composition. The KRV statistic can capture nonlinear correlations and complex relationships among the individual data types and between gene expression and microbiome composition through measuring general dependency. Testing proceeds via a similar route as existing tests of the generalized RV coefficients and allows for rapid p-value calculation. Strategies to allow adjustment for confounding effects, which is crucial for avoiding misleading results, and to alleviate the problem of selecting the most favorable kernel are considered. Simulation studies show that KRV is useful in testing statistical independence with finite samples given the kernels are appropriately chosen, and can powerfully identify existing associations between microbiome composition and host genomic data while protecting type I error. We apply the KRV to a microbiome study examining the relationship between host transcriptome and microbiome composition within the context of inflammatory bowel disease and are able to derive new biological insights and provide formal inference on prior qualitative observations.
Affiliation(s)
- Xiang Zhan
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A
- Anna Plantinga
  - Department of Biostatistics, University of Washington, Seattle, Washington 98195, U.S.A
- Ni Zhao
  - Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, U.S.A
- Michael C Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, U.S.A
7
Abstract
Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association's underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarii where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.
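For orientation, the RV coefficient discussed in this survey can be sketched in a few lines of Python. The sketch assumes the standard definition, RV(X, Y) = tr(Sx Sy) / sqrt(tr(Sx Sx) tr(Sy Sy)), where Sx and Sy are the n x n inner-product matrices of the column-centered data. The helper names are hypothetical, and none of the refinements mentioned in the abstract (adjusted or kernel variants, tests) are included.

```python
import math


def center_columns(rows):
    """Subtract each column's mean; rows is a list of equal-length lists."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """n x n matrix of inner products between rows, i.e. X X^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def trace_product(a, b):
    """tr(A B) for square matrices, computed without forming A B."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def rv_coefficient(x, y):
    """Linear RV coefficient between two data tables measured on the same samples.

    x and y are lists of rows (one row per sample, same sample order).
    The result lies in [0, 1]. The division fails if either table has no
    variance at all; this sketch does not guard against that case.
    """
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    return trace_product(sx, sy) / math.sqrt(
        trace_product(sx, sx) * trace_product(sy, sy)
    )
```

When each table has a single column, the coefficient reduces to the squared Pearson correlation: `rv_coefficient([[1], [2], [3]], [[1], [3], [2]])` gives 0.25, the square of the correlation 0.5.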
Affiliation(s)
- Julie Josse
  - Department of Statistics, Agrocampus Ouest - INRIA, Saclay Paris Sud University, France
- Susan Holmes
  - Department of Statistics, Stanford University, California, USA