1. van den Heuvel E, Zhan Z. Myths about linear and monotonic associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ. Am Stat 2021. [DOI: 10.1080/00031305.2021.2004922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/01/2022]
Affiliation(s)
- Edwin van den Heuvel
  - Boston University, School of Medicine, Preventive Medicine and Epidemiology, 72 East Concord Street, Boston, MA 02118, United States of America
  - Eindhoven University of Technology, Department of Mathematics and Computer Science, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands
- Zhuozhao Zhan
  - Eindhoven University of Technology, Department of Mathematics and Computer Science, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands

2. Xia Y. Correlation and association analyses in microbiome study integrating multiomics in health and disease. Progress in Molecular Biology and Translational Science 2020; 171:309-491. [PMID: 32475527 DOI: 10.1016/bs.pmbts.2020.04.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 02/07/2023]
Abstract
Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have been shifted by the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversity-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach. Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods for the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and the future direction of association analysis in microbiome and multiomics studies.
Affiliation(s)
- Yinglin Xia
  - Department of Medicine, University of Illinois at Chicago, Chicago, IL, United States.

3. Perreault S, Duchesne T, Nešlehová JG. Detection of block-exchangeable structure in large-scale correlation matrices. J Multivariate Anal 2019. [DOI: 10.1016/j.jmva.2018.10.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]

4. Astivia OLO, Zumbo BD. Population models and simulation methods: The case of the Spearman rank correlation. The British Journal of Mathematical and Statistical Psychology 2017; 70:347-367. [PMID: 28140458 DOI: 10.1111/bmsp.12085] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/01/2016] [Revised: 08/26/2016] [Indexed: 06/06/2023]
Abstract
The purpose of this paper is to highlight the importance of a population model in guiding the design and interpretation of simulation studies used to investigate the Spearman rank correlation. The Spearman rank correlation has been known for over a hundred years to applied researchers and methodologists alike and is one of the most widely used non-parametric statistics. Still, certain misconceptions can be found, either explicitly or implicitly, in the published literature because a population definition for this statistic is rarely discussed within the social and behavioural sciences. By relying on copula distribution theory, a population model is presented for the Spearman rank correlation, and its properties are explored both theoretically and in a simulation study. Through the use of the Iman-Conover algorithm (which allows the user to specify the rank correlation as a population parameter), simulation studies from previously published articles are explored, and it is found that many of the conclusions purported in them regarding the nature of the Spearman correlation would change if the data-generation mechanism better matched the simulation design. More specifically, issues such as small sample bias and lack of power of the t-test and r-to-z Fisher transformation disappear when the rank correlation is calculated from data sampled where the rank correlation is the population parameter. A proof for the consistency of the sample estimate of the rank correlation is shown as well as the flexibility of the copula model to encompass results previously published in the mathematical literature.
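The idea of sampling data for which the rank correlation is itself the population parameter can be illustrated with a minimal sketch. This is not the Iman-Conover algorithm named in the abstract. It assumes a Gaussian copula instead, for which the population Spearman correlation ρ_s and the Pearson correlation r of the latent normal variables satisfy r = 2 sin(π ρ_s / 6). The function names and the choice of margins are hypothetical and only serve the illustration.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sample_gaussian_copula(rho_s, n, rng):
    """Draw n pairs from a Gaussian copula with population Spearman rho_s."""
    r = 2.0 * math.sin(math.pi * rho_s / 6.0)  # latent Pearson correlation
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        # Strictly increasing transforms change the margins but not the ranks.
        xs.append(math.exp(z1))  # lognormal margin (illustrative choice)
        ys.append(z2 ** 3)       # heavy-tailed margin (illustrative choice)
    return xs, ys


rng = random.Random(2017)
x, y = sample_gaussian_copula(0.5, 5000, rng)
rho_hat = spearman(x, y)
print(round(rho_hat, 3))
```

Because the margins are strictly increasing transforms of the latent normals, the sample estimate should settle near the specified 0.5 as the sample size grows, whatever the margins look like.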
Affiliation(s)
- Bruno D Zumbo
  - University of British Columbia, Vancouver, British Columbia, Canada

5. Song Z, Du H, Zhang Y, Xu Y. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing. Front Microbiol 2017; 8:1294. [PMID: 28769888 PMCID: PMC5509801 DOI: 10.3389/fmicb.2017.01294] [Citation(s) in RCA: 140] [Impact Index Per Article: 20.0] [Received: 02/25/2017] [Accepted: 06/27/2017] [Indexed: 11/24/2022] [Open access]
Abstract
Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicon sequencing (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed that pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) could be classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the effects of the core functional microbiota in soy sauce aroma type liquor production and the characteristics of the fermentation microbiota under different environmental conditions.
Affiliation(s)
- Zhewei Song
  - State Key Laboratory of Food Science and Technology, Key Laboratory of Industrial Biotechnology of Ministry of Education, Synergetic Innovation Center of Food Safety and Nutrition, School of Biotechnology, Jiangnan University, Wuxi, China
- Hai Du
  - State Key Laboratory of Food Science and Technology, Key Laboratory of Industrial Biotechnology of Ministry of Education, Synergetic Innovation Center of Food Safety and Nutrition, School of Biotechnology, Jiangnan University, Wuxi, China
- Yan Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology - Ministry of Education Key Laboratory of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Yan Xu
  - State Key Laboratory of Food Science and Technology, Key Laboratory of Industrial Biotechnology of Ministry of Education, Synergetic Innovation Center of Food Safety and Nutrition, School of Biotechnology, Jiangnan University, Wuxi, China

7. Belalia M, Bouezmarni T, Lemyre FC, Taamouti A. Testing independence based on Bernstein empirical copula and copula density. J Nonparametr Stat 2017. [DOI: 10.1080/10485252.2017.1303063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022]
Affiliation(s)
- M. Belalia
  - Département de mathématiques, Université de Sherbrooke, Sherbrooke, Canada
- T. Bouezmarni
  - Département de mathématiques, Université de Sherbrooke, Sherbrooke, Canada
- F. C. Lemyre
  - Département de mathématiques, Université de Sherbrooke, Sherbrooke, Canada
- A. Taamouti
  - Durham University Business School, Durham University, UK

8. Zhao J, Zhang Y, Qian Y, Pan Z, Zhu Y, Zhang Y, Guo J, Xu L. Coincidence of variation in potato yield and climate in northern China. The Science of the Total Environment 2016; 573:965-973. [PMID: 27599060 DOI: 10.1016/j.scitotenv.2016.08.195] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/12/2016] [Revised: 08/29/2016] [Accepted: 08/29/2016] [Indexed: 06/06/2023]
Abstract
Understanding the effects of climate change on crops is vital for food security. We aimed to characterise the coincidence of yield variations with weather variables for potato in northern China using long-term datasets. Daily climate variables obtained from 607 meteorological stations from 1961 to 2014, detailed field experimental data for the period of 1982 to 2012 in northern China, and a multivariate linear statistical model were used in this study. In particular, the first difference method was used to disentangle the contributions of climate change to potato yield. We concluded that during the potato growing season, the average daily, maximum and minimum temperatures significantly increased by 0.23°C per decade, 0.20°C per decade and 0.36°C per decade from 1961 to 2014 in northern China, respectively. However, average total radiation, total annual precipitation and potential evapotranspiration from April to September all exhibited downward trends, and the variation of evapotranspiration (-9.99 mm per decade) was greater than that of precipitation (-2.65 mm per decade). The key climatic factors limiting potato yields in northern China over the past 30 years at a regional scale were diurnal temperature range, precipitation, radiation and reference crop evapotranspiration (ET0). The potato yield in northern China was the most sensitive to variation of the diurnal temperature range, followed by radiation, precipitation and ET0. Specifically, when the diurnal temperature range decreased by 1°C, the potato yield increased by 543.9 kg·ha⁻¹. When the total radiation decreased by 1 MJ·m⁻², the potato yield increased by 63.8 kg·ha⁻¹. When the ET0 decreased by 1 mm, the potato yield increased by 62.7 kg·ha⁻¹. When the precipitation increased by 1 mm, the potato yield increased by 62.9 kg·ha⁻¹. A regression model describing the combined effects of different climate variables on potato yield in northern China was established.
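The first difference method mentioned in the abstract can be sketched as follows. This is a simplified, single-variable illustration rather than the authors' multivariate model: year-to-year differences of yield are regressed on year-to-year differences of one climate variable, so that a steady trend (for example from technology) drops into the intercept. The function names and the data are made up for the example.

```python
def first_differences(series):
    """Year-to-year changes: series[t + 1] - series[t]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def yield_sensitivity(yields, climate):
    """Change in yield per unit change in the climate variable."""
    return ols_slope(first_differences(climate), first_differences(yields))


# Synthetic data (not from the paper): a linear trend of 120 kg/ha per year
# plus a response of -500 kg/ha per degree of diurnal temperature range.
years = range(10)
dtr = [12.0, 11.5, 12.3, 11.0, 11.8, 10.9, 11.2, 10.5, 11.1, 10.2]
yields = [15000 + 120 * t - 500 * d for t, d in zip(years, dtr)]

sensitivity = yield_sensitivity(yields, dtr)
print(round(sensitivity, 6))
```

In this synthetic series the differenced regression recovers the built-in response of -500 kg per hectare per degree, because differencing turns the linear trend into a constant that the intercept absorbs.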
Affiliation(s)
- Junfang Zhao
  - State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China.
- Yanhong Zhang
  - National Meteorological Center, Beijing 10081, China
- Yonglan Qian
  - National Meteorological Center, Beijing 10081, China
- Zhihua Pan
  - College of Resources and Environmental Sciences, China Agricultural University, Beijing 100193, China
- Yujie Zhu
  - China Meteorological Administration Training Centre, Beijing 10081, China
- Yi Zhang
  - State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Jianping Guo
  - State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China
- Lingling Xu
  - National Meteorological Center, Beijing 10081, China

9. Molnár F, Derzsy N, Szymanski BK, Korniss G. Building damage-resilient dominating sets in complex networks against random and targeted attacks. Sci Rep 2015; 5:8321. [PMID: 25662371 PMCID: PMC4321165 DOI: 10.1038/srep08321] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 10/08/2014] [Accepted: 01/14/2015] [Indexed: 11/09/2022] [Open access]
Abstract
We study the vulnerability of dominating sets against random and targeted node removals in complex networks. While small, cost-efficient dominating sets play a significant role in controllability and observability of these networks, a fixed and intact network structure is always implicitly assumed. We find that cost-efficiency of dominating sets optimized for small size alone comes at a price of being vulnerable to damage; domination in the remaining network can be severely disrupted, even if a small fraction of dominator nodes are lost. We develop two new methods for finding flexible dominating sets, allowing either adjustable overall resilience, or dominating set size, while maximizing the dominated fraction of the remaining network after the attack. We analyze the efficiency of each method on synthetic scale-free networks, as well as real complex networks.
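The vulnerability described in the abstract can be made concrete with a small sketch. It does not reproduce the authors' two flexible methods. It assumes a plain greedy heuristic that optimizes for small size alone, and then measures the dominated fraction of the surviving network after a set of nodes is removed. All function names are hypothetical.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def greedy_dominating_set(adjacency):
    """Repeatedly pick the node that dominates the most undominated nodes."""
    undominated = set(adjacency)
    dominators = set()
    while undominated:
        best = max(
            sorted(adjacency),
            key=lambda node: len(({node} | adjacency[node]) & undominated),
        )
        dominators.add(best)
        undominated -= {best} | adjacency[best]
    return dominators


def dominated_fraction_after_removal(adjacency, dominators, removed):
    """Fraction of surviving nodes still dominated once `removed` is deleted."""
    removed = set(removed)
    survivors = set(adjacency) - removed
    if not survivors:
        return 0.0
    live_dominators = set(dominators) - removed
    dominated = {
        node
        for node in survivors
        if node in live_dominators or adjacency[node] & live_dominators
    }
    return len(dominated) / len(survivors)


# A star graph: the single hub dominates everything, so losing it is fatal.
star = build_adjacency([(0, leaf) for leaf in range(1, 6)])
star_dominators = greedy_dominating_set(star)
print(star_dominators)
print(dominated_fraction_after_removal(star, star_dominators, {0}))
print(dominated_fraction_after_removal(star, star_dominators, {1}))
```

On the star graph the size-optimal set is the hub alone. Removing a leaf leaves every survivor dominated, while removing the hub leaves none, which is the kind of fragility that motivates trading set size for resilience.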
Affiliation(s)
- F. Molnár
  - Department of Physics, Applied Physics, and Astronomy, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
  - Social Cognitive Networks Academic Research Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
- N. Derzsy
  - Department of Physics, Applied Physics, and Astronomy, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
  - Social Cognitive Networks Academic Research Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
- B. K. Szymanski
  - Social Cognitive Networks Academic Research Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
  - Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
- G. Korniss
  - Department of Physics, Applied Physics, and Astronomy, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA
  - Social Cognitive Networks Academic Research Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180-3590 USA

10. Litvak N, van der Hofstad R. Uncovering disassortativity in large scale-free networks. Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics 2013; 87:022801. [PMID: 23496562 DOI: 10.1103/physreve.87.022801] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/15/2012] [Revised: 12/14/2012] [Indexed: 06/01/2023]
Abstract
Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, and social and biological networks, are often characterized by degree-degree dependencies between neighboring nodes. In this paper, we propose a new way of measuring degree-degree dependencies. One of the problems with the commonly used assortativity coefficient is that in disassortative networks its magnitude decreases with the network size. We mathematically explain this phenomenon and validate the results on synthetic graphs and real-world network data. As an alternative, we suggest using rank correlation measures such as Spearman's ρ. Our experiments convincingly show that Spearman's ρ produces consistent values in graphs of different sizes but similar structure, and it is able to reveal strong (positive or negative) dependencies in large graphs. In particular, we discover much stronger negative degree-degree dependencies in Web graphs than was previously thought. Rank correlations allow us to compare the assortativity of networks of different sizes, which is impossible with the assortativity coefficient due to its genuine dependence on the network size. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.
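A rank-based degree-degree measure of the kind the abstract proposes can be sketched as below. The sketch assumes an undirected edge list, counts each edge in both directions, and resolves tied degrees with average ranks. The tie-handling rule is an assumption made here for simplicity and may differ from the paper's treatment. Function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def degree_degree_pairs(edges):
    """Degrees at the two ends of every edge, listed in both directions."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    return xs, ys


def assortativity_coefficient(edges):
    """Pearson correlation of degrees across edges."""
    return pearson(*degree_degree_pairs(edges))


def spearman_degree_correlation(edges):
    """Spearman correlation of degrees across edges."""
    xs, ys = degree_degree_pairs(edges)
    return pearson(average_ranks(xs), average_ranks(ys))


star = [(0, leaf) for leaf in range(1, 6)]
path = [(i, i + 1) for i in range(5)]
print(spearman_degree_correlation(star), assortativity_coefficient(star))
print(spearman_degree_correlation(path), assortativity_coefficient(path))
```

Both toy graphs are disassortative, since high-degree nodes attach to low-degree ones. The size effect discussed in the abstract only shows up on large scale-free graphs, which this small example does not attempt to reproduce.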
Affiliation(s)
- Nelly Litvak
  - University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Sciences, P.O. Box 217, 7500 AE, Enschede, The Netherlands.

14. Linear rank tests for independence in bivariate distributions: power comparisons by simulation. Comput Stat Data Anal 2004. [DOI: 10.1016/j.csda.2003.09.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]