1
Browne RP, Bagnato L, Punzo A. Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions. ADV DATA ANAL CLASSI 2023; 18:597-625. [PMID: 39309701 PMCID: PMC11411007 DOI: 10.1007/s11634-023-00558-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 07/02/2023] [Indexed: 09/25/2024]
Abstract
Mixtures of multivariate leptokurtic-normal distributions have been recently introduced in the clustering literature based on mixtures of elliptical heavy-tailed distributions. They have the advantage of having parameters directly related to the moments of practical interest. We derive two estimation procedures for these mixtures. The first one is based on the majorization-minimization algorithm, while the second is based on a fixed point approximation. Moreover, we introduce parsimonious forms of the considered mixtures and we use the illustrated estimation procedures to fit them. We use simulated and real data sets to investigate various aspects of the proposed models and algorithms.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11634-023-00558-2.
Affiliation(s)
- Ryan P. Browne: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
- Luca Bagnato: Department of Economic and Social Sciences, Catholic University of the Sacred Heart, Milano, Italy
- Antonio Punzo: Department of Economics and Business, University of Catania, Catania, Italy
2
Sugasawa S, Kobayashi G. Robust fitting of mixture models using weighted complete estimating equations. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
3
Browne RP. Revitalizing the multivariate elliptical leptokurtic-normal distribution and its application in model-based clustering. Stat Probab Lett 2022. [DOI: 10.1016/j.spl.2022.109640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
4
Dimension-wise scaled normal mixtures with application to finance and biometry. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2022.105020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
5
Bagnato L, Punzo A, Zoia MG. Leptokurtic moment-parameterized elliptically contoured distributions with application to financial stock returns. COMMUN STAT-THEOR M 2022. [DOI: 10.1080/03610926.2020.1751202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Luca Bagnato: Dipartimento di Scienze Economiche e Sociali, Università Cattolica del Sacro Cuore, Piacenza, Italy
- Antonio Punzo: Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
- Maria Grazia Zoia: Dipartimento di Politica Economica, Università Cattolica del Sacro Cuore, Milan, Italy
6
Tomarchio SD, Gallaugher MP, Punzo A, McNicholas PD. Mixtures of Matrix-Variate Contaminated Normal Distributions. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2021.1999825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]
Affiliation(s)
- Antonio Punzo: Department of Economics and Business, University of Catania, Catania, Italy
- Paul D. McNicholas: Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, Canada
7
Punzo A, Bagnato L. Multiple scaled symmetric distributions in allometric studies. Int J Biostat 2021; 18:219-242. [PMID: 33730771 DOI: 10.1515/ijb-2020-0059] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/12/2019] [Accepted: 12/07/2020] [Indexed: 11/15/2022]
Abstract
In allometric studies, the joint distribution of the log-transformed morphometric variables is typically symmetric and with heavy tails. Moreover, in the bivariate case, it is customary to explain the morphometric variation of these variables by fitting a convenient line, as for example the first principal component (PC). To account for all these peculiarities, we propose the use of multiple scaled symmetric (MSS) distributions. These distributions have the advantage to be directly defined in the PC space, the kind of symmetry involved is less restrictive than the commonly considered elliptical symmetry, the behavior of the tails can vary across PCs, and their first PC is less sensitive to outliers. In the family of MSS distributions, we also propose the multiple scaled shifted exponential normal distribution, equivalent of the multivariate shifted exponential normal distribution in the MSS framework. For the sake of parsimony, we also allow the parameter governing the leptokurtosis on each PC, in the considered MSS distributions, to be tied across PCs. From an inferential point of view, we describe an EM algorithm to estimate the parameters by maximum likelihood, we illustrate how to compute standard errors of the obtained estimates, and we give statistical tests and confidence intervals for the parameters. We use artificial and real allometric data to appreciate the advantages of the MSS distributions over well-known elliptically symmetric distributions and to compare the robustness of the line from our models with respect to the lines fitted by well-established robust and non-robust methods available in the literature.
Affiliation(s)
- Antonio Punzo: Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
- Luca Bagnato: Dipartimento di Scienze Economiche e Sociali, Università Cattolica del Sacro Cuore, Piacenza, Italy
8
Bagnato L, Punzo A. Unconstrained representation of orthogonal matrices with application to common principal components. Comput Stat 2020. [DOI: 10.1007/s00180-020-01041-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
Many statistical problems involve the estimation of a $d \times d$ orthogonal matrix $\boldsymbol{Q}$. Such an estimation is often challenging due to the orthonormality constraints on $\boldsymbol{Q}$. To cope with this problem, we use the well-known PLU decomposition, which factorizes any invertible $d \times d$ matrix as the product of a $d \times d$ permutation matrix $\boldsymbol{P}$, a $d \times d$ unit lower triangular matrix $\boldsymbol{L}$, and a $d \times d$ upper triangular matrix $\boldsymbol{U}$. Thanks to the QR decomposition, we find the formulation of $\boldsymbol{U}$ when the PLU decomposition is applied to $\boldsymbol{Q}$. We call the result the PLR decomposition; it produces a one-to-one correspondence between $\boldsymbol{Q}$ and the $d(d-1)/2$ entries below the diagonal of $\boldsymbol{L}$, which are advantageously unconstrained real values. Thus, once the decomposition is applied, regardless of the objective function under consideration, we can use any classical unconstrained optimization method to find the minimum (or maximum) of the objective function with respect to $\boldsymbol{L}$. For illustrative purposes, we apply the PLR decomposition in common principal components analysis (CPCA) for the maximum likelihood estimation of the common orthogonal matrix when a multivariate leptokurtic-normal distribution is assumed in each group. Compared to the commonly used normal distribution, the leptokurtic-normal has an additional parameter governing the excess kurtosis; this makes the estimation of $\boldsymbol{Q}$ in CPCA more robust against mild outliers. The usefulness of the PLR decomposition in leptokurtic-normal CPCA is illustrated by two biometric data analyses.
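The correspondence described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `orthogonal_from_free_params`, the optional `perm` argument, and the sign convention (positive diagonal of the triangular factor) are assumptions made for illustration. The idea shown is only that a unit lower triangular matrix filled with $d(d-1)/2$ unconstrained values has a QR decomposition whose Q factor is orthogonal, so an unconstrained optimizer could search over those values.

```python
import numpy as np


def orthogonal_from_free_params(theta, d, perm=None):
    """Map d(d-1)/2 unconstrained reals to a d x d orthogonal matrix.

    Illustrative sketch only (names and conventions are assumptions):
    fill a unit lower triangular L with theta, take the Q factor of its
    QR decomposition, and optionally permute rows with `perm`.
    """
    theta = np.asarray(theta, dtype=float)
    if theta.size != d * (d - 1) // 2:
        raise ValueError("theta must contain d(d-1)/2 values")

    # Unit lower triangular matrix: ones on the diagonal, theta below it.
    L = np.eye(d)
    L[np.tril_indices(d, k=-1)] = theta

    # QR decomposition of L; L is invertible because det(L) = 1.
    Q, R = np.linalg.qr(L)

    # Fix the sign ambiguity so that the triangular factor has a
    # positive diagonal (flip the matching columns of Q).
    signs = np.sign(np.diag(R))
    Q = Q * signs

    if perm is not None:
        # Row permutation, playing the role of the permutation matrix P.
        Q = np.eye(d)[np.asarray(perm)] @ Q
    return Q


if __name__ == "__main__":
    Q = orthogonal_from_free_params([0.5, -1.2, 2.0], d=3)
    print(np.round(Q.T @ Q, 10))  # numerically the identity matrix
```

Setting all free values to zero gives the identity matrix under this convention, which makes a convenient starting point for an optimizer.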
9
Punzo A, Bagnato L. The multivariate tail-inflated normal distribution and its application in finance. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1805451] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 10/23/2022]
Affiliation(s)
- Antonio Punzo: Dipartimento di Economia e Impresa, Università degli Studi di Catania, Catania, Italy
- Luca Bagnato: Dipartimento di Scienze Economiche e Sociali, Università Cattolica del Sacro Cuore, Piacenza, Italy
10
Yang YC, Lin TI, Castro LM, Wang WL. Extending finite mixtures of t linear mixed-effects models with concomitant covariates. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2020.106961] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
11
Punzo A, Bagnato L. Allometric analysis using the multivariate shifted exponential normal distribution. Biom J 2020; 62:1525-1543. [PMID: 32240556 DOI: 10.1002/bimj.201900248] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/21/2019] [Revised: 01/30/2020] [Accepted: 02/02/2020] [Indexed: 11/10/2022]
Abstract
In allometric studies, the joint distribution of the log-transformed morphometric variables is typically elliptical and with heavy tails. To account for these peculiarities, we introduce the multivariate shifted exponential normal (MSEN) distribution, an elliptical heavy-tailed generalization of the multivariate normal (MN). The MSEN belongs to the family of MN scale mixtures (MNSMs) by choosing a convenient shifted exponential as mixing distribution. The probability density function of the MSEN has a simple closed-form characterized by only one additional parameter, with respect to the nested MN, governing the tail weight. The first four moments exist and the excess kurtosis can assume any positive value. The membership to the family of MNSMs allows us a simple computation of the maximum likelihood (ML) estimates of the parameters via the expectation-maximization (EM) algorithm; advantageously, the M-step is computationally simplified by closed-form updates of all the parameters. We also evaluate the existence of the ML estimates. Since the parameter governing the tail weight is estimated from the data, robust estimates of the mean vector of the nested MN distribution are automatically obtained by downweighting; we show this aspect theoretically but also by means of a simulation study. We fit the MSEN distribution to multivariate allometric data where we show its usefulness also in comparison with other well-established multivariate elliptical distributions.
Affiliation(s)
- Antonio Punzo: Dipartimento di Economia e Impresa, Università di Catania, Catania, Sicilia, Italy
- Luca Bagnato: Dipartimento di Scienze Economiche e Sociali, Università Cattolica del Sacro Cuore, Piacenza, Emilia-Romagna, Italy
12
Punzo A, Tortora C. Multiple scaled contaminated normal distribution and its application in clustering. STAT MODEL 2019. [DOI: 10.1177/1471082x19890935] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers (also referred to as ‘bad’ points herein) and automatically detect bad points. The price of these advantages is two additional parameters: proportion of good observations and degree of contamination. However, in a multivariate setting, only one proportion of good observations and only one degree of contamination may be limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution. Among its parameters, we have an orthogonal matrix Γ. In the space spanned by the vectors (principal components) of Γ, there is a proportion of good observations and a degree of contamination for each component. Moreover, each observation has a posterior probability of being good with respect to each principal component. Thanks to this probability, the method provides directional robust estimates of the parameters of the nested MN and automatic directional detection of bad points. The term ‘directional’ is added to specify that the method works separately for each principal component. Mixtures of MSCN distributions are also proposed, and an expectation-maximization algorithm is used for parameter estimation. Real and simulated data are considered to show the usefulness of our mixture with respect to well-established mixtures of symmetric distributions with heavy tails.
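As a rough illustration of the elliptical MCN building block mentioned in this abstract (not the multiple scaled version the paper proposes), the sketch below uses the commonly written two-component form: a proportion `alpha` of good points from a normal with scatter `sigma`, and the remainder from a normal with the same mean and scatter inflated by `eta > 1`. The function names and argument names are assumptions for illustration, not the authors' code.

```python
import numpy as np


def mvn_logpdf(x, mu, sigma):
    """Log-density of a multivariate normal, via a Cholesky factor."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.size
    chol = np.linalg.cholesky(sigma)
    z = np.linalg.solve(chol, x - mu)  # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def mcn_density_and_prob_good(x, mu, sigma, alpha, eta):
    """Contaminated normal density and posterior probability of 'good'.

    alpha: assumed proportion of good observations (0 < alpha < 1).
    eta:   assumed degree of contamination (eta > 1 inflates the scatter).
    """
    sigma = np.asarray(sigma, dtype=float)
    good = alpha * np.exp(mvn_logpdf(x, mu, sigma))
    bad = (1.0 - alpha) * np.exp(mvn_logpdf(x, mu, eta * sigma))
    density = good + bad
    return density, good / density


if __name__ == "__main__":
    mu, sigma = np.zeros(2), np.eye(2)
    for point in ([0.0, 0.0], [6.0, 6.0]):
        dens, p_good = mcn_density_and_prob_good(point, mu, sigma, 0.9, 10.0)
        print(point, dens, p_good)
```

A point near the mean gets a posterior probability of being good close to one, while a point far in the tails is attributed almost entirely to the inflated component; this is the mechanism behind the automatic detection of bad points that the abstract refers to.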
Affiliation(s)
- Antonio Punzo: Department of Economics and Business, University of Catania, Catania, Italy
- Cristina Tortora: Department of Mathematics and Statistics, San José State University, San José, CA, USA
14
Morris K, Punzo A, McNicholas PD, Browne RP. Asymmetric clusters and outliers: Mixtures of multivariate contaminated shifted asymmetric Laplace distributions. Comput Stat Data Anal 2019. [DOI: 10.1016/j.csda.2018.12.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/27/2022]
16
Punzo A. A new look at the inverse Gaussian distribution with applications to insurance and economic data. J Appl Stat 2018. [DOI: 10.1080/02664763.2018.1542668] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 10/27/2022]
Affiliation(s)
- Antonio Punzo: Department of Economics and Business, University of Catania, Catania, Italy
18
Punzo A, Mazza A, Maruotti A. Fitting insurance and economic data with outliers: a flexible approach based on finite mixtures of contaminated gamma distributions. J Appl Stat 2018. [DOI: 10.1080/02664763.2018.1428288] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 10/18/2022]
Affiliation(s)
- Antonio Punzo: Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
- Angelo Mazza: Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
- Antonello Maruotti: Dipartimento di Scienze Economiche, Politiche e delle Lingue Moderne, Libera Università Maria Ss. Assunta, Roma, Italy; Centre for Innovation and Leadership in Health Sciences, University of Southampton, Southampton, UK