1
Lu Z, Chandra NK. A sparse factor model for clustering high-dimensional longitudinal data. Stat Med 2024; 43:3633-3648. [PMID: 38885953 DOI: 10.1002/sim.10151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 04/09/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024]
Abstract
Recent advances in engineering technologies have enabled the collection of a large number of longitudinal features. This wealth of information presents unique opportunities for researchers to investigate the complex nature of diseases and uncover underlying disease mechanisms. However, analyzing this kind of data can be difficult because of its high dimensionality, heterogeneity and computational challenges. In this article, we propose a Bayesian nonparametric mixture model for clustering high-dimensional mixed-type (e.g., continuous, discrete and categorical) longitudinal features. We employ a sparse factor model on the joint distribution of random effects; the key idea is to induce clustering at the latent factor level instead of on the original data, to escape the curse of dimensionality. The number of clusters is estimated through a Dirichlet process prior. An efficient Gibbs sampler is developed to estimate the posterior distribution of the model parameters. Analysis of real and simulated data is presented and discussed. Our study demonstrates that the proposed model serves as a useful analytical tool for clustering high-dimensional longitudinal data.
Affiliation(s)
- Zihang Lu
- Department of Public Health Sciences, Queen's University, Kingston, Ontario, Canada
- Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada
- Noirrit Kiran Chandra
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas, USA
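The abstract above states that the number of clusters is estimated through a Dirichlet process prior. As a rough illustration of why such a prior does not need the number of clusters fixed in advance, the following minimal sketch draws a partition from the Chinese restaurant process, the standard sequential representation of the partitions a Dirichlet process induces. It is not the authors' Gibbs sampler and does not involve the latent factor model; the function name `crp_assignments` and the concentration value `alpha=1.0` are assumptions chosen for illustration.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical value in the demo).
    """
    rng = random.Random(seed)
    labels = []  # cluster label of each item, in arrival order
    sizes = []   # current number of items in each cluster
    for i in range(n):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes

labels, sizes = crp_assignments(n=200, alpha=1.0, seed=1)
print("number of clusters drawn from the prior:", len(sizes))
```

Larger values of `alpha` tend to produce more clusters; in a full model the data likelihood would reweight these prior probabilities at each sampling step.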
2
Falck F, Zhu X, Ghalebikesabi S, Kormaksson M, Vandemeulebroecke M, Zhang C, Martin R, Gardiner S, Kwok CH, West DM, Santos L, Tian C, Pang Y, Readie A, Ligozio G, Gandhi KK, Nichols TE, Mallon AM, Kelly L, Ohlssen D, Nicholson G. A framework for longitudinal latent factor modelling of treatment response in clinical trials with applications to Psoriatic Arthritis and Rheumatoid Arthritis. J Biomed Inform 2024; 154:104641. [PMID: 38642627 DOI: 10.1016/j.jbi.2024.104641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 03/10/2024] [Accepted: 04/11/2024] [Indexed: 04/22/2024]
Abstract
OBJECTIVE: Clinical trials involve the collection of a wealth of data, comprising multiple diverse measurements performed at baseline and follow-up visits over the course of a trial. The most common primary analysis is restricted to a single, potentially composite endpoint at one time point. While such an analytical focus promotes simple and replicable conclusions, it does not necessarily fully capture the multi-faceted effects of a drug in a complex disease setting. Therefore, to complement existing approaches, we set out here to design a longitudinal multivariate analytical framework that accepts as input an entire clinical trial database, comprising all measurements, patients, and time points across multiple trials.
METHODS: Our framework composes probabilistic principal component analysis with a longitudinal linear mixed effects model, thereby enabling clinical interpretation of multivariate results, while handling data missing at random, and incorporating covariates and covariance structure in a computationally efficient and principled way.
RESULTS: We illustrate our approach by applying it to four phase III clinical trials of secukinumab in Psoriatic Arthritis (PsA) and Rheumatoid Arthritis (RA). We identify three clinically plausible latent factors that collectively explain 74.5% of empirical variation in the longitudinal patient database. We estimate longitudinal trajectories of these factors, thereby enabling joint characterisation of disease progression and drug effect. We perform benchmarking experiments demonstrating our method's competitive performance at estimating average treatment effects compared to existing statistical and machine learning methods, and showing that our modular approach leads to relatively computationally efficient model fitting.
CONCLUSION: Our multivariate longitudinal framework has the potential to illuminate the properties of existing composite endpoint methods, and to enable the development of novel clinical endpoints that provide enhanced and complementary perspectives on treatment response.
Affiliation(s)
- Fabian Falck
- Department of Statistics, University of Oxford, UK; The Alan Turing Institute, London, UK
- Xuan Zhu
- Novartis Pharmaceuticals Corporation, East Hanover, United States
- Cong Zhang
- China Novartis Institutes for Bio-medical Research CO., Shanghai, China
- Ruvie Martin
- Novartis Pharmaceuticals Corporation, East Hanover, United States
- Stephen Gardiner
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, UK
- Chengeng Tian
- China Novartis Institutes for Bio-medical Research CO., Shanghai, China
- Yu Pang
- China Novartis Institutes for Bio-medical Research CO., Shanghai, China
- Aimee Readie
- Novartis Pharmaceuticals Corporation, East Hanover, United States
- Gregory Ligozio
- Novartis Pharmaceuticals Corporation, East Hanover, United States
- Kunal K Gandhi
- Novartis Pharmaceuticals Corporation, East Hanover, United States
- Thomas E Nichols
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, UK; Wellcome Centre for Integrative Neuroimaging, Nuffield Department of Clinical Neurosciences, University of Oxford, UK
- Luke Kelly
- School of Mathematical Sciences, University College Cork, Ireland
- David Ohlssen
- Novartis Pharmaceuticals Corporation, East Hanover, United States
3
Abstract
Summary
Factorization models express a statistical object of interest in terms of a collection of simpler objects. For example, a matrix or tensor can be expressed as a sum of rank-one components. However, in practice, it can be challenging to infer the relative impact of the different components as well as the number of components. A popular idea is to include infinitely many components having impact decreasing with the component index. This article is motivated by two limitations of existing methods: (i) the lack of careful consideration of the within component sparsity structure; and (ii) no accommodation for grouped variables and other non-exchangeable structures. We propose a general class of infinite factorization models that address these limitations. Theoretical support is provided, practical gains are shown in simulation studies, and an ecology application focusing on modelling bird species occurrence is discussed.
Affiliation(s)
- L Schiavon
- Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241, 35121 Padova, Italy
- A Canale
- Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241, 35121 Padova, Italy
- D B Dunson
- Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A
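The summary above describes expressing a matrix as a sum of rank-one components whose impact decreases with the component index. The sketch below illustrates only that general idea, under strong simplifying assumptions: the weights decay deterministically as `0.5 ** h` (a hypothetical choice, whereas the article concerns prior distributions on the components), and the helper names are invented for illustration. It shows that once the weights decay, truncating the sum after a few components leaves a small remainder.

```python
import random

def weighted_outer(w, v):
    """Rank-one matrix w * v v^T as a list of rows."""
    return [[w * vi * vj for vj in v] for vi in v]

def frobenius(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return sum(x * x for row in A for x in row) ** 0.5

rng = random.Random(0)
p, H = 5, 30  # matrix dimension and total number of components (illustrative)

# Component h has a random direction and a weight that shrinks with h.
components = [(0.5 ** h, [rng.gauss(0, 1) for _ in range(p)]) for h in range(H)]

def partial_sum(k):
    """Sum of the first k rank-one components."""
    M = [[0.0] * p for _ in range(p)]
    for w, v in components[:k]:
        R = weighted_outer(w, v)
        M = [[m + r for m, r in zip(mrow, rrow)] for mrow, rrow in zip(M, R)]
    return M

full = partial_sum(H)

def truncation_error(k):
    """Frobenius norm of the part of the matrix lost by keeping k components."""
    part = partial_sum(k)
    diff = [[f - q for f, q in zip(frow, qrow)] for frow, qrow in zip(full, part)]
    return frobenius(diff)

for k in (1, 3, 5, 10):
    print(k, truncation_error(k))
```

The printed errors shrink as `k` grows, which is the practical reason a decreasing-impact construction allows the number of components to be left open.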
4
Estimating the Variance of Estimator of the Latent Factor Linear Mixed Model Using Supplemented Expectation-Maximization Algorithm. Symmetry (Basel) 2021. [DOI: 10.3390/sym13071286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] [Open Access]
Abstract
This paper deals with symmetrical data that can be modelled with the Gaussian distribution, such as linear mixed models for longitudinal data. The latent factor linear mixed model (LFLMM) is a method generally used for analysing changes in high-dimensional longitudinal data. The model estimates are usually obtained with the expectation-maximization (EM) algorithm, but the algorithm does not produce standard errors of the regression coefficients, which hampers testing procedures. To fill this gap, the Supplemented EM (SEM) algorithm for the case of fixed variables is proposed in this paper. The computational aspects of the SEM algorithm have been investigated by means of simulation. We also calculate the variance matrix of beta using the second moment as a benchmark to compare with the asymptotic variance matrix of beta from SEM. Both the second moment and SEM produce symmetrical results, and the variance estimates of beta become smaller as the number of subjects in the simulation increases. In addition, the practical usefulness of this work is illustrated using real data on political attitudes and behaviour in Flanders, Belgium.
5
Adjakossa EH, Hounkonnou NM, Nuel G. Computationally Stable Estimation Procedure for the Multivariate Linear Mixed-Effect Model and Application to Malaria Public Health Problem. Int J Biostat 2019; 15:ijb-2017-0076. [PMID: 31226099 DOI: 10.1515/ijb-2017-0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2017] [Accepted: 05/05/2019] [Indexed: 11/15/2022]
Abstract
In this paper, we provide the ML (Maximum Likelihood) and the REML (REstricted ML) criteria for consistently estimating multivariate linear mixed-effects models with arbitrary correlation structure between the random effects across dimensions, but independent (and possibly heteroscedastic) residuals. By factorizing the random effects covariance matrix, we provide an explicit expression of the profiled deviance through a reparameterization of the model. This strategy can be viewed as the generalization of the estimation procedure used by Douglas Bates and his co-authors in the context of the fitting of one-dimensional linear mixed-effects models. Besides its robustness regarding the starting points, the approach enables a numerically consistent estimate of the random effects covariance matrix, while classical alternatives such as the EM algorithm are usually inconsistent. In a simulation study, we compare the estimates obtained from the present method with the EM algorithm-based estimates. We finally apply the method to a study of an immune response to malaria in Benin.
Affiliation(s)
- Eric Houngla Adjakossa
- International Chair in Mathematical Physics and Applications (ICMPA-UNESCO Chair), Université d'Abomey-Calavi, Cotonou, Benin; Laboratoire de Probabilités, Statistique et Modélisation (UMR 8001), Sorbonne Université, Paris, France
- Norbert Mahouton Hounkonnou
- International Chair in Mathematical Physics and Applications (ICMPA-UNESCO Chair), Université d'Abomey-Calavi, Cotonou, Benin
- Grégory Nuel
- Laboratoire de Probabilités, Statistique et Modélisation (UMR 8001), Sorbonne Université, Paris, France
6
Wang J, Luo S. Multidimensional latent trait linear mixed model: an application in clinical studies with multivariate longitudinal outcomes. Stat Med 2017; 36:3244-3256. [PMID: 28569393 DOI: 10.1002/sim.7347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/03/2016] [Revised: 04/24/2017] [Accepted: 05/01/2017] [Indexed: 12/11/2022]
Abstract
Multilevel item response theory (MLIRT) models have been widely used to analyze multivariate longitudinal data of mixed types (e.g., categorical and continuous) in clinical studies. MLIRT models often rely on a unidimensional assumption, that is, the multiple outcomes are clinical manifestations of a univariate latent variable. However, the unidimensional assumption may be unrealistic because some diseases may be heterogeneous and characterized by multiple impaired domains with variable clinical symptoms and disease progressions. We relax this assumption and propose a multidimensional latent trait linear mixed model (MLTLMM) to allow multiple latent variables and within-item multidimensionality (one outcome can be a manifestation of more than one latent variable). We conduct extensive simulation studies to assess the unidimensional MLIRT model and the proposed MLTLMM model. The simulation studies suggest that the MLTLMM model outperforms the unidimensional model when the multivariate longitudinal outcomes are manifested by multiple latent variables. The proposed model is applied to two motivating studies of amyotrophic lateral sclerosis: a clinical trial of ceftriaxone and the Pooled Resource Open-Access ALS Clinical Trials database. Copyright © 2017 John Wiley & Sons, Ltd.
Affiliation(s)
- Jue Wang
- Department of Biostatistics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, U.S.A
- Sheng Luo
- Department of Biostatistics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, U.S.A
7
Adjakossa EH, Sadissou I, Hounkonnou MN, Nuel G. Multivariate Longitudinal Analysis with Bivariate Correlation Test. PLoS One 2016; 11:e0159649. [PMID: 27537692 PMCID: PMC4990185 DOI: 10.1371/journal.pone.0159649] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/02/2015] [Accepted: 07/06/2016] [Indexed: 12/02/2022] [Open Access]
Abstract
In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model's parameter estimators. These estimators can be used in the framework of multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets, of longitudinal multivariate type and multivariate multilevel type respectively, the usefulness of the test is illustrated.
Affiliation(s)
- Eric Houngla Adjakossa
- Laboratoire de Probabilités et Modèles Aléatoires /Université Pierre et Marie Curie, Case courrier 188 - 4, Place Jussieu 75252 Paris cedex 05 France
- University of Abomey-Calavi, 072 B.P. 50 Cotonou, Republic of Benin
- Ibrahim Sadissou
- Laboratoire de Biologie et de Physiologie Cellulaires /University of Abomey-Calavi, Cotonou, Republic of Benin
- Centre d’Etude et de Recherche sur le Paludisme Associé à la Grossesse et à l’Enfance (CERPAGE), Cotonou, Republic of Benin
- Gregory Nuel
- Laboratoire de Probabilités et Modèles Aléatoires /Université Pierre et Marie Curie, Case courrier 188 - 4, Place Jussieu 75252 Paris cedex 05 France
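The abstract above uses a likelihood ratio test to decide whether the random effects of two dependent variables are correlated. A minimal sketch of the arithmetic of such a test follows, assuming the simplest case in which the null model sets exactly one correlation parameter to zero, so the statistic is compared with a chi-square distribution with 1 degree of freedom. With several random effects per variable the number of constrained correlations, and hence the degrees of freedom, would be larger. The function name and the log-likelihood values are hypothetical and are not taken from the paper.

```python
import math

def lrt_one_df(loglik_null, loglik_alt):
    """Likelihood ratio test when the null constrains a single parameter.

    loglik_null: maximised log-likelihood with the correlation fixed at zero.
    loglik_alt:  maximised log-likelihood with the correlation estimated.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    stat = max(stat, 0.0)  # guard against small negative values from optimiser noise
    # For a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximised log-likelihoods, for illustration only.
stat, p = lrt_one_df(loglik_null=-1204.7, loglik_alt=-1201.3)
print(f"LRT statistic = {stat:.2f}, p-value = {p:.4f}")
```

A small p-value would favour modelling the two dependent variables jointly; a large one suggests separate models lose little.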
8
Kondaurova MV, Bergeson TR, Xu H, Kitamura C. Affective Properties of Mothers' Speech to Infants With Hearing Impairment and Cochlear Implants. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2015; 58:590-600. [PMID: 25679195 PMCID: PMC4610283 DOI: 10.1044/2015_jslhr-s-14-0095] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/02/2014] [Revised: 10/01/2014] [Accepted: 01/21/2015] [Indexed: 05/08/2023]
Abstract
PURPOSE: The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing.
METHOD: Mothers of infants with HI and mothers of infants with normal hearing matched by age (NH-AM) or hearing experience (NH-EM) were recorded playing with their infants during 3 sessions over a 12-month period. Speech samples of 25 s were low-pass filtered, leaving intonation but not speech information intact. Sixty adults rated the stimuli along 5 scales: positive/negative affect and intention to express affection, to encourage attention, to comfort/soothe, and to direct behavior.
RESULTS: Low-pass filtered speech to HI and NH-EM groups was rated as more positive, affective, and comforting compared with such speech to the NH-AM group. Speech to infants with HI and with NH-AM was rated as more directive than speech to the NH-EM group. Mothers decreased affective qualities in speech to all infants but increased directive qualities in speech to infants with NH-EM over time.
CONCLUSIONS: Mothers fine-tune communicative intent in speech to their infant's developmental stage. They adjust affective qualities to infants' hearing experience rather than to chronological age but adjust directive qualities of speech to the chronological age of their infants.
Affiliation(s)
- Huiping Xu
- Indiana University–Purdue University Indianapolis
9
Bentler PM, Huang W. On Components, Latent Variables, PLS and Simple Methods: Reactions to Rigdon's Rethinking of PLS. LONG RANGE PLANNING 2014; 47:138-145. [PMID: 24926106 PMCID: PMC4048869 DOI: 10.1016/j.lrp.2014.02.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 06/03/2023]
Abstract
Rigdon (2012) suggests that partial least squares (PLS) can be improved by killing it, that is, by making it into a different methodology based on components. We provide some history on problems with component-type methods and develop some implications of Rigdon's suggestion. It seems more appropriate to maintain and improve PLS as far as possible, but also to freely utilize alternative models and methods when those are more relevant in certain data analytic situations. Huang's (2013) new consistent and efficient PLSe2 methodology is suggested as a candidate for an improved PLS.