1
Guo X, Zeng D, Wang Y. HMM for discovering decision-making dynamics using reinforcement learning experiments. Biostatistics 2024:kxae033. [PMID: 39226534 DOI: 10.1093/biostatistics/kxae033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 07/20/2024] [Accepted: 07/25/2024] [Indexed: 09/05/2024]
Abstract
Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.
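The two-strategy idea in this abstract can be illustrated with a minimal sketch. It is not the authors' RL-HMM: it assumes a two-armed task, a basic Q-learning update applied on every trial, constant (not time-varying) transition probabilities, and a forward recursion instead of the EM algorithm. The function name `rl_hmm_loglik` and all parameter names and values are hypothetical.

```python
import math


def rl_hmm_loglik(choices, rewards, alpha, beta,
                  stay_engaged, stay_random, init_engaged=0.5):
    """Forward-algorithm log-likelihood for a two-state HMM.

    Hidden state 0 ("engaged"): choices follow a softmax over Q-values.
    Hidden state 1 ("random"): each of the two actions has probability 0.5.
    Simplifying assumption: Q-values are updated on every trial.
    """
    q = [0.0, 0.0]                                # Q-values of the two actions
    belief = [init_engaged, 1.0 - init_engaged]   # P(state | past choices)
    loglik = 0.0
    for t, (a, r) in enumerate(zip(choices, rewards)):
        if t > 0:
            # Propagate the belief through the (constant) transition matrix.
            belief = [
                belief[0] * stay_engaged + belief[1] * (1.0 - stay_random),
                belief[0] * (1.0 - stay_engaged) + belief[1] * stay_random,
            ]
        # Probability of the observed choice under each strategy.
        p_engaged = 1.0 / (1.0 + math.exp(-beta * (q[a] - q[1 - a])))
        p_random = 0.5
        joint = [belief[0] * p_engaged, belief[1] * p_random]
        evidence = joint[0] + joint[1]
        loglik += math.log(evidence)
        belief = [joint[0] / evidence, joint[1] / evidence]
        # Q-learning update for the chosen action.
        q[a] += alpha * (r - q[a])
    return loglik, belief


# Hypothetical toy data: action indices (0 or 1) and binary rewards.
choices = [0, 0, 1, 0, 0, 1, 0, 0]
rewards = [1, 1, 0, 1, 0, 0, 1, 1]
loglik, final_belief = rl_hmm_loglik(
    choices, rewards, alpha=0.3, beta=3.0, stay_engaged=0.9, stay_random=0.8
)
print(loglik, final_belief)
```

In this sketch `final_belief[0]` is the filtered probability that the subject is in the RL-driven state on the last trial, which is one simple way to read "engagement" from such a model.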
Affiliation(s)
- Xingche Guo
  - Department of Biostatistics, Columbia University, 722 West 168th St, New York, NY, 10032, United States
- Donglin Zeng
  - Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI, 48109, United States
- Yuanjia Wang
  - Department of Biostatistics, Columbia University, 722 West 168th St, New York, NY, 10032, United States
  - Department of Psychiatry, Columbia University, 1051 Riverside Drive, New York, NY, 10032, United States
2
Qu Y, Lee CY. Estimation of standardized real-time fatality rate for ongoing epidemics. PLoS One 2024; 19:e0303861. [PMID: 38771824 PMCID: PMC11108209 DOI: 10.1371/journal.pone.0303861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 05/02/2024] [Indexed: 05/23/2024]
Abstract
BACKGROUND: The fatality rate is a crucial metric for guiding public health policies during an ongoing epidemic. For COVID-19, the age structure of the confirmed cases changes over time, which substantially affects the real-time estimation of fatality. A 'spurious decrease' in fatality rate can be caused by a shift in confirmed cases towards younger ages even if the fatalities remain unchanged across different ages. METHODS: To address this issue, we propose a standardized real-time fatality rate estimator. A simulation study is conducted to evaluate the performance of the estimator. The proposed method is applied for real-time fatality rate estimation of COVID-19 in Germany from March 2020 to May 2022. FINDINGS: The simulation results suggest that the proposed estimator can provide an accurate trend of disease fatality in all cases, while the existing estimator may convey a misleading signal of the actual situation when the temporal age distribution changes. The application to German data shows that there was an increase in the fatality rate at the implementation of the 'live with COVID' strategy. CONCLUSIONS: As many countries have chosen to coexist with the coronavirus, frequent examination of the fatality rate is of paramount importance.
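The 'spurious decrease' described in this abstract can be reproduced with a small worked example. The sketch below uses textbook direct age standardization, which is an assumption made for illustration and not necessarily the estimator proposed in the paper (the paper's real-time aspects, such as delays between confirmation and death, are ignored). All counts and the reference weights are made up.

```python
def crude_rate(cases, deaths):
    """Overall fatality rate: total deaths divided by total cases."""
    return sum(deaths) / sum(cases)


def standardized_rate(cases, deaths, reference_weights):
    """Directly standardized rate: age-specific rates averaged with
    fixed reference weights that sum to one."""
    age_specific = [d / c for d, c in zip(deaths, cases)]
    return sum(w * r for w, r in zip(reference_weights, age_specific))


# Hypothetical counts for two age groups (younger, older).
# Age-specific fatality is 1% and 10% in BOTH periods.
period1_cases, period1_deaths = [1000, 1000], [10, 100]
period2_cases, period2_deaths = [3000, 1000], [30, 100]  # cases shift younger
reference_weights = [0.5, 0.5]  # assumed fixed reference age distribution

crude1 = crude_rate(period1_cases, period1_deaths)
crude2 = crude_rate(period2_cases, period2_deaths)
std1 = standardized_rate(period1_cases, period1_deaths, reference_weights)
std2 = standardized_rate(period2_cases, period2_deaths, reference_weights)

print(f"crude:        {crude1:.4f} -> {crude2:.4f}")
print(f"standardized: {std1:.4f} -> {std2:.4f}")
```

The crude rate falls between the two periods only because the case mix shifts towards the younger group, while the standardized rate is unchanged because the age-specific rates are unchanged.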
Affiliation(s)
- Yuanke Qu
  - Department of Computer Science and Engineering, Guangdong Ocean University, Zhanjiang, People’s Republic of China
- Chun Yin Lee
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
3
Sutiene K, Schwendner P, Sipos C, Lorenzo L, Mirchev M, Lameski P, Kabasinskas A, Tidjani C, Ozturkkal B, Cerneviciene J. Enhancing portfolio management using artificial intelligence: literature review. Front Artif Intell 2024; 7:1371502. [PMID: 38650961 PMCID: PMC11033520 DOI: 10.3389/frai.2024.1371502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 03/12/2024] [Indexed: 04/25/2024]
Abstract
Building an investment portfolio is a problem that numerous researchers have addressed for many years. The key goal has always been to balance risk and reward by optimally allocating assets such as stocks, bonds, and cash. In general, the portfolio management process is based on three steps: planning, execution, and feedback, each of which has its objectives and methods to be employed. Starting from Markowitz's mean-variance portfolio theory, different frameworks have been widely accepted, which considerably renewed how asset allocation is being solved. Recent advances in artificial intelligence provide methodological and technological capabilities to solve highly complex problems, and investment portfolio is no exception. For this reason, the paper reviews the current state-of-the-art approaches by answering the core question of how artificial intelligence is transforming portfolio management steps. Moreover, as the use of artificial intelligence in finance is challenged by transparency, fairness and explainability requirements, the case study of post-hoc explanations for asset allocation is demonstrated. Finally, we discuss recent regulatory developments in the European investment business and highlight specific aspects of this business where explainable artificial intelligence could advance transparency of the investment process.
Affiliation(s)
- Kristina Sutiene
  - Department of Mathematical Modeling, Kaunas University of Technology, Kaunas, Lithuania
- Peter Schwendner
  - School of Management and Law, Institute of Wealth and Asset Management, Zurich University of Applied Sciences, Winterthur, Switzerland
- Ciprian Sipos
  - Department of Economics and Modelling, West University of Timisoara, Timisoara, Romania
- Luis Lorenzo
  - Faculty of Statistic Studies, Complutense University of Madrid, Madrid, Spain
- Miroslav Mirchev
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, North Macedonia
  - Complexity Science Hub Vienna, Vienna, Austria
- Petre Lameski
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, North Macedonia
- Audrius Kabasinskas
  - Department of Mathematical Modeling, Kaunas University of Technology, Kaunas, Lithuania
- Chemseddine Tidjani
  - Division of Firms and Industrial Economics, Research Center in Applied Economics for Development, Algiers, Algeria
- Belma Ozturkkal
  - Department of International Trade and Finance, Kadir Has University, Istanbul, Türkiye
- Jurgita Cerneviciene
  - Department of Mathematical Modeling, Kaunas University of Technology, Kaunas, Lithuania
4
Lücke M, Winkelmann S, Heitzig J, Molkenthin N, Koltai P. Learning interpretable collective variables for spreading processes on networks. Phys Rev E 2024; 109:L022301. [PMID: 38491651 DOI: 10.1103/physreve.109.l022301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 12/28/2023] [Indexed: 03/18/2024]
Abstract
Collective variables (CVs) are low-dimensional projections of high-dimensional system states. They are used to gain insights into complex emergent dynamical behaviors of processes on networks. The relation between CVs and network measures is not well understood and its derivation typically requires detailed knowledge of both the dynamical system and the network topology. In this Letter, we present a data-driven method for algorithmically learning and understanding CVs for binary-state spreading processes on networks of arbitrary topology. We demonstrate our method using four example networks: the stochastic block model, a ring-shaped graph, a random regular graph, and a scale-free network generated by the Albert-Barabási model. Our results deliver evidence for the existence of low-dimensional CVs even in cases that are not yet understood theoretically.
Affiliation(s)
- Marvin Lücke
  - Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Stefanie Winkelmann
  - Modeling and Simulation of Complex Processes, Zuse Institute Berlin, 14195 Berlin, Germany
- Jobst Heitzig
  - FutureLab on Game Theory and Networks of Interacting Agents, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany and Zuse Institute Berlin, 14195 Berlin, Germany
- Nora Molkenthin
  - Complexity Science Department, Potsdam Institute for Climate Impact Research, 14473 Potsdam, Germany
- Péter Koltai
  - Department of Mathematics, University of Bayreuth, 95447 Bayreuth, Germany
5
Chung MK, House JS, Akhtari FS, Makris KC, Langston MA, Islam KT, Holmes P, Chadeau-Hyam M, Smirnov AI, Du X, Thessen AE, Cui Y, Zhang K, Manrai AK, Motsinger-Reif A, Patel CJ. Decoding the exposome: data science methodologies and implications in exposome-wide association studies (ExWASs). Exposome 2024; 4:osae001. [PMID: 38344436 PMCID: PMC10857773 DOI: 10.1093/exposome/osae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 10/16/2023] [Accepted: 11/20/2023] [Indexed: 03/07/2024]
Abstract
This paper explores the exposome concept and its role in elucidating the interplay between environmental exposures and human health. We introduce two key concepts critical for exposomics research. Firstly, we discuss the joint impact of genetics and environment on phenotypes, emphasizing the variance attributable to shared and nonshared environmental factors, underscoring the complexity of quantifying the exposome's influence on health outcomes. Secondly, we introduce the importance of advanced data-driven methods in large cohort studies for exposomic measurements. Here, we introduce the exposome-wide association study (ExWAS), an approach designed for systematic discovery of relationships between phenotypes and various exposures, identifying significant associations while controlling for multiple comparisons. We advocate for the standardized use of the term "exposome-wide association study, ExWAS," to facilitate clear communication and literature retrieval in this field. The paper aims to guide future health researchers in understanding and evaluating exposomic studies. Our discussion extends to emerging topics, such as FAIR Data Principles, biobanked healthcare datasets, and the functional exposome, outlining the future directions in exposomic research. This abstract provides a succinct overview of our comprehensive approach to understanding the complex dynamics of the exposome and its significant implications for human health.
Affiliation(s)
- Ming Kei Chung
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
  - Institute of Environment, Energy and Sustainability, The Chinese University of Hong Kong, Hong Kong, China
- John S House
  - Biostatistics and Computational Biology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Durham, NC, USA
- Farida S Akhtari
  - Biostatistics and Computational Biology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Durham, NC, USA
- Konstantinos C Makris
  - Cyprus International Institute for Environmental and Public Health, School of Health Sciences, Cyprus University of Technology, Limassol, Cyprus
- Michael A Langston
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
- Khandaker Talat Islam
  - Department of Population and Public Health Sciences, Keck School of Medicine of the University of Southern California, Los Angeles, CA, USA
- Philip Holmes
  - Department of Physics, Villanova University, Villanova, Philadelphia, USA
- Marc Chadeau-Hyam
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
- Alex I Smirnov
  - Department of Chemistry, North Carolina State University, Raleigh, NC, USA
- Xiuxia Du
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Anne E Thessen
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Yuxia Cui
  - Exposure, Response, and Technology Branch, Division of Extramural Research and Training, National Institute of Environmental Health Sciences, Durham, NC, USA
- Kai Zhang
  - Department of Environmental Health Sciences, School of Public Health, University at Albany, State University of New York, Rensselaer, NY, USA
- Arjun K Manrai
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Alison Motsinger-Reif
  - Biostatistics and Computational Biology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Durham, NC, USA
- Chirag J Patel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
6
Tu X, Liang H, Jakobsson A, Huang Y, Ding X. Adaptive sparse estimation of nonlinear chirp signals using Laplace priors. The Journal of the Acoustical Society of America 2024; 155:78-93. [PMID: 38174966 DOI: 10.1121/10.0024248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 12/14/2023] [Indexed: 01/05/2024]
Abstract
The identification of nonlinear chirp signals has attracted notable attention in the recent literature, including estimators such as the variational mode decomposition and the nonlinear chirp mode estimator. However, most presented methods fail to process signals with close frequency intervals or depend on user-determined parameters that are often non-trivial to select optimally. In this work, we propose a fully adaptive method, termed the adaptive nonlinear chirp mode estimation. The method decomposes a combined nonlinear chirp signal into its principal modes, accurately representing each mode's time-frequency representation simultaneously. Exploiting the sparsity of the instantaneous amplitudes, the proposed method can produce estimates that are smooth in the sense of being piecewise linear. Furthermore, we analyze the decomposition problem from a Bayesian perspective, using hierarchical Laplace priors to form an efficient implementation, allowing for a fully automatic parameter selection. Numerical simulations and experimental data analysis show the effectiveness and advantages of the proposed method. Notably, the algorithm is found to yield reliable estimates even when encountering signals with crossed modes. The method's practical potential is illustrated on a whale whistle signal.
Affiliation(s)
- Xiaotong Tu
  - School of Informatics, Xiamen University, Xiamen, China
- Hao Liang
  - School of Informatics, Xiamen University, Xiamen, China
- Yue Huang
  - School of Informatics, Xiamen University, Xiamen, China
- Xinghao Ding
  - School of Informatics, Xiamen University, Xiamen, China
7
Mo C, Ye Z, Pan Y, Zhang Y, Wu Q, Bi C, Liu S, Mitchell B, Kochunov P, Hong LE, Ma T, Chen S. An in-depth association analysis of genetic variants within nicotine-related loci: Meeting in middle of GWAS and genetic fine-mapping. Mol Cell Neurosci 2023; 127:103895. [PMID: 37634742 PMCID: PMC11128188 DOI: 10.1016/j.mcn.2023.103895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 08/29/2023]
Abstract
In the last two decades of genome-wide association studies (GWAS), nicotine-dependence-related genetic loci (e.g., nicotinic acetylcholine receptor (nAChR) subunit genes) are among the most replicable genetic findings. Although GWAS results have reported tens of thousands of SNPs within these loci, further analysis (e.g., fine-mapping) is required to identify the causal variants. However, it is computationally challenging for existing fine-mapping methods to reliably identify causal variants from thousands of candidate SNPs based on the posterior inclusion probability. To address this challenge, we propose a new method to select SNPs by jointly modeling the SNP-wise inference results and the underlying structured network patterns of the linkage disequilibrium (LD) matrix. We use an adaptive dense subgraph extraction method to recognize the latent network patterns of the LD matrix and then apply group LASSO to select causal variant candidates. We applied this new method to the UK Biobank data to identify the causal variant candidates for nicotine addiction. Eighty-one nicotine addiction-related SNPs (i.e., -log(p) > 50) of nAChR were selected, which are highly correlated (average r² > 0.8) although they are physically distant (e.g., >200 kilobases apart) and from various genes. These findings revealed that distant SNPs from different genes can show higher LD r² than their neighboring SNPs, and jointly contribute to a complex trait like nicotine addiction.
Affiliation(s)
- Chen Mo
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Zhenyao Ye
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Yezhi Pan
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Yuan Zhang
  - Department of Statistics, College of Arts and Sciences, Ohio State University, Columbus, Ohio, United States
- Qiong Wu
  - Department of Mathematics, University of Maryland, College Park, Maryland, United States
- Chuan Bi
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Song Liu
  - School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
- Braxton Mitchell
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Peter Kochunov
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- L. Elliot Hong
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, Maryland, United States
- Shuo Chen
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
  - Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States
8
DeHaan LM, Burns MD, Egan JP, Bloom DD. Diadromy Drives Elevated Rates of Trait Evolution and Ecomorphological Convergence in Clupeiformes (Herring, Shad, and Anchovies). Am Nat 2023; 202:830-850. [PMID: 38033182 DOI: 10.1086/726894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
Migration can have a profound influence on rates and patterns of phenotypic evolution. Diadromy is the migration between marine and freshwater habitats for feeding and reproduction that can require individuals to travel tens to thousands of kilometers. The high energetic demands of diadromy are predicted to select for ecomorphological traits that maximize swimming and locomotor efficiency. Intraspecific studies have shown repeated instances of divergence among diadromous and nondiadromous populations in locomotor and foraging traits, which suggests that at a macroevolutionary scale diadromous lineages may experience convergent evolution onto one or multiple adaptive optima. We tested for differences in rates and patterns of phenotypic evolution among diadromous and nondiadromous lineages in Clupeiformes, a clade that has evolved diadromy more than 10 times. Our results show that diadromous clupeiforms show convergent evolution for some locomotor traits and faster rates of evolution, which we propose are adaptive responses to the locomotor demands of migration. We also find evidence that diadromous lineages show convergence into multiple regions of multivariate trait space and suggest that these respective trait spaces are associated with differences in migration and trophic ecology. However, not all locomotor traits and no trophic traits show evidence of convergence or elevated rates of evolution associated with diadromy. Our results show that long-distance migration influences the tempo and patterns of phenotypic evolution at macroevolutionary scales, but there is not a single diadromous syndrome.
9
Trillos NG, Murray R, Thorpe M. Rates of convergence for regression with the graph poly-Laplacian. Sampling Theory, Signal Processing, and Data Analysis 2023; 21:35. [PMID: 38037599 PMCID: PMC10682086 DOI: 10.1007/s43670-023-00075-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Accepted: 10/30/2023] [Indexed: 12/02/2023]
Abstract
In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularization. Higher order regularity can be obtained via replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs and here we consider graph poly-Laplacian regularization in a fully supervised, non-parametric, noise corrupted, regression problem. In particular, given a dataset $\{x_i\}_{i=1}^n$ and a set of noisy labels $\{y_i\}_{i=1}^n \subset \mathbb{R}$ we let $u_n : \{x_i\}_{i=1}^n \to \mathbb{R}$ be the minimizer of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $y_i = g(x_i) + \xi_i$, for iid noise $\xi_i$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $u_n$ to $g$ in the large data limit $n \to \infty$. Furthermore, our rate is close to the known rate of convergence in the usual smoothing spline model.
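The regression problem in this abstract can be sketched numerically. The example below assumes an unnormalized graph Laplacian L on an ε-neighbourhood (geometric random) graph and the energy ||u − y||² + τ·uᵀLˢu, whose minimizer solves (I + τLˢ)u = y. The scaling of the penalty, the kernel, and the values of ε, s and τ are illustrative assumptions, not the choices analysed in the paper; the function name `poly_laplacian_fit` is hypothetical.

```python
import numpy as np


def poly_laplacian_fit(x, y, eps, s, tau):
    """Minimize ||u - y||^2 + tau * u^T L^s u over functions u on the data.

    x: (n, d) array of points, y: (n,) noisy labels.
    L is the unnormalized Laplacian of the eps-neighbourhood graph.
    Returns the minimizer u and the penalty matrix L^s.
    """
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    weights = (dist < eps).astype(float)      # connect points closer than eps
    np.fill_diagonal(weights, 0.0)            # no self-loops
    laplacian = np.diag(weights.sum(axis=1)) - weights
    penalty = np.linalg.matrix_power(laplacian, s)
    # First-order optimality condition: (I + tau * L^s) u = y.
    u = np.linalg.solve(np.eye(len(y)) + tau * penalty, y)
    return u, penalty


# Synthetic data y_i = g(x_i) + noise, with an assumed g(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, size=(n, 1))
g = np.sin(2.0 * np.pi * x[:, 0])
y = g + 0.3 * rng.normal(size=n)

u, penalty = poly_laplacian_fit(x, y, eps=0.1, s=2, tau=1e-3)
print("roughness of labels:", y @ penalty @ y)
print("roughness of fit:   ", u @ penalty @ u)
```

Because u minimizes the energy, its roughness uᵀLˢu can never exceed that of the raw labels; how close u gets to g depends on how ε and τ are scaled with n, which is the question the paper addresses.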
Affiliation(s)
- Ryan Murray
  - Department of Mathematics, North Carolina State University, Raleigh, NC 27695 USA
- Matthew Thorpe
  - Department of Statistics, University of Warwick, Coventry, CV4 7AL UK
10
Reeder HT, Lu J, Haneuse S. Penalized estimation of frailty-based illness-death models for semi-competing risks. Biometrics 2023; 79:1657-1669. [PMID: 36125235 PMCID: PMC10025166 DOI: 10.1111/biom.13761] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/28/2021] [Accepted: 09/12/2022] [Indexed: 11/30/2022]
Abstract
Semi-competing risks refer to the time-to-event analysis setting, where the occurrence of a non-terminal event is subject to whether a terminal event has occurred, but not vice versa. Semi-competing risks arise in a broad range of clinical contexts, including studies of preeclampsia, a condition that may arise during pregnancy and for which delivery is a terminal event. Models that acknowledge semi-competing risks enable investigation of relationships between covariates and the joint timing of the outcomes, but methods for model selection and prediction of semi-competing risks in high dimensions are lacking. Moreover, in such settings researchers commonly analyze only a single or composite outcome, losing valuable information and limiting clinical utility; in the obstetric setting, this means ignoring valuable insight into timing of delivery after the onset of preeclampsia. To address this gap, we propose a novel penalized estimation framework for frailty-based illness-death multi-state modeling of semi-competing risks. Our approach combines non-convex and structured fusion penalization, inducing global sparsity as well as parsimony across submodels. We perform estimation and model selection via a pathwise routine for non-convex optimization, and prove statistical error rate results in this setting. We present a simulation study investigating estimation error and model selection performance, and a comprehensive application of the method to joint risk modeling of preeclampsia and timing of delivery using pregnancy data from an electronic health record.
Affiliation(s)
- Harrison T. Reeder
  - Biostatistics, Massachusetts General Hospital, Boston, Massachusetts, U.S.A
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts, U.S.A
- Junwei Lu
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, U.S.A
- Sebastien Haneuse
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, U.S.A
11
Wang S, Chen S, Gao Y, Zhou H. Bioinformatics led discovery of biomarkers related to immune infiltration in diabetes nephropathy. Medicine (Baltimore) 2023; 102:e34992. [PMID: 37656997 PMCID: PMC10476789 DOI: 10.1097/md.0000000000034992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 08/08/2023] [Indexed: 09/03/2023]
Abstract
BACKGROUND: The leading cause of end-stage renal disease is diabetic nephropathy (DN). A key factor in DN is immune cell infiltration (ICI). It has been shown that immune-related genes play a significant role in inflammation and immune cell recruitment. However, neither the underlying mechanisms nor immune-related biomarkers have been identified in DN. Using bioinformatics, this study investigated biomarkers associated with immunity in DN. METHODS: Using bioinformatic methods, this study aimed to identify biomarkers and immune infiltration associated with DN. Gene expression profiles (GSE30528, GSE47183, and GSE104948) were selected from the Gene Expression Omnibus database. First, we identified 23 differentially expressed immune-related genes and 7 signature genes: LYZ, CCL5, ALB, IGF1, CXCL2, NR4A2, and RBP4. Subsequently, protein-protein interaction networks were created, and functional enrichment analysis and genome enrichment analysis were performed using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes databases. In R, the ConsensusClusterPlus package identified 2 different immune modes (cluster A and cluster B) following the consensus clustering method. The infiltration of immune cells in the 2 clusters was analyzed by applying the CIBERSORT method. The characteristic genes were then preliminarily verified through in vitro experiments. RESULTS: In this study, the samples of diabetic nephropathy were classified based on immune-related genes, and the hub genes LYZ, CCL5, ALB, IGF1, CXCL2, NR4A2 and RBP4 related to immune infiltration of diabetic nephropathy were obtained through the analysis of gene expression differences between different subtypes. CONCLUSIONS: This study used bioinformatics to analyze immune-related gene biomarkers in diabetic nephropathy. It analyzes the pathogenesis of diabetic nephropathy at the RNA level and may ultimately provide guidance for disease diagnosis, treatment, and prognosis.
Affiliation(s)
- Shuo Wang
  - The First Affiliated Hospital of Jinan University, Jinan University, Guangzhou, People’s Republic of China
  - Department of Endocrinology, First Affiliated Hospital of Jinzhou Medical University, Jinzhou, People’s Republic of China
- Shengwu Chen
  - Department of Orthopaedics, Third Affiliated Hospital of Jinzhou Medical University, Jinzhou, People’s Republic of China
- Yixuan Gao
  - Department of Orthopaedics, Third Affiliated Hospital of Jinzhou Medical University, Jinzhou, People’s Republic of China
- Hongli Zhou
  - The First Affiliated Hospital of Jinan University, Jinan University, Guangzhou, People’s Republic of China
  - Department of Nephrology, First Affiliated Hospital of Jinzhou Medical University, Jinzhou, People’s Republic of China
12
Zhao Y, Wang B, Liu CF, Faria AV, Miller MI, Caffo BS, Luo X. Identifying brain hierarchical structures associated with Alzheimer's disease using a regularized regression method with tree predictors. Biometrics 2023; 79:2333-2345. [PMID: 36263865 PMCID: PMC10115907 DOI: 10.1111/biom.13775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 10/03/2022] [Indexed: 11/30/2022]
Abstract
Brain segmentation at different levels is generally represented as hierarchical trees. Brain regional atrophy at specific levels was found to be marginally associated with Alzheimer's disease outcomes. In this study, we propose an ℓ1-type regularization for predictors that follow a hierarchical tree structure. Considering a tree as a directed acyclic graph, we interpret the model parameters from a path analysis perspective. Under this concept, the proposed penalty regulates the total effect of each predictor on the outcome. With regularity conditions, it is shown that under the proposed regularization, the estimator of the model coefficient is consistent in ℓ2-norm and the model selection is also consistent. When applied to a brain sMRI dataset acquired from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the proposed approach identifies brain regions in which atrophy is associated with memory decline. With regularization on the total effects, the findings suggest that the impact of atrophy on memory deficits is localized from small brain regions, but at various levels of brain segmentation. Data used in preparation of this paper were obtained from the ADNI database.
Affiliation(s)
- Yi Zhao
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Bingkai Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Chin-Fu Liu
- Center for Imaging Science, Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Andreia V. Faria
- Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Michael I. Miller
- Center for Imaging Science, Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Brian S. Caffo
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Xi Luo
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA

13
Rahardiantoro S, Sakamoto W. Spatio-temporal clustering analysis using generalized lasso with an application to reveal the spread of Covid-19 cases in Japan. Comput Stat 2023:1-25. [PMID: 37360994 PMCID: PMC10089565 DOI: 10.1007/s00180-023-01331-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 01/27/2023] [Indexed: 06/28/2023]
Abstract
This study addressed the issue of determining multiple potential clusters with regularization approaches for the purpose of spatio-temporal clustering. The generalized lasso framework has the flexibility to incorporate adjacencies between objects in the penalty matrix and to detect multiple clusters. A generalized lasso model with two L1 penalties is proposed, which can be separated into two generalized lasso models: trend filtering of the temporal effect and fused lasso of the spatial effect for each time point. To select the tuning parameters, approximate leave-one-out cross-validation (ALOCV) and generalized cross-validation (GCV) are considered. A simulation study is conducted to evaluate the proposed method against other approaches under different problems and structures of multiple clusters. The generalized lasso with ALOCV and GCV provided smaller MSE in estimating the temporal and spatial effects than the unpenalized method, ridge, lasso, and generalized ridge. In temporal effects detection, the generalized lasso with ALOCV and GCV provided relatively smaller and more stable MSE than other methods for different structures of true risk values. In spatial effects detection, the generalized lasso with ALOCV provided a higher edge-detection accuracy index. The simulation also suggested using a common tuning parameter over all time points in spatial clustering. Finally, the proposed method was applied to the weekly Covid-19 data in Japan from March 21, 2020, to September 11, 2021, along with the interpretation of dynamic behavior of multiple clusters.
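The penalty structure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`difference_matrix`, `graph_incidence_matrix`, `generalized_lasso_penalty`), the choice of second-order temporal differences, and the toy adjacency list are assumptions made for illustration. The sketch only builds penalty matrices D and evaluates lam * ||D beta||_1; it does not fit the model or select tuning parameters.

```python
import numpy as np


def difference_matrix(n, order=1):
    """Return the (n - order) x n discrete difference operator.

    order=1 penalizes jumps between neighbouring time points;
    order=2 (assumed here for trend filtering) penalizes changes in slope.
    """
    D = np.eye(n)
    for _ in range(order):
        D = np.diff(D, axis=0)  # each pass takes differences of adjacent rows
    return D


def graph_incidence_matrix(n_nodes, edges):
    """Return one row per adjacency (i, j), with +1 at column i and -1 at column j."""
    D = np.zeros((len(edges), n_nodes))
    for row, (i, j) in enumerate(edges):
        D[row, i] = 1.0
        D[row, j] = -1.0
    return D


def generalized_lasso_penalty(beta, D, lam):
    """Evaluate the generalized lasso penalty lam * ||D beta||_1."""
    return lam * np.abs(D @ np.asarray(beta, dtype=float)).sum()


# Temporal part: a perfectly linear trend incurs no second-order penalty.
D_time = difference_matrix(5, order=2)
linear_trend = np.arange(5.0)
temporal_penalty = generalized_lasso_penalty(linear_trend, D_time, lam=1.0)

# Spatial part: three regions on a chain (hypothetical adjacency 0-1, 1-2).
D_space = graph_incidence_matrix(3, [(0, 1), (1, 2)])
spatial_effect = np.array([1.0, 1.0, 3.0])
spatial_penalty = generalized_lasso_penalty(spatial_effect, D_space, lam=0.5)
```

In this toy setup, regions 0 and 1 share a value and contribute nothing to the spatial penalty, which is how an ℓ1 penalty on adjacency differences encourages neighbouring regions to fuse into clusters.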
Affiliation(s)
- Septian Rahardiantoro
- Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350 Japan
- Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Bogor, 16680 Indonesia
- Wataru Sakamoto
- Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350 Japan

14
Zhao Y, Huo X. Accelerate the warm-up stage in the Lasso computation via a homotopic approach. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2023.107747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]

15
Zhu W, Lévy-Leduc C, Ternès N. Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso. BMC Bioinformatics 2023; 24:25. [PMID: 36690931 PMCID: PMC9869528 DOI: 10.1186/s12859-023-05143-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Accepted: 01/09/2023] [Indexed: 01/24/2023] Open
Abstract
In clinical trials, identification of prognostic and predictive biomarkers has become essential to precision medicine. Prognostic biomarkers can be useful for the prevention of the occurrence of the disease, and predictive biomarkers can be used to identify patients with potential benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso that integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers that can alter the biomarker selection accuracy. Our method consists in transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso and other extensions on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic and proteomic data.
Affiliation(s)
- Wencan Zhu
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Biostatistics and Programming Department, Sanofi R&D, 91380, Chilly Mazarin, France
- Céline Lévy-Leduc
- Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA Paris-Saclay, 91120, Palaiseau, France
- Nils Ternès
- Biostatistics and Programming Department, Sanofi R&D, 91380, Chilly Mazarin, France

16
Bak KY. The regularization paths of total variation-penalized regression splines. COMMUN STAT-SIMUL C 2023. [DOI: 10.1080/03610918.2023.2170410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Affiliation(s)
- Kwan-Young Bak
- School of Mathematics, Statistics and Data Science, Sungshin Women’s University Data Science Center, Sungshin Women’s University, Seoul, Republic of Korea

17
Crescenzi F. Hedonic pricing modelling with unstructured predictors: an application to Italian Fashion Industry. ASTA ADVANCES IN STATISTICAL ANALYSIS 2022. [DOI: 10.1007/s10182-022-00465-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]

18
Rafati I, Destrempes F, Yazdani L, Gesnik M, Tang A, Cloutier G. Regularized Ultrasound Phantom-Free Local Attenuation Coefficient Slope (ACS) Imaging in Homogeneous and Heterogeneous Tissues. IEEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS, AND FREQUENCY CONTROL 2022; 69:3338-3352. [PMID: 36318570 DOI: 10.1109/tuffc.2022.3218920] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
Attenuation maps or measurements based on the local attenuation coefficient slope (ACS) in quantitative ultrasound (QUS) have shown potential for the diagnosis of liver steatosis. In liver cancers, tissue abnormalities and tumors detected using ACS are also of interest to provide new image contrast to clinicians. Current phantom-based approaches have the limitation of assuming a comparable speed of sound between the reference phantom and insonified tissues. Moreover, these methods present the inconvenience for operators to acquire data on phantoms and patients. The main goal was to alleviate these drawbacks by proposing a methodology for constructing phantom-free regularized (PF-R) local ACS maps and investigate the performance in both homogeneous and heterogeneous media. The proposed method was tested on two tissue-mimicking media with different ACS constructed as homogeneous phantoms, side-by-side and top-to-bottom phantoms, and inclusion phantoms with different attenuations. Moreover, an in vivo proof-of-concept was performed on healthy, steatotic, and cancerous human liver datasets. Modifications brought to previous works include: 1) a linear interpolation of the power spectrum in the log scale; 2) the relaxation of the underlying hypothesis on the diffraction factor; 3) a generalization to nonhomogeneous local ACS; and 4) an adaptive restriction of frequencies to a more reliable range than the usable frequency range. Regularization was formulated as a generalized least absolute shrinkage and selection operator (LASSO), and a variant of the Bayesian information criterion (BIC) was applied to estimate the Lagrangian multiplier on the LASSO constraint. In addition, we evaluated the proposed algorithm when applying median filtering before and after regularization. Tests conducted showed that the PF-R yielded robust results in all tested conditions, suggesting potential for additional validation as a diagnosis method.

19
Bao R, Yamada H, Hayakawa K. ℓ1 common trend filtering: an extension. J STAT COMPUT SIM 2022. [DOI: 10.1080/00949655.2022.2144314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Affiliation(s)
- Ruoyi Bao
- Graduate School of Humanities and Social Sciences, Hiroshima University, Higashihiroshima, Japan
- Hiroshi Yamada
- School of Informatics and Data Science, Hiroshima University, Higashihiroshima, Japan
- Kazuhiko Hayakawa
- Graduate School of Humanities and Social Sciences, Hiroshima University, Higashihiroshima, Japan

20
Ko S, Zhou H, Zhou JJ, Won JH. High-Performance Statistical Computing in the Computing Environments of the 2020s. Stat Sci 2022; 37:494-518. [PMID: 37168541 PMCID: PMC10168006 DOI: 10.1214/21-sts835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and ℓ1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC ℓ1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
Affiliation(s)
- Seyoon Ko
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, California 90095, USA
- Hua Zhou
- Department of Biostatistics, UCLA Fielding School of Public Health, Los Angeles, California 90095, USA
- Jin J Zhou
- Department of Medicine, UCLA David Geffen School of Medicine, Los Angeles, California 90095, USA, and Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona 85724, USA
- Joong-Ho Won
- Department of Statistics, Seoul National University, Seoul, Korea

21
Qu Y, Lee CY, Lam KF. A novel method to monitor COVID-19 fatality rate in real-time, a key metric to guide public health policy. Sci Rep 2022; 12:18277. [PMID: 36316534 PMCID: PMC9619021 DOI: 10.1038/s41598-022-23138-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 10/25/2022] [Indexed: 12/31/2022] Open
Abstract
An accurate estimator of the real-time fatality rate is warranted to monitor the progress of ongoing epidemics, hence facilitating the policy-making process. However, most of the existing estimators fail to capture the time-varying nature of the fatality rate and are often biased in practice. A simple real-time fatality rate estimator with adjustment for reporting delays is proposed in this paper using the fused lasso technique. This approach is easy to use and can be broadly applied to public health practice as only basic epidemiological data are required. A large-scale simulation study suggests that the proposed estimator is a reliable benchmark for formulating public health policies during an epidemic with high accuracy and sensitivity in capturing the changes in the fatality rate over time, while the other two commonly-used case fatality rate estimators may convey delayed or even misleading signals of the true situation. The application to the COVID-19 data in Germany between January 2020 and January 2022 demonstrates the importance of the social restrictions in the early phase of the pandemic when vaccines were not available, and the beneficial effects of vaccination in suppressing the fatality rate to a low level since August 2021 irrespective of the rebound in infections driven by the more infectious Delta and Omicron variants during the fourth wave.
Affiliation(s)
- Yuanke Qu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, People's Republic of China
- Guangdong Ocean University, Zhanjiang, People's Republic of China
- Chun Yin Lee
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, People's Republic of China
- K F Lam
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, People's Republic of China
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore

22
Ding Y, Li Y, Song R. Statistical Learning for Individualized Asset Allocation. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2139265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Affiliation(s)
- Yi Ding
- Faculty of Business Administration, University of Macau, Macau

23
Son W, Lim J, Yu D. Path algorithms for fused lasso signal approximator with application to COVID-19 spread in Korea. Int Stat Rev 2022; 91:INSR12521. [PMID: 36710888 PMCID: PMC9874640 DOI: 10.1111/insr.12521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 07/22/2022] [Accepted: 09/26/2022] [Indexed: 02/01/2023]
Abstract
The fused lasso signal approximator (FLSA) is a smoothing procedure for noisy observations that uses fused lasso penalty on unobserved mean levels to find sparse signal blocks. Several path algorithms have been developed to obtain the whole solution path of the FLSA. However, it is known that the FLSA has model selection inconsistency when the underlying signals have a stair-case block, where three consecutive signal blocks are either strictly increasing or decreasing. Modified path algorithms for the FLSA have been proposed to guarantee model selection consistency regardless of the stair-case block. In this paper, we provide a comprehensive review of the path algorithms for the FLSA and prove the properties of the recently modified path algorithms' hitting times. Specifically, we reinterpret the modified path algorithm as the path algorithm for local FLSA problems and reveal the condition that the hitting time for the fusion of the modified path algorithm is not monotone in a tuning parameter. To recover the monotonicity of the solution path, we propose a pathwise adaptive FLSA having monotonicity with similar performance as the modified solution path algorithm. Finally, we apply the proposed method to the number of daily-confirmed cases of COVID-19 in Korea to identify the change points of its spread.
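As a rough illustration of the FLSA criterion mentioned in this abstract, the following Python sketch evaluates the commonly used objective 0.5 * sum((y_i - beta_i)^2) + lam1 * sum(|beta_i|) + lam2 * sum(|beta_i - beta_{i+1}|). That this is the exact form used in the paper is an assumption, and the function names (`flsa_objective`, `count_blocks`) and the toy data are hypothetical. The sketch only scores candidate signals; it is not a path algorithm.

```python
import numpy as np


def flsa_objective(y, beta, lam1, lam2):
    """Evaluate the (assumed) FLSA objective for a candidate signal beta."""
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    fit = 0.5 * np.sum((y - beta) ** 2)            # squared-error fit to the noisy data
    sparsity = lam1 * np.sum(np.abs(beta))         # lasso term on the mean levels
    fusion = lam2 * np.sum(np.abs(np.diff(beta)))  # penalty on jumps between neighbours
    return fit + sparsity + fusion


def count_blocks(beta, tol=1e-8):
    """Count the constant blocks in beta (one more than the number of jumps)."""
    jumps = np.abs(np.diff(np.asarray(beta, dtype=float))) > tol
    return 1 + int(np.sum(jumps))


# Toy data: noisy observations around two mean levels (0 and 2).
y = np.array([0.1, -0.1, 2.1, 1.9])
blocky = np.array([0.0, 0.0, 2.0, 2.0])  # candidate with two fused blocks

obj_unfused = flsa_objective(y, y, lam1=0.0, lam2=1.0)     # beta = y: perfect fit, many jumps
obj_fused = flsa_objective(y, blocky, lam1=0.0, lam2=1.0)  # small misfit, a single jump
```

With lam2 = 1, the two-block candidate has the lower objective than the unsmoothed observations, which shows how the fusion term trades a small loss of fit for fewer change points.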
Affiliation(s)
- Won Son
- Department of Information Statistics, Dankook University, Gyeonggi-do, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul, Korea
- Donghyeon Yu
- Department of Statistics, Inha University, Incheon, Korea

24
A new active zero set descent algorithm for least absolute deviation with generalized LASSO penalty. J Korean Stat Soc 2022. [DOI: 10.1007/s42952-022-00192-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

25
Chen Y, Jewell S, Witten D. More Powerful Selective Inference for the Graph Fused Lasso. J Comput Graph Stat 2022; 32:577-587. [PMID: 38250478 PMCID: PMC10798806 DOI: 10.1080/10618600.2022.2097246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 06/28/2022] [Indexed: 10/17/2022]
Abstract
The graph fused lasso, which includes as a special case the one-dimensional fused lasso, is widely used to reconstruct signals that are piecewise constant on a graph, meaning that nodes connected by an edge tend to have identical values. We consider testing for a difference in the means of two connected components estimated using the graph fused lasso. A naive procedure such as a z-test for a difference in means will not control the selective Type I error, since the hypothesis that we are testing is itself a function of the data. In this work, we propose a new test for this task that controls the selective Type I error, and conditions on less information than existing approaches, leading to substantially higher power. We illustrate our approach in simulation and on datasets of drug overdose death rates and teenage birth rates in the contiguous United States. Our approach yields more discoveries on both datasets. Supplementary materials for this article are available online.
Affiliation(s)
- Yiqun Chen
- Department of Biostatistics, University of Washington, Seattle, WA
- Sean Jewell
- Department of Statistics, University of Washington, Seattle, WA
- Daniela Witten
- Department of Biostatistics, University of Washington, Seattle, WA
- Department of Statistics, University of Washington, Seattle, WA

26
Thresholding tests based on affine LASSO to achieve non-asymptotic nominal level and high power under sparse and dense alternatives in high dimension. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

27
Identification of microbial features in multivariate regression under false discovery rate control. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

28
Penalized polygram regression. J Korean Stat Soc 2022. [DOI: 10.1007/s42952-022-00181-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

29
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022] Open
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that under some mild conditions structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical model. Simulation studies validate the superiority of the proposed algorithm over the existing ones and show the robustness to the model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered which cannot be identified by the conventional methods of pathway analysis. Some of them have been widely recognized in biological literature.
Affiliation(s)
- Jie Zhou
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
- Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
- Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
- Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA

30
Extraction of Continuous and Discrete Spatial Heterogeneities: Fusion Model of Spatially Varying Coefficient Model and Sparse Modelling. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2022. [DOI: 10.3390/ijgi11070358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Geospatial phenomena often have spatial heterogeneity, which is caused by differences in the data generation process from place to place. There are two types of spatial heterogeneity: continuous and discrete, and there has been much discussion about how to analyze one type of spatial heterogeneity. Although geospatial phenomena can have both types of spatial heterogeneities, previous studies have not sufficiently discussed how to consider these two different types of spatial heterogeneity simultaneously and how to detect them separately, which may lead to biased estimates and the wrong interpretation of geospatial phenomena. This study proposes a new approach for the analysis of spatial data with both heterogeneities, the ESF-GL-SVC model, which combines the eigenvector spatial filtering-based spatially varying coefficient (ESF-SVC) model, which assumes continuous spatial heterogeneity, with generalized lasso (GL) estimation, which assumes discrete spatial heterogeneity. The performance of ESF-GL-SVC was evaluated through experiments based on a Monte Carlo simulation, which confirmed that ESF-GL-SVC performed better in estimating coefficients with both types of spatial heterogeneity than the previous two models. The application to apartment rent data showed that ESF-GL-SVC yields the smallest BIC value, and the estimated coefficients depict continuous and discrete spatial heterogeneity in the dataset. Reasonable coefficients were estimated using the ESF-GL-SVC, although some coefficients by ESF-SVC were not.

31
Matsushima Y, Naito K. Improvement on LASSO-type estimator in nonparametric regression. J Nonparametr Stat 2022. [DOI: 10.1080/10485252.2022.2085700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Yuki Matsushima
- Graduate School of Natural Science and Technology, Shimane University, Matsue, Japan
- Kanta Naito
- Department of Mathematics and Informatics, Chiba University, Chiba, Japan

32
Dallakyan A, Pourahmadi M. Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Replicated Time Series. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2090367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

33

34
Herman JA, Arora S, Carter L, Zhu J, Biggins S, Paddison PJ. Functional dissection of human mitotic genes using CRISPR-Cas9 tiling screens. Genes Dev 2022; 36:495-510. [PMID: 35483740 PMCID: PMC9067404 DOI: 10.1101/gad.349319.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 04/12/2022] [Indexed: 12/03/2022]
Abstract
The identity of human protein-coding genes is well known, yet our in-depth knowledge of their molecular functions and domain architecture remains limited by shortcomings in homology-based predictions and experimental approaches focused on whole-gene depletion. To bridge this knowledge gap, we developed a method that leverages CRISPR–Cas9-induced mutations across protein-coding genes for the a priori identification of functional regions at the sequence level. As a test case, we applied this method to 48 human mitotic genes, revealing hundreds of regions required for cell proliferation, including domains that were experimentally characterized, ones that were predicted based on homology, and novel ones. We validated screen outcomes for 15 regions, including amino acids 387–402 of Mad1, which were previously uncharacterized but contribute to Mad1 kinetochore localization and chromosome segregation fidelity. Altogether, we demonstrate that CRISPR–Cas9-based tiling mutagenesis identifies key functional domains in protein-coding genes de novo, which elucidates separation of function mutants and allows functional annotation across the human proteome.
Affiliation(s)
- Jacob A Herman
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Sonali Arora
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Lucas Carter
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Jun Zhu
- Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Sue Biggins
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Patrick J Paddison
- Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA

35
Chen X, Zeng Y, Kang S, Jin R. INN: An Interpretable Neural Network for AI Incubation in Manufacturing. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3519313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Both artificial intelligence (AI) and domain knowledge from human experts play an important role in manufacturing decision-making. While smart manufacturing emphasizes a fully automated data-driven decision-making, the AI incubation process involves human experts to enhance AI systems by integrating domain knowledge for modeling, data collection and annotation, and feature extraction. Such an AI incubation process will not only enhance the domain knowledge discovery, but also improve the interpretability and trustworthiness of AI methods. In this paper, we focus on the knowledge transfer from human experts to a supervised learning problem by learning domain knowledge as interpretable features and rules, which can be used to construct rule-based systems to support manufacturing decision-making, such as process modeling and quality inspection. Although many advanced statistical and machine learning methods have shown promising modeling accuracy and efficiency, rule-based systems are still highly preferred and widely adopted due to their interpretability for human experts to comprehend. However, most of the existing rule-based systems are constructed based on deterministic human-crafted rules, whose parameters, e.g., thresholds of decision rules, are suboptimal. On the other hand, the machine learning methods, such as tree models or neural networks, can learn a decision-rule based structure without much interpretation or agreement with domain knowledge. Therefore, the traditional machine learning models and human experts’ domain knowledge cannot be directly improved by learning from data. In this research, we propose an interpretable neural network (INN) model with a center-adjustable Sigmoid activation function to efficiently optimize the rule-based systems. 
Using the rule-based system from domain knowledge to regulate the INN architecture will not only improve the prediction accuracy with optimized parameters, but also ensure the interpretability by adopting the interpretable rule-based systems from domain knowledge. The proposed INN will be effective for supervised learning problems when rule-based systems are available. The merits of the INN model are demonstrated via a simulation study and a real case study in the quality modeling of a semiconductor manufacturing process. The source code of this paper is hosted here: https://github.com/XiaoyuChenUofL/Interpretable-Neural-Network.
Affiliation(s)
- Xiaoyu Chen
  - Department of Industrial Engineering, University of Louisville, USA
- Yingyan Zeng
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, USA
- Sungku Kang
  - Civil and Environmental Engineering, Northeastern University, USA
- Ran Jin
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, USA
|
36
|
Wang B, Caffo BS, Luo X, Liu C, Faria AV, Miller MI, Zhao Y. Regularized regression on compositional trees with application to MRI analysis. J R Stat Soc Ser C Appl Stat 2022; 71:541-561. [DOI: 10.1111/rssc.12545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Affiliation(s)
- Bingkai Wang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Brian S. Caffo
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Xi Luo
  - Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Chin‐Fu Liu
  - Center for Imaging Science, Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Andreia V. Faria
  - Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Michael I. Miller
  - Center for Imaging Science, Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
- Yi Zhao
  - Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA (for the Alzheimer's Disease Neuroimaging Initiative)
|
37
|
Point Event Cluster Detection via the Bayesian Generalized Fused Lasso. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2022. [DOI: 10.3390/ijgi11030187] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Spatial cluster detection is one of the focus areas of spatial analysis, whose objective is the identification of clusters from spatial distributions of point events aggregated in districts with small areas. Choi et al. (2018) formulated cluster detection as a parameter estimation problem to leverage the parameter selection capability of the sparse modeling method called the generalized fused lasso. Although this work is superior to conventional methods for detecting multiple clusters, its estimation results are limited to point estimates. This study therefore extended the above work as a Bayesian cluster detection method to describe the probabilistic variations of clustering results. The proposed method combines multiple sparsity-inducing priors and encourages sparse solutions induced by the generalized fused lasso. Evaluations were performed with simulated and real-world distributions of point events to demonstrate that the proposed method provides new information on the quantified reliabilities of clustering results at the district level while achieving comparable detection performances to that of the previous work.
|
38
|
Pavón-Vázquez CJ, Brennan IG, Skeels A, Keogh JS. Competition and geography underlie speciation and morphological evolution in Indo-Australasian monitor lizards. Evolution 2022; 76:476-495. [PMID: 34816437 DOI: 10.1111/evo.14403] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/14/2020] [Revised: 10/06/2021] [Accepted: 10/16/2021] [Indexed: 01/21/2023]
Abstract
How biotic and abiotic factors act together to shape biological diversity is a major question in evolutionary biology. The recent availability of large datasets and development of new methodological approaches provide new tools to evaluate the predicted effects of ecological interactions and geography on lineage diversification and phenotypic evolution. Here, we use a near complete phylogenomic-scale phylogeny and a comprehensive morphological dataset comprising more than a thousand specimens to assess the role of biotic and abiotic processes in the diversification of monitor lizards (Varanidae). This charismatic group of lizards shows striking variation in species richness among its clades and multiple instances of endemic radiation in Indo-Australasia (i.e., the Indo-Australian Archipelago and Australia), one of Earth's most biogeographically complex regions. We found heterogeneity in diversification dynamics across the family. Idiosyncratic biotic and geographic conditions appear to have driven diversification and morphological evolution in three endemic Indo-Australasian radiations. Furthermore, incumbency effects partially explain patterns in the biotic exchange between Australia and New Guinea. Our results offer insight into the dynamic history of Indo-Australasia, the evolutionary significance of competition, and the long-term consequences of incumbency effects.
Affiliation(s)
- Carlos J Pavón-Vázquez
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
  - Current address: Department of Biological Sciences, New York City College of Technology, City University of New York, Brooklyn, New York, 11201
- Ian G Brennan
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Alexander Skeels
  - Landscape Ecology, Department of Environmental Systems Science, Institute of Terrestrial Ecosystems, ETH Zürich, Zürich, CH-8092, Switzerland
  - Swiss Federal Research Institute for Forest, Snow and Landscape Research (WSL), Birmensdorf, CH-8903, Switzerland
- J Scott Keogh
  - Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
|
39
|
Zhu J, Wang H, Li H, Zhang Q. Fast multi-view twin hypersphere support vector machine with consensus and complementary principles. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02986-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
40
|
Wang A, Du J, Zhang X, Shi J. Ranking Features to Promote Diversity: An Approach Based on Sparse Distance Correlation. Technometrics 2022. [DOI: 10.1080/00401706.2021.2020171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Andi Wang
  - The Polytechnic School, Arizona State University, Mesa, AZ
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, China
- Juan Du
  - The Polytechnic School, Arizona State University, Mesa, AZ
  - The Hong Kong University of Science and Technology, Guangzhou, China
- Xi Zhang
  - Peking University, Beijing, China
- Jianjun Shi
  - Georgia Institute of Technology, Atlanta, GA
|
41
|
Feng L, Bi X, Zhang H. Brain Regions Identified as Being Associated with Verbal Reasoning through the Use of Imaging Regression via Internal Variation. J Am Stat Assoc 2021; 116:144-158. [PMID: 34955572 DOI: 10.1080/01621459.2020.1766468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
Brain-imaging data have been increasingly used to understand intellectual disabilities. Despite significant progress in biomedical research, the mechanisms for most of the intellectual disabilities remain unknown. Finding the underlying neurological mechanisms has been proved difficult, especially in children due to the rapid development of their brains. We investigate verbal reasoning, which is a reliable measure of individuals' general intellectual abilities, and develop a class of high-order imaging regression models to identify brain subregions which might be associated with this specific intellectual ability. A key novelty of our method is to take advantage of spatial brain structures, and specifically the piecewise smooth nature of most imaging coefficients in the form of high-order tensors. Our approach provides an effective and urgently needed method for identifying brain subregions potentially underlying certain intellectual disabilities. The idea behind our approach is a carefully constructed concept called Internal Variation (IV). The IV employs tensor decomposition and provides a computationally feasible substitution for Total Variation (TV), which has been considered in the literature to deal with similar problems but is problematic in high order tensor regression. Before applying our method to analyze the real data, we conduct comprehensive simulation studies to demonstrate the validity of our method in imaging signal identification. Then, we present our results from the analysis of a dataset based on the Philadelphia Neurodevelopmental Cohort for which we preprocessed the data including re-orienting, bias-field correcting, extracting, normalizing and registering the magnetic resonance images from 978 individuals. 
Our analysis identified a subregion across the cingulate cortex and the corpus callosum as being associated with individuals' verbal reasoning ability, which, to the best of our knowledge, is a novel region that has not been reported in the literature. This finding is useful in further investigation of functional mechanisms for verbal reasoning.
Affiliation(s)
- Long Feng
  - Department of Biostatistics, Yale University
- Xuan Bi
  - Information and Decision Sciences, Carlson School of Management, University of Minnesota
|
42
|
Hernan Madrid Padilla O, Chen Y. Graphon estimation via nearest‐neighbour algorithm and two‐dimensional fused‐lasso denoising. CAN J STAT 2021. [DOI: 10.1002/cjs.11676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Yanzhen Chen
  - Department of ISOM, Hong Kong University of Science and Technology, Kowloon, Hong Kong
|
43
|
Efficient Proximal Gradient Algorithms for Joint Graphical Lasso. ENTROPY 2021; 23:e23121623. [PMID: 34945929 PMCID: PMC8700157 DOI: 10.3390/e23121623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 11/27/2021] [Accepted: 11/27/2021] [Indexed: 11/17/2022]
Abstract
We consider learning an undirected graphical model from sparse data. While several efficient algorithms have been proposed for graphical lasso (GL), the alternating direction method of multipliers (ADMM) is the main approach taken concerning joint graphical lasso (JGL). We propose proximal gradient procedures with and without a backtracking option for the JGL. These procedures are first-order methods and relatively simple, and the subproblems are solved efficiently in closed form. We further show the boundedness for the solution of the JGL problem and the iterates in the algorithms. The numerical results indicate that the proposed algorithms can achieve high accuracy and precision, and their efficiency is competitive with state-of-the-art algorithms.
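For readers unfamiliar with the general technique, the following is a minimal sketch of a proximal gradient iteration with backtracking for a generic l1-penalized problem. It is not the JGL algorithm from the paper: the function names (`soft_threshold`, `prox_grad`), the step-size defaults, and the toy quadratic objective are all assumptions chosen for illustration. It only shows the two ingredients the abstract mentions, a closed-form proximal subproblem and an optional backtracking line search.

```python
def soft_threshold(v, t):
    """Closed-form proximal operator of t * ||x||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_grad(f, grad, x0, lam, step=1.0, shrink=0.5, iters=200):
    """Minimize f(x) + lam * ||x||_1 for a smooth convex f (illustrative sketch).

    f and grad are callables on plain Python lists; step is the initial
    trial step size, halved by backtracking until a sufficient-decrease
    condition on the quadratic upper bound holds.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        fx = f(x)
        t = step
        while True:
            # Gradient step on f followed by the l1 proximal step.
            z = soft_threshold([xi - t * gi for xi, gi in zip(x, g)], lam * t)
            d = [zi - xi for zi, xi in zip(z, x)]
            bound = (fx
                     + sum(gi * di for gi, di in zip(g, d))
                     + sum(di * di for di in d) / (2.0 * t))
            if f(z) <= bound + 1e-12:  # small slack for rounding error
                break
            t *= shrink
        x = z
    return x


# Toy problem (hypothetical): f(x) = 0.5 * sum(a_i * (x_i - b_i)^2).
a = [4.0, 1.0, 2.0]
b = [1.0, -0.3, 0.1]
f = lambda x: 0.5 * sum(ai * (xi - bi) ** 2 for ai, xi, bi in zip(a, x, b))
grad = lambda x: [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]

x_hat = prox_grad(f, grad, [0.0, 0.0, 0.0], lam=0.5)
```

For this separable toy objective the minimizer is known in closed form (each coordinate is `b_i` soft-thresholded at `lam / a_i`), which makes it a convenient sanity check before moving to a matrix-valued problem such as the graphical lasso.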
|
44
|
Liu Y, Shang F, Liu H, Kong L, Jiao L, Lin Z. Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:4242-4255. [PMID: 32750780 DOI: 10.1109/tpami.2020.3000512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Recently, many stochastic variance reduced alternating direction methods of multipliers (ADMMs) (e.g., SAG-ADMM and SVRG-ADMM) have made exciting progress such as linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is O(1/T) as opposed to O(1/T^2) of accelerated deterministic algorithms, where T is the number of iterations. Thus, there remains a gap in the convergence rates of existing stochastic ADMM and deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance reduced ADMM, and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for the machine learning problems with the constraint Ax + By = c. Then we design a linearized proximal update rule and a simple proximal one for the two classes of ADMM-style problems with B = τI and B ≠ τI, respectively, where I is an identity matrix and τ is an arbitrary bounded constant. Note that our linearized proximal update rule can avoid solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from O(1/T) to O(1/T^2) for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso and multi-task learning, and show that ASVRG-ADMM consistently converges faster than the state-of-the-art methods.
|
45
|
Mehrotra S, Maity A. Simultaneous variable selection, clustering, and smoothing in function‐on‐scalar regression. CAN J STAT 2021. [DOI: 10.1002/cjs.11668] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Affiliation(s)
- Suchit Mehrotra
  - Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A.
- Arnab Maity
  - Department of Statistics, North Carolina State University, Raleigh, NC, U.S.A.
|
46
|
Harada K, Fujisawa H. Sparse estimation of Linear Non-Gaussian Acyclic Model for Causal Discovery. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.06.083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
|
47
|
Escribe C, Lu T, Keller-Baruch J, Forgetta V, Xiao B, Richards JB, Bhatnagar S, Oualkacha K, Greenwood CMT. Block coordinate descent algorithm improves variable selection and estimation in error-in-variables regression. Genet Epidemiol 2021; 45:874-890. [PMID: 34468045 PMCID: PMC9292988 DOI: 10.1002/gepi.22430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Revised: 07/19/2021] [Accepted: 08/12/2021] [Indexed: 11/13/2022]
Abstract
Medical research increasingly includes high‐dimensional regression modeling with a need for error‐in‐variables methods. The Convex Conditioned Lasso (CoCoLasso) utilizes a reformulated Lasso objective function and an error‐corrected cross‐validation to enable error‐in‐variables regression, but requires heavy computations. Here, we develop a Block coordinate Descent Convex Conditioned Lasso (BDCoCoLasso) algorithm for modeling high‐dimensional data that are only partially corrupted by measurement error. This algorithm separately optimizes the estimation of the uncorrupted and corrupted features in an iterative manner to reduce computational cost, with a specially calibrated formulation of cross‐validation error. Through simulations, we show that the BDCoCoLasso algorithm successfully copes with much larger feature sets than CoCoLasso, and as expected, outperforms the naïve Lasso with enhanced estimation accuracy and consistency, as the intensity and complexity of measurement errors increase. Also, a new smoothly clipped absolute deviation penalization option is added that may be appropriate for some data sets. We apply the BDCoCoLasso algorithm to data selected from the UK Biobank. We develop and showcase the utility of covariate‐adjusted genetic risk scores for body mass index, bone mineral density, and lifespan. We demonstrate that by leveraging more information than the naïve Lasso in partially corrupted data, the BDCoCoLasso may achieve higher prediction accuracy. These innovations, together with an R package, BDCoCoLasso, make error‐in‐variables adjustments more accessible for high‐dimensional data sets. We posit the BDCoCoLasso algorithm has the potential to be widely applied in various fields, including genomics‐facilitated personalized medicine research.
Affiliation(s)
- Célia Escribe
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, United States
- Tianyuan Lu
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Quantitative Life Sciences Program, McGill University, Montreal, Québec, Canada
- Julyan Keller-Baruch
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
- Vincenzo Forgetta
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
- Bowei Xiao
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Quantitative Life Sciences Program, McGill University, Montreal, Québec, Canada
- J Brent Richards
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
- Sahir Bhatnagar
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Department of Diagnostic Radiology, McGill University, Montreal, Québec, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, Québec, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Québec, Canada
  - Department of Human Genetics, McGill University, Montreal, Québec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Québec, Canada
  - Gerald Bronfman Department of Oncology, McGill University, Montreal, Québec, Canada
|
48
|
Cai Li BY, Zhang H. Tensor Quantile Regression with Application to Association between Neuroimages and Human Intelligence. Ann Appl Stat 2021; 15:1455-1477. [PMID: 34567336 PMCID: PMC8462802 DOI: 10.1214/21-aoas1475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
Human intelligence is usually measured by well-established psychometric tests through a series of problem solving. The recorded cognitive scores are continuous but usually heavy-tailed with potential outliers and violating the normality assumption. Meanwhile, magnetic resonance imaging (MRI) provides an unparalleled opportunity to study brain structures and cognitive ability. Motivated by association studies between MRI images and human intelligence, we propose a tensor quantile regression model, which is a general and robust alternative to the commonly used scalar-on-image linear regression. Moreover, we take into account rich spatial information of brain structures, incorporating low-rankness and piece-wise smoothness of imaging coefficients into a regularized regression framework. We formulate the optimization problem as a sequence of penalized quantile regressions with a generalized Lasso penalty based on tensor decomposition, and develop a computationally efficient alternating direction method of multipliers algorithm (ADMM) to estimate the model components. Extensive numerical studies are conducted to examine the empirical performance of the proposed method and its competitors. Finally, we apply the proposed method to a large-scale important dataset: the Human Connectome Project. We find that the tensor quantile regression can serve as a prognostic tool to assess future risk of cognitive impairment progression. More importantly, with the proposed method, we are able to identify the most activated brain subregions associated with quantiles of human intelligence. The prefrontal and anterior cingulate cortex are found to be mostly associated with lower and upper quantile of fluid intelligence. The insular cortex associated with median of fluid intelligence is a rarely reported region.
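As background for the quantile regression objective mentioned in this abstract, here is a minimal sketch of the standard check (pinball) loss and of how minimizing it recovers a sample quantile. It does not reproduce the paper's tensor model, penalty, or ADMM solver; the helper names (`check_loss`, `empirical_risk`, `best_constant`) and the toy scores are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Total check loss when the constant q is used to predict every y."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Search the observed values for the one minimizing the check loss.

    A minimizer of the empirical check loss can always be found among
    the data points, so this brute-force search returns a tau-quantile.
    """
    return min(ys, key=lambda q: empirical_risk(q, ys, tau))


# Toy heavy-tailed sample (hypothetical): one large outlier.
scores = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(scores, 0.5)
upper_fit = best_constant(scores, 0.9)
```

On this sample the tau = 0.5 fit is the median, which the outlier does not move, whereas the sample mean would be pulled to 22. This is the robustness property that motivates using quantile loss for heavy-tailed cognitive scores.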
Affiliation(s)
- B Y Cai Li
  - Department of Biostatistics, Yale University
|
49
|
Wang W, Zhu Z. Group structure detection for a high‐dimensional panel data model. CAN J STAT 2021. [DOI: 10.1002/cjs.11646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Wu Wang
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
- Zhongyi Zhu
  - Department of Statistics, Fudan University, Shanghai, China
|
50
|
Sass D, Li B, Reich BJ. Flexible and Fast Spatial Return Level Estimation Via a Spatially Fused Penalty. J Comput Graph Stat 2021; 30:1124-1142. [PMID: 36186917 PMCID: PMC9524507 DOI: 10.1080/10618600.2021.1938584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2019] [Accepted: 05/29/2021] [Indexed: 10/21/2022]
Abstract
Spatial extremes are common for climate data as the observations are usually referenced by geographic locations and dependent when they are nearby. An important goal of extremes modeling is to estimate the T-year return level. Among the methods suitable for modeling spatial extremes, perhaps the simplest and fastest approach is the spatial generalized extreme value (GEV) distribution and the spatial generalized Pareto distribution (GPD) that assume marginal independence and only account for dependence through the parameters. Despite the simplicity, simulations have shown that return level estimation using the spatial GEV and spatial GPD still provides satisfactory results compared to max-stable processes, which are asymptotically justified models capable of representing spatial dependence among extremes. However, the linear functions used to model the spatially varying coefficients are restrictive and may be violated. We propose a flexible and fast approach based on the spatial GEV and spatial GPD by introducing fused lasso and fused ridge penalty for parameter regularization. This enables improved return level estimation for large spatial extremes compared to the existing methods.
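To make the T-year return level concrete, the sketch below evaluates the textbook GEV quantile formula at a single site, z_T = mu + (sigma / xi) * ((-log(1 - 1/T))^(-xi) - 1), with the Gumbel limit when xi = 0. It assumes the GEV parameters have already been estimated for block (e.g. annual) maxima; the function name `gev_return_level` and the example parameter values are hypothetical, and the fused lasso and fused ridge regularization proposed in the paper is not implemented here.

```python
import math


def gev_return_level(mu, sigma, xi, T):
    """T-period return level of a GEV(mu, sigma, xi) distribution.

    This is the quantile exceeded with probability 1/T in any one block.
    mu is the location, sigma > 0 the scale, xi the shape, and T > 1.
    """
    if sigma <= 0 or T <= 1:
        raise ValueError("need sigma > 0 and T > 1")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return mu - sigma * math.log(y)
    # expm1 keeps (y**(-xi) - 1) accurate when xi is close to zero.
    return mu + sigma * math.expm1(-xi * math.log(y)) / xi


# Hypothetical parameter values at one location.
z10 = gev_return_level(10.0, 2.0, 0.1, 10)
z100 = gev_return_level(10.0, 2.0, 0.1, 100)
```

In a spatial setting, the same formula would be applied location by location after the parameter surfaces have been estimated, so any smoothing or fusing of the parameters carries through directly to the return level map.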
Affiliation(s)
- Bo Li
  - University of Illinois at Urbana-Champaign
|