1
Sherwood B, Price BS. On the Use of Minimum Penalties in Statistical Learning. J Comput Graph Stat 2023; 33:138-151. [PMID: 38706715 PMCID: PMC11065433 DOI: 10.1080/10618600.2023.2210174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 04/27/2023] [Indexed: 05/07/2024]
Abstract
Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow for estimation of relationships typically do so through an error covariance matrix in multivariate regression, which does not generalize to other types of models. In this article we propose the MinPen framework to simultaneously estimate regression coefficients associated with the multivariate regression model and the relationships between outcome variables using common assumptions. The MinPen framework utilizes a novel penalty based on the minimum function to simultaneously detect and exploit relationships between responses. An iterative algorithm is proposed as a solution to the non-convex optimization. Theoretical results such as high-dimensional convergence rates, model selection consistency, and a framework for post-selection inference are provided. We extend the proposed MinPen framework to other exponential family loss functions, with a specific focus on multiple binomial responses. Tuning parameter selection is also addressed. Finally, simulations and two data examples are presented to show the finite sample properties of this framework. Supplemental material providing proofs, additional simulations, code, and data sets is available online.
Affiliation(s)
- Bradley S. Price
- Management Information Systems Department, West Virginia University
2
el Bouhaddani S, Uh H, Jongbloed G, Houwing‐Duistermaat J. Statistical integration of heterogeneous omics data: Probabilistic two‐way partial least squares (PO2PLS). J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Said el Bouhaddani
- Department of Data Science and Biostatistics, UMC Utrecht, Utrecht, The Netherlands
- Hae‐Won Uh
- Department of Data Science and Biostatistics, UMC Utrecht, Utrecht, The Netherlands
- Geurt Jongbloed
- Delft Institute of Applied Mathematics, TU Delft, Delft, The Netherlands
- Jeanine Houwing‐Duistermaat
- Department of Data Science and Biostatistics, UMC Utrecht, Utrecht, The Netherlands
- Department of Statistics, University of Leeds, Leeds, UK
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
3
Fan Y, Sun J. Subsampling from features in large regression to find “winning features”. Stat Anal Data Min 2021. [DOI: 10.1002/sam.11499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Yiying Fan
- Department of Mathematics and Statistics, Cleveland State University, Cleveland, Ohio, USA
- Jiayang Sun
- Department of Statistics, George Mason University, Fairfax, Virginia, USA
4
Park Y, Su Z, Zhu H. Groupwise envelope models for imaging genetic analysis. Biometrics 2017; 73:1243-1253. [PMID: 28323341 PMCID: PMC5608647 DOI: 10.1111/biom.12689] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/01/2016] [Revised: 02/01/2017] [Accepted: 02/01/2017] [Indexed: 11/28/2022]
Abstract
Motivated by the search for associations between genetic variants and brain imaging phenotypes, the aim of this article is to develop a groupwise envelope model for multivariate linear regression in order to establish the association between multivariate responses and covariates. The groupwise envelope model allows for both distinct regression coefficients and distinct error structures for different groups. Statistically, the proposed envelope model can dramatically improve efficiency of tests and of estimation. Theoretical properties of the proposed model are established. Numerical experiments as well as the analysis of an imaging genetic data set obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study show the effectiveness of the model in efficient estimation. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.
Affiliation(s)
- Yeonhee Park
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
- Zhihua Su
- Department of Statistics, University of Florida, Gainesville, FL 32611, U.S.A
- Hongtu Zhu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
5
Affiliation(s)
- Xin Zhang
- Department of Statistics, Florida State University, Tallahassee, FL
- Lexin Li
- Division of Biostatistics, University of California, Berkeley, CA
6
Weiner MW, Veitch DP, Aisen PS, Beckett LA, Cairns NJ, Green RC, Harvey D, Jack CR, Jagust W, Morris JC, Petersen RC, Saykin AJ, Shaw LM, Toga AW, Trojanowski JQ. Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials. Alzheimers Dement 2017; 13:e1-e85. [PMID: 28342697 PMCID: PMC6818723 DOI: 10.1016/j.jalz.2016.11.007] [Citation(s) in RCA: 179] [Impact Index Per Article: 22.4] [Received: 09/07/2016] [Revised: 11/21/2016] [Accepted: 11/28/2016] [Indexed: 01/31/2023]
Abstract
INTRODUCTION The Alzheimer's Disease Neuroimaging Initiative (ADNI) has continued development and standardization of methodologies for biomarkers and has provided an increased depth and breadth of data available to qualified researchers. This review summarizes the over 400 publications using ADNI data during 2014 and 2015. METHODS We used standard searches to find publications using ADNI data. RESULTS (1) Structural and functional changes, including subtle changes to hippocampal shape and texture, atrophy in areas outside of hippocampus, and disruption to functional networks, are detectable in presymptomatic subjects before hippocampal atrophy; (2) In subjects with abnormal β-amyloid deposition (Aβ+), biomarkers become abnormal in the order predicted by the amyloid cascade hypothesis; (3) Cognitive decline is more closely linked to tau than Aβ deposition; (4) Cerebrovascular risk factors may interact with Aβ to increase white-matter (WM) abnormalities which may accelerate Alzheimer's disease (AD) progression in conjunction with tau abnormalities; (5) Different patterns of atrophy are associated with impairment of memory and executive function and may underlie psychiatric symptoms; (6) Structural, functional, and metabolic network connectivities are disrupted as AD progresses. 
Models of prion-like spreading of Aβ pathology along WM tracts predict known patterns of cortical Aβ deposition and declines in glucose metabolism; (7) New AD risk and protective gene loci have been identified using biologically informed approaches; (8) Cognitively normal and mild cognitive impairment (MCI) subjects are heterogeneous and include groups typified not only by "classic" AD pathology but also by normal biomarkers, accelerated decline, and suspected non-Alzheimer's pathology; (9) Selection of subjects at risk of imminent decline on the basis of one or more pathologies improves the power of clinical trials; (10) Sensitivity of cognitive outcome measures to early changes in cognition has been improved and surrogate outcome measures using longitudinal structural magnetic resonance imaging may further reduce clinical trial cost and duration; (11) Advances in machine learning techniques such as neural networks have improved diagnostic and prognostic accuracy especially in challenges involving MCI subjects; and (12) Network connectivity measures and genetic variants show promise in multimodal classification and some classifiers using single modalities are rivaling multimodal classifiers. DISCUSSION Taken together, these studies fundamentally deepen our understanding of AD progression and its underlying genetic basis, which in turn informs and improves clinical trial design.
Affiliation(s)
- Michael W Weiner
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA; Department of Radiology, University of California, San Francisco, CA, USA; Department of Medicine, University of California, San Francisco, CA, USA; Department of Psychiatry, University of California, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, CA, USA.
- Dallas P Veitch
- Department of Veterans Affairs Medical Center, Center for Imaging of Neurodegenerative Diseases, San Francisco, CA, USA
- Paul S Aisen
- Alzheimer's Therapeutic Research Institute, University of Southern California, San Diego, CA, USA
- Laurel A Beckett
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
- Nigel J Cairns
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, Saint Louis, MO, USA; Department of Neurology, Washington University School of Medicine, Saint Louis, MO, USA
- Robert C Green
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Danielle Harvey
- Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
- William Jagust
- Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
- John C Morris
- Alzheimer's Therapeutic Research Institute, University of Southern California, San Diego, CA, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Leslie M Shaw
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Arthur W Toga
- Laboratory of Neuroimaging, Institute of Neuroimaging and Informatics, Keck School of Medicine of University of Southern California, Los Angeles, CA, USA
- John Q Trojanowski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute on Aging, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Alzheimer's Disease Core Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Udall Parkinson's Research Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
7
Kim J, Pan W. Adaptive testing for multiple traits in a proportional odds model with applications to detect SNP-brain network associations. Genet Epidemiol 2017; 41:259-277. [PMID: 28191669 DOI: 10.1002/gepi.22033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/05/2016] [Revised: 10/07/2016] [Accepted: 10/31/2016] [Indexed: 12/15/2022]
Abstract
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, as arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high-dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to the usual small-p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI driven phenotypes or resting-state functional MRI (rs-fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for p>n.
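For orientation, the proportional odds model named in this abstract can be written in one common textbook parameterization as below. This is a sketch under stated assumptions, not the authors' own notation: the symbols Y (SNP genotype coded 0, 1, 2), x (the vector of p traits), alpha_j and beta are introduced here for illustration, and sign conventions differ between references.

```latex
% Proportional odds model with the SNP as an ordinal response (assumed notation):
%   Y \in \{0, 1, 2\}  minor-allele count of the SNP (hypothetical coding)
%   x \in \mathbb{R}^p vector of traits used as predictors
\[
  \operatorname{logit}\, \Pr(Y \le j \mid x) \;=\; \alpha_j - x^{\top}\beta ,
  \qquad j = 0, 1, \qquad \alpha_0 \le \alpha_1 .
\]
% Under this form, "no association between the SNP and any trait"
% corresponds to the null hypothesis H_0 : \beta = 0.
```

Because a single coefficient vector beta is shared across both cumulative logits, the number of association parameters equals the number of traits p, which is the quantity the abstract compares with the sample size n.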
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
8
Kim J, Pan W. Highly adaptive tests for group differences in brain functional connectivity. Neuroimage Clin 2015; 9:625-39. [PMID: 26740916 PMCID: PMC4644249 DOI: 10.1016/j.nicl.2015.10.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/26/2015] [Revised: 09/14/2015] [Accepted: 10/05/2015] [Indexed: 01/06/2023]
Abstract
Resting-state functional magnetic resonance imaging (rs-fMRI) and other technologies have been offering evidence and insights showing that altered brain functional networks are associated with neurological illnesses such as Alzheimer's disease. Exploring brain networks of clinical populations compared to those of controls would be a key inquiry to reveal underlying neurological processes related to such illnesses. For such a purpose, group-level inference is a necessary first step in order to establish whether there are any genuinely disrupted brain subnetworks. Such an analysis is also challenging due to the high dimensionality of the parameters in a network model and high noise levels in neuroimaging data. We are still in the early stage of method development as highlighted by Varoquaux and Craddock (2013) that “there is currently no unique solution, but a spectrum of related methods and analytical strategies” to learn and compare brain connectivity. In practice the important issue of how to choose several critical parameters in estimating a network, such as what association measure to use and what is the sparsity of the estimated network, has not been carefully addressed, largely because the answers are unknown yet. For example, even though the choice of tuning parameters in model estimation has been extensively discussed in the literature, as to be shown here, an optimal choice of a parameter for network estimation may not be optimal in the current context of hypothesis testing. Arbitrarily choosing or mis-specifying such parameters may lead to extremely low-powered tests. Here we develop highly adaptive tests to detect group differences in brain connectivity while accounting for unknown optimal choices of some tuning parameters. The proposed tests combine statistical evidence against a null hypothesis from multiple sources across a range of plausible tuning parameter values reflecting uncertainty with the unknown truth. 
These highly adaptive tests are not only easy to use, but also robustly high-powered across various scenarios. The usage and advantages of these novel tests are demonstrated on an Alzheimer's disease dataset and simulated data.
Highlights
- Rigorous testing for genuinely altered functional networks between two groups
- The proposed tests are high powered and general across a wide range of scenarios.
- Data-driven penalized network estimation
- Data-driven choice between correlations and partial correlations to describe association
- Some key differences between network estimation and testing are highlighted.
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
9
Huang M, Nichols T, Huang C, Yang Y, Lu Z, Feng Q, Knickmeyer RC, Zhu H. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data. Neuroimage 2015; 118:613-27. [PMID: 26025292 PMCID: PMC4554832 DOI: 10.1016/j.neuroimage.2015.05.043] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 01/08/2015] [Revised: 04/09/2015] [Accepted: 05/16/2015] [Indexed: 01/17/2023]
Abstract
More and more large-scale imaging genetic studies are being conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components: a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing.
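To give a rough sense of scale for the two complexity expressions quoted in this abstract, the sketch below plugs in the ADNI dimensions the abstract reports (708 subjects, 193,275 voxels, 501,584 SNPs). It is only indicative: big-O notation hides constant factors, so the ratio is not a runtime prediction, and the variable names are introduced here for illustration.

```python
# Back-of-envelope comparison of the two operation counts from the abstract.
# Dimensions are the ADNI figures quoted above; constants hidden by big-O
# are ignored, so treat the ratio as an order-of-magnitude guide only.
n_subjects = 708        # n
n_voxels = 193_275      # N_V
n_snps = 501_584        # N_C

# VGWAS: O(n * N_V * N_C), one fit per (voxel, SNP) pair over n subjects.
vgwas_ops = n_subjects * n_voxels * n_snps

# FVGWAS: O((N_C + N_V) * n^2), additive in voxels and SNPs.
fvgwas_ops = (n_snps + n_voxels) * n_subjects ** 2

ratio = vgwas_ops / fvgwas_ops

print(f"VGWAS  ~ {vgwas_ops:.3e} operations")
print(f"FVGWAS ~ {fvgwas_ops:.3e} operations")
print(f"ratio  ~ {ratio:.0f}x")
```

With these dimensions the nominal operation count drops by a factor on the order of a couple hundred, because the voxel and SNP counts enter additively rather than as a product.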
Affiliation(s)
- Meiyan Huang
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Thomas Nichols
- Department of Statistics, University of Warwick, Coventry, UK
- Chao Huang
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yu Yang
- Department of Statistics and Operation Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Zhaohua Lu
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Qianjing Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
- Rebecca C Knickmeyer
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hongtu Zhu
- Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
10
Abstract
Since the launch in 2003 of the Alzheimer's Disease Neuroimaging Initiative (ADNI) in the USA, ever-growing, similarly oriented consortia have been organized and assembled around the world. The various accomplishments of ADNI have contributed substantially to a better understanding of the underlying physiopathology of aging and Alzheimer's disease (AD). These accomplishments are basically predicated on the trinity of multimodality, standardization and sharing. This multimodality approach can now better identify those subjects with AD-specific traits that are more likely to present cognitive decline in the near future and that might represent the best candidates for smaller but more efficient therapeutic trials - trials that, through gained and shared knowledge, can be more focused on a specific target or a specific stage of the disease process. In summary, data generated from ADNI have helped elucidate some of the pathophysiological mechanisms underpinning aging and AD pathology, while contributing to the international effort in setting the groundwork for biomarker discovery and establishing standards for early diagnosis of AD.
Affiliation(s)
- Victor L Villemagne
- Department of Nuclear Medicine and Centre for PET, Austin Health, 145 Studley Road, Heidelberg 3084, VIC, Australia
- The Florey Institute for Neurosciences and Mental Health, The University of Melbourne, 30 Royal Parade, Melbourne 3010, VIC, Australia
- Department of Medicine, The University of Melbourne, Grattan Street, Melbourne 3010, VIC, Australia
- Seong Yoon Kim
- Asan Medical Center, University of Ulsan Medical College, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, Korea
- Christopher C Rowe
- Department of Nuclear Medicine and Centre for PET, Austin Health, 145 Studley Road, Heidelberg 3084, VIC, Australia
- Department of Medicine, The University of Melbourne, Grattan Street, Melbourne 3010, VIC, Australia
- Takeshi Iwatsubo
- Department of Neuropathology, School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku 113-0033, Tokyo, Japan