1
Jiang S, Cao J, Colditz GA. Identifying regions of interest in mammogram images. Stat Methods Med Res 2023; 32:895-903. [PMID: 36951095 PMCID: PMC10247406 DOI: 10.1177/09622802231160551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2023]
Abstract
Screening mammography is the primary preventive strategy for early detection of breast cancer and an essential input to breast cancer risk prediction and application of prevention/risk management guidelines. Identifying regions of interest within mammogram images that are associated with 5- or 10-year breast cancer risk is therefore clinically meaningful. The problem is complicated by the irregular boundary issue posed by the semi-circular domain of the breast area within mammograms. Accommodating the irregular domain is especially crucial when identifying regions of interest, as the true signal comes only from the semi-circular domain of the breast region, and noise elsewhere. We address these challenges by introducing a proportional hazards model with imaging predictors characterized by bivariate splines over triangulation. The model sparsity is enforced with the group lasso penalty function. We apply the proposed method to the motivating Joanne Knight Breast Health Cohort to illustrate important risk patterns and show that the proposed method is able to achieve higher discriminatory performance.
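The group lasso penalty mentioned in this abstract removes whole groups of coefficients at once rather than individual ones. As a minimal sketch of that idea, and not the authors' implementation, the block soft-thresholding (proximal) step for a penalty of the form lam * sum_g ||beta_g||_2 might look as follows. The function name, the grouping, and the value of lam are illustrative assumptions that do not come from the paper.

```python
import math

def group_soft_threshold(beta, groups, lam):
    """Proximal step for lam * sum_g ||beta_g||_2 (hypothetical helper).

    beta   : list of floats, the coefficient vector
    groups : list of index lists, one per non-overlapping group
    lam    : non-negative threshold
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; a group whose norm is at most
        # lam is set to zero entirely, which is what makes the fit sparse.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

# Illustrative values only: two groups of two coefficients each.
shrunk = group_soft_threshold([3.0, 4.0, 0.1, -0.2], [[0, 1], [2, 3]], lam=1.0)
print(shrunk)
```

In this toy call the first group has norm 5 and is scaled by 0.8, while the second group has norm below lam and is zeroed out as a block.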
Affiliation(s)
- Shu Jiang: Division of Public Health Sciences, Washington University School of Medicine, St Louis, MO, USA
- Jiguo Cao: Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada
- Graham A. Colditz: Division of Public Health Sciences, Washington University School of Medicine, St Louis, MO, USA
2
Wang P, Zhao X, Zhong J, Zhou Y. Localization and Diagnosis of Attention-Deficit/Hyperactivity Disorder. Healthcare (Basel) 2021; 9:372. [PMID: 33801750 PMCID: PMC8066369 DOI: 10.3390/healthcare9040372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/25/2020] [Revised: 03/21/2021] [Accepted: 03/23/2021] [Indexed: 11/17/2022] Open
Abstract
In this paper, a random-forest-based method was proposed for the classification and localization of Attention-Deficit/Hyperactivity Disorder (ADHD), a common neurodevelopmental disorder among children. Experimental data were magnetic resonance imaging (MRI) scans from ADHD-200, a public case-control dataset of 3D images. Each MRI image was a 3D tensor of size 121×145×121. All 3D images were cut into slices along each of three orthogonal directions. Each slice from the same position and direction in the training set was converted into a vector, and these vectors were assembled into a design matrix to train the random forest (RF) classification algorithm; the trained RF classifier was then used to give a predicted label for the corresponding direction and position. Diagnosis and localization results can be obtained from the intersection of these three prediction matrices. The performance of the proposed method was illustrated on the data from New York University (NYU), the Kennedy Krieger Institute (KKI), and the full dataset; the results show that the proposed method achieves more accurate discrimination of ADHD and can be extended to other diagnostic applications. Moreover, a further suspected region was identified for the first time.
Affiliation(s)
- Peng Wang: School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China; School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China
- Xuejing Zhao: School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China
- Jitao Zhong: School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China; School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Ying Zhou: School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China
3
Guo C, Kang J, Johnson TD. A spatial Bayesian latent factor model for image-on-image regression. Biometrics 2020; 78:72-84. [PMID: 33368210 DOI: 10.1111/biom.13420] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/01/2019] [Revised: 12/03/2020] [Accepted: 12/10/2020] [Indexed: 11/30/2022]
Abstract
Image-on-image regression analysis, using images to predict images, is a challenging task, due to (1) the high dimensionality and (2) the complex spatial dependence structures in image predictors and image outcomes. In this work, we propose a novel image-on-image regression model, by extending a spatial Bayesian latent factor model to image data, where low-dimensional latent factors are adopted to make connections between high-dimensional image outcomes and image predictors. We assign Gaussian process priors to the spatially varying regression coefficients in the model, which can well capture the complex spatial dependence among image outcomes as well as that among the image predictors. We perform simulation studies to evaluate the out-of-sample prediction performance of our method compared with linear regression and voxel-wise regression methods for different scenarios. The proposed method achieves better prediction accuracy by effectively accounting for the spatial dependence and efficiently reduces image dimensions with latent factors. We apply the proposed method to analysis of multimodal image data in the Human Connectome Project where we predict task-related contrast maps using subcortical volumetric seed maps.
Affiliation(s)
- Cui Guo: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Jian Kang: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Timothy D Johnson: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
4
Zhao Y, Li T, Zhu H. Bayesian sparse heritability analysis with high-dimensional neuroimaging phenotypes. Biostatistics 2020; 23:467-484. [PMID: 32948880 PMCID: PMC9308456 DOI: 10.1093/biostatistics/kxaa035] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/23/2020] [Revised: 07/15/2020] [Accepted: 08/11/2020] [Indexed: 12/24/2022] Open
Abstract
Heritability analysis plays a central role in quantitative genetics to describe genetic contribution to human complex traits and prioritize downstream analyses under large-scale phenotypes. Existing work largely focuses on modeling a single phenotype, and currently available multivariate phenotypic methods often suffer from scaling and interpretation problems. In this article, motivated by understanding how genetic underpinning impacts human brain variation, we develop an integrative Bayesian heritability analysis to jointly estimate heritabilities for high-dimensional neuroimaging traits. To induce sparsity and incorporate brain anatomical configuration, we impose hierarchical selection among both regional and local measurements based on brain structural network and voxel dependence. We also use a nonparametric Dirichlet process mixture model to realize grouping among single nucleotide polymorphism-associated phenotypic variations, providing biological plausibility. Through extensive simulations, we show the proposed method outperforms existing ones in heritability estimation and heritable traits selection under various scenarios. We finally apply the method to two large-scale imaging genetics datasets, the Alzheimer's Disease Neuroimaging Initiative and the United Kingdom Biobank, and show biologically meaningful results.
Affiliation(s)
- Yize Zhao: Department of Biostatistics, Yale University, 300 George Street, New Haven, CT 06511, USA
- Tengfei Li: Department of Radiology, University of North Carolina at Chapel Hill, 101 Manning Dr, Chapel Hill, NC 27514, USA
- Hongtu Zhu: Department of Biostatistics, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC 27514, USA
5
Zhao Y, Zhu H, Lu Z, Knickmeyer RC, Zou F. Structured Genome-Wide Association Studies with Bayesian Hierarchical Variable Selection. Genetics 2019; 212:397-415. [PMID: 31010934 PMCID: PMC6553832 DOI: 10.1534/genetics.119.301906] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2019] [Accepted: 04/08/2019] [Indexed: 02/04/2023] Open
Abstract
It is becoming increasingly important to use genome-wide association studies (GWAS) to select important genetic information associated with qualitative or quantitative traits. Currently, the discovery of biological association among SNPs motivates various strategies to construct SNP-sets along the genome and to incorporate such set information into the selection procedure for higher selection power, while facilitating more biologically meaningful results. The aim of this paper is to propose a novel Bayesian framework for hierarchical variable selection at both the SNP-set (group) level and the SNP (within-group) level. We overcome a key limitation of the posterior updating schemes in most Bayesian variable selection methods by proposing a novel sampling scheme that explicitly accommodates the ultrahigh dimensionality of genetic data. Specifically, by constructing an auxiliary variable selection model at the SNP-set level, the new procedure uses the posterior samples of the auxiliary model to guide the subsequent posterior inference for the targeted hierarchical selection model. We apply the proposed method to a variety of simulation studies and show that our method is computationally efficient and achieves substantially better performance than competing approaches in both SNP-set and SNP selection. Applying the method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, we identify biologically meaningful genetic factors under several neuroimaging volumetric phenotypes. Our method is general and can readily be applied to a wide range of biomedical studies.
Affiliation(s)
- Yize Zhao: Department of Healthcare Policy and Research, Cornell University Weill Cornell, New York, New York 10065
- Hongtu Zhu: Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599
- Zhaohua Lu: Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105
- Rebecca C Knickmeyer: Department of Pediatrics and Human Development, Michigan State University, East Lansing, Michigan 48824
- Fei Zou: Department of Biostatistics, University of Florida, Gainesville, Florida 32611
6
Happ C, Greven S, Schmid VJ. The impact of model assumptions in scalar-on-image regression. Stat Med 2018; 37:4298-4317. [PMID: 30132932 DOI: 10.1002/sim.7915] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/10/2018] [Revised: 06/20/2018] [Accepted: 06/27/2018] [Indexed: 11/11/2022]
Abstract
Complex statistical models such as scalar-on-image regression often require strong assumptions to overcome the issue of nonidentifiability. While in theory, it is well understood that model assumptions can strongly influence the results, this seems to be underappreciated, or played down, in practice. This article gives a systematic overview of the main approaches for scalar-on-image regression with a special focus on their assumptions. We categorize the assumptions and develop measures to quantify the degree to which they are met. The impact of model assumptions and the practical usage of the proposed measures are illustrated in a simulation study and in an application to neuroimaging data. The results show that different assumptions indeed lead to quite different estimates with similar predictive ability, raising the question of their interpretability. We give recommendations for making modeling and interpretation decisions in practice based on the new measures and simulations using hypothetic coefficient images and the observed data.
Affiliation(s)
- Clara Happ: Department of Statistics, LMU Munich, Munich, Germany
- Sonja Greven: Department of Statistics, LMU Munich, Munich, Germany
7
Zhao Y, Kang J, Long Q. Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:537-550. [PMID: 29610102 PMCID: PMC5885321 DOI: 10.1109/tcbb.2015.2440244] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/08/2023]
Abstract
Ultra-high dimensional variable selection has become increasingly important in the analysis of neuroimaging data. For example, in the Autism Brain Imaging Data Exchange (ABIDE) study, neuroscientists are interested in identifying important biomarkers for early detection of autism spectrum disorder (ASD) using high-resolution brain images that include hundreds of thousands of voxels. However, most existing methods are not feasible for solving this problem due to their extensive computational costs. In this work, we propose a novel multiresolution variable selection procedure under a Bayesian probit regression framework. It recursively uses posterior samples for coarser-scale variable selection to guide the posterior inference on finer-scale variable selection, leading to very efficient Markov chain Monte Carlo (MCMC) algorithms. The proposed algorithms are computationally feasible for ultra-high dimensional data. Also, our model incorporates two levels of structural information into variable selection using Ising priors: the spatial dependence between voxels and the functional connectivity between anatomical brain regions. Applied to the resting state functional magnetic resonance imaging (R-fMRI) data in the ABIDE study, our methods identify voxel-level imaging biomarkers highly predictive of ASD, which are biologically meaningful and interpretable. Extensive simulations also show that our methods achieve better performance in variable selection compared to existing methods.
8
Reiss PT, Goldsmith J, Shang HL, Ogden RT. Methods for scalar-on-function regression. Int Stat Rev 2017; 85:228-249. [PMID: 28919663 PMCID: PMC5598560 DOI: 10.1111/insr.12163] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 06/08/2015] [Accepted: 12/28/2015] [Indexed: 01/16/2023]
Abstract
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
Affiliation(s)
- Philip T. Reiss: Department of Child and Adolescent Psychiatry and Department of Population Health, New York University School of Medicine; Department of Statistics, University of Haifa
- Jeff Goldsmith: Department of Biostatistics, Columbia University Mailman School of Public Health
- Han Lin Shang: Research School of Finance, Actuarial Studies and Statistics, Australian National University
- R. Todd Ogden: Department of Biostatistics, Columbia University Mailman School of Public Health; New York State Psychiatric Institute
9
Multimodal spatial-based segmentation framework for white matter lesions in multi-sequence magnetic resonance images. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.06.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/23/2022]
10
Geenens G. Moments, errors, asymptotic normality and large deviation principle in nonparametric functional regression. Stat Probab Lett 2015. [DOI: 10.1016/j.spl.2015.09.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]
11
Shu H, Nan B, Koeppe R. Multiple testing for neuroimaging via hidden Markov random field. Biometrics 2015; 71:741-50. [PMID: 26012881 PMCID: PMC4579542 DOI: 10.1111/biom.12329] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/01/2014] [Revised: 03/01/2015] [Accepted: 04/01/2015] [Indexed: 11/29/2022]
Abstract
Traditional voxel-level multiple testing procedures in neuroimaging, mostly p-value based, often ignore the spatial correlations among neighboring voxels and thus suffer from substantial loss of power. We extend the local-significance-index based procedure originally developed for the hidden Markov chain models, which aims to minimize the false nondiscovery rate subject to a constraint on the false discovery rate, to three-dimensional neuroimaging data using a hidden Markov random field model. A generalized expectation-maximization algorithm for maximizing the penalized likelihood is proposed for estimating the model parameters. Extensive simulations show that the proposed approach is more powerful than conventional false discovery rate procedures. We apply the method to the comparison between mild cognitive impairment, a disease status with increased risk of developing Alzheimer's or another dementia, and normal controls in the FDG-PET imaging study of the Alzheimer's Disease Neuroimaging Initiative.
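For context on the "conventional false discovery rate procedures" used as the comparison here, the sketch below shows the Benjamini-Hochberg step-up rule, which treats every p-value independently and ignores spatial neighbors. This abstract does not say which conventional procedure was used, so the choice of Benjamini-Hochberg, the function name, and the toy p-values are all assumptions made for illustration. It is not the hidden Markov random field method proposed in the paper.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank
    # Reject the hypotheses with the `cutoff` smallest p-values.
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Toy p-values, one per voxel (illustrative only).
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.8], alpha=0.05)
print(flags)
```

Because each voxel is tested on its own p-value, a weak but spatially coherent signal can be missed; that loss of power is what the abstract attributes to ignoring spatial correlation.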
Affiliation(s)
- Hai Shu: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
- Bin Nan: Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
- Robert Koeppe: Department of Radiology, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
12
Reiss PT, Huo L, Zhao Y, Kelly C, Ogden RT. Wavelet-domain regression and predictive inference in psychiatric neuroimaging. Ann Appl Stat 2015; 9:1076-1101. [PMID: 27330652 PMCID: PMC4912166 DOI: 10.1214/15-aoas829] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]
Abstract
An increasingly important goal of psychiatry is the use of brain imaging data to develop predictive models. Here we present two contributions to statistical methodology for this purpose. First, we propose and compare a set of wavelet-domain procedures for fitting generalized linear models with scalar responses and image predictors: sparse variants of principal component regression and of partial least squares, and the elastic net. Second, we consider assessing the contribution of image predictors over and above available scalar predictors, in particular via permutation tests and an extension of the idea of confounding to the case of functional or image predictors. Using the proposed methods, we assess whether maps of a spontaneous brain activity measure, derived from functional magnetic resonance imaging, can meaningfully predict presence or absence of attention deficit/hyperactivity disorder (ADHD). Our results shed light on the role of confounding in the surprising outcome of the recent ADHD-200 Global Competition, which challenged researchers to develop algorithms for automated image-based diagnosis of the disorder.
13
Li F, Zhang T, Wang Q, Gonzalez MZ, Maresh EL, Coan JA. Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression. Ann Appl Stat 2015. [DOI: 10.1214/15-aoas818] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
14