1. Dutta D, Sen A, Satagopan JM. Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis-An application in renal clear cell carcinoma. Genet Epidemiol 2024; 48:414-432. [PMID: 38751238 PMCID: PMC11589067 DOI: 10.1002/gepi.22566] [Received: 09/20/2023] [Revised: 04/04/2024] [Accepted: 04/22/2024]
Abstract
Somatic changes such as copy number aberrations (CNAs) and epigenetic alterations such as methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating the expression of genes that drive critical biological processes. To identify potential biomarkers and molecular targets, and to understand how they affect disease outcomes, it is important to identify key groups of CNAs, the associated methylation sites, and the gene expression they influence through a joint integrative analysis. Here, we propose a novel analysis pipeline, joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites, which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test their interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of the ASAH1 gene, trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis quantifying the overall effect of gene sets on tumor stage revealed that two of the eight gene components have a significant interaction with smoking in relation to tumor stage.
These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data especially in cancer genomics.
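Sparse CCA, the method that jsCCA extends, can be illustrated with a minimal single-component sketch. It follows the general penalized-matrix-decomposition idea (alternating soft-thresholded power iterations on the cross-product matrix X^T Y) and is not the jsCCA pipeline itself. The function names, the relative-threshold parameters `frac_u` and `frac_v`, and the toy data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def unit(z):
    """Scale z to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


def sparse_cca(X, Y, frac_u=0.3, frac_v=0.3, n_iter=200, tol=1e-10):
    """First pair of sparse canonical vectors for column-standardized X (n x p), Y (n x q).

    Alternating soft-thresholded power iterations on C = X^T Y. Each threshold is
    a fraction (in [0, 1)) of the largest absolute entry, so at least one weight
    always survives; this simplifies tuning an explicit L1 bound.
    """
    C = X.T @ Y
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        a = C @ v
        u_new = unit(soft_threshold(a, frac_u * np.abs(a).max()))
        b = C.T @ u_new
        v_new = unit(soft_threshold(b, frac_v * np.abs(b).max()))
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v, np.corrcoef(X @ u, Y @ v)[0, 1]


# Toy example: one latent factor z drives X[:, :3] and Y[:, :2]; the rest is noise.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = rng.normal(size=(n, 10))
X[:, :3] += 2 * z[:, None]
Y = rng.normal(size=(n, 8))
Y[:, :2] += 2 * z[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
u, v, r = sparse_cca(X, Y)
```

The largest weights should land on the signal columns, and the sparse projections should remain highly correlated. Real analyses would tune the penalties, for example by cross-validation or permutation, instead of fixing them.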
Affiliation(s)
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, USA
- Ananda Sen
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Department of Family Medicine, University of Michigan, Ann Arbor, USA
- Jaya M. Satagopan
- Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, USA
2. Kim BH, Seo SW, Park YH, Kim J, Kim HJ, Jang H, Yun J, Kim M, Kim JP. Clinical application of sparse canonical correlation analysis to detect genetic associations with cortical thickness in Alzheimer's disease. Front Neurosci 2024; 18:1428900. [PMID: 39381682 PMCID: PMC11458562 DOI: 10.3389/fnins.2024.1428900] [Received: 05/07/2024] [Accepted: 08/19/2024]
Abstract
Introduction: Alzheimer's disease (AD) is a progressive neurodegenerative disease characterized by cerebral cortex atrophy. In this study, we used sparse canonical correlation analysis (SCCA) to identify associations between single nucleotide polymorphisms (SNPs) and cortical thickness in the Korean population. We also investigated the role of these SNPs in neurological outcomes, including neurodegeneration and cognitive dysfunction. Methods: We recruited 1125 Korean participants who underwent neuropsychological testing, brain magnetic resonance imaging, positron emission tomography, and microarray genotyping. We performed group-wise SCCA in the Aβ-negative (-) and Aβ-positive (+) groups. In addition, we performed mediation, expression quantitative trait loci, and pathway analyses to determine the functional role of the SNPs. Results: Using SCCA, we identified SNPs related to cortical thickness in the Aβ-negative and Aβ-positive groups, as well as SNPs that improve the prediction of cognitive impairment. Among them, rs9270580 was associated with cortical thickness by mediating Aβ uptake, and three SNPs (rs2271920, rs6859, rs9270580) were associated with the regulation of the CHRNA2, NECTIN2, and HLA genes. Conclusion: Our findings suggest that SNPs potentially contribute to cortical thickness in AD, which in turn leads to worse clinical outcomes. They contribute to the understanding of the genetic architecture underlying cortical atrophy and its relationship with AD.
Affiliation(s)
- Bo-Hyun Kim
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Sang Won Seo
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Yu Hyun Park
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- JiHyun Kim
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Hee Jin Kim
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Hyemin Jang
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
- Jihwan Yun
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Soonchunhyang University Bucheon Hospital, Gyeonggi-do, Republic of Korea
- Mansu Kim
- Artificial Intelligence Graduate School, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea
- Jun Pyo Kim
- Alzheimer’s Disease Convergence Research Center, Samsung Medical Center, Seoul, Republic of Korea
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Neuroscience Center, Samsung Medical Center, Seoul, Republic of Korea
3. Chung J, Kim S, Won JH, Park H. Integrating Multimodal Neuroimaging and Genetics: A Structurally-Linked Sparse Canonical Correlation Analysis Approach. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2024; 12:659-667. [PMID: 39464624 PMCID: PMC11505868 DOI: 10.1109/jtehm.2024.3463720] [Received: 04/17/2024] [Revised: 08/16/2024] [Accepted: 09/14/2024]
Abstract
Neuroimaging genetics represents a multivariate approach aimed at elucidating the intricate relationships between high-dimensional genetic variations and neuroimaging data. Predominantly, existing methodologies revolve around Sparse Canonical Correlation Analysis (SCCA), a framework we expand to 1) encompass multiple imaging modalities and 2) promote the simultaneous identification of structurally linked features across imaging modalities. The structurally linked brain regions were assessed using diffusion tensor imaging, which quantifies the presence of neuronal fibers, thereby grounding our approach in biologically well-founded prior knowledge within the SCCA model. In our proposed structurally linked SCCA framework, we leverage T1-weighted MRI and functional MRI (fMRI) time series data to delineate both the structural and functional characteristics of the brain. Genetic variations, specifically single nucleotide polymorphisms (SNPs), are also incorporated as a genetic modality. Validation of our methodology was conducted using a simulated dataset and large-scale normative data from the Human Connectome Project (HCP). Our approach demonstrated superior performance compared to existing methods on simulated data and revealed interpretable gene-imaging associations in the real dataset. Thus, our methodology lays the groundwork for elucidating the genetic underpinnings of brain structure and function, thereby providing novel insights into the field of neuroscience. Our code is available at https://github.com/mungegg.
Affiliation(s)
- Jiwon Chung
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Sunghun Kim
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Ji Hye Won
- Department of Computer Engineering and Artificial Intelligence, Pukyong National University, Busan 48513, Republic of Korea
- Hyunjin Park
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon 16419, Republic of Korea
4. Mondal S, Maji P. Multi-Task Learning and Sparse Discriminant Canonical Correlation Analysis for Identification of Diagnosis-Specific Genotype-Phenotype Association. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1390-1402. [PMID: 38587960 DOI: 10.1109/tcbb.2024.3386406]
Abstract
The primary objective of imaging genetics research is to investigate the complex genotype-phenotype association for the disease under study. For example, to understand the impact of genetic variations on brain function and structure, genotypic data such as single nucleotide polymorphisms (SNPs) are integrated with phenotypic data such as imaging quantitative traits. Sparse models based on canonical correlation analysis (CCA) are popular in this area for finding complex bi-multivariate genotype-phenotype associations, because the number of features in the genotypic and/or phenotypic data greatly exceeds the number of samples. However, sparse CCA-based methods are, in general, unsupervised, and they fail to identify the diagnosis-specific features that play an important role in the diagnosis and prognosis of the disease under study. In this regard, a new supervised model is proposed to study the complex genotype-phenotype association by integrating the merits of CCA, linear discriminant analysis (LDA) and multi-task learning. The proposed model can identify diagnosis-specific as well as diagnosis-consistent features with significantly lower computational complexity. The performance of the proposed method, along with a comparison with state-of-the-art methods, is evaluated on several synthetic data sets and on one real imaging genetics data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. In the current study, SNPs as genetic data and resting-state functional MRI (fMRI) as imaging data are integrated to find the complex genotype-phenotype association. An important finding is that the proposed method achieves higher correlation values, improved noise resistance and stability, and better feature selection ability.
All the results illustrate the ability of the proposed method to find diagnostic group-specific imaging genetic associations, which may help to understand neurodegenerative disorders more comprehensively.
5. Zhou Z, Tarzanagh DA, Hou B, Tong B, Xu J, Feng Y, Long Q, Shen L. Fair Canonical Correlation Analysis. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2023; 36:3675-3705. [PMID: 38665178 PMCID: PMC11040228]
Abstract
This paper investigates fairness and bias in Canonical Correlation Analysis (CCA), a widely used statistical technique for examining the relationship between two sets of variables. We present a framework that alleviates unfairness by minimizing the correlation disparity error associated with protected attributes. Our approach enables CCA to learn global projection matrices from all data points while ensuring that these matrices yield comparable correlation levels to group-specific projection matrices. Experimental evaluation on both synthetic and real-world datasets demonstrates the efficacy of our method in reducing correlation disparity error without compromising CCA accuracy.
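One way to make "correlation disparity" concrete is to compare, within each protected group, two quantities for the first canonical pair: the best group-specific canonical correlation and the correlation that the global projection achieves on that group. The gap measure, the function names and the toy data below are hypothetical illustrations. They do not reproduce the paper's exact disparity error or its optimization method.

```python
import numpy as np


def inv_sqrt(S, eps=1e-12):
    """Inverse square root of a symmetric positive-definite matrix via eigh."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T


def first_cca_pair(X, Y, ridge=1e-6):
    """Leading canonical weights (a, b) and canonical correlation rho of X and Y."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])  # tiny ridge for stability
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)  # SVD of the whitened cross-covariance
    return Kx @ U[:, 0], Ky @ Vt[0], s[0]


def correlation_gaps(X, Y, groups):
    """Per group: best group-specific correlation minus correlation under the global weights."""
    a, b, _ = first_cca_pair(X, Y)
    gaps = {}
    for g in np.unique(groups):
        mask = groups == g
        _, _, rho_g = first_cca_pair(X[mask], Y[mask])
        rho_global = np.corrcoef(X[mask] @ a, Y[mask] @ b)[0, 1]
        gaps[int(g)] = rho_g - rho_global
    return gaps


# Toy data: group 0 (400 rows) links X[:, 0] with Y[:, 0]; group 1 (100 rows) links X[:, 1] with Y[:, 1].
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], [400, 100])
X = rng.normal(size=(500, 3))
Y = rng.normal(size=(500, 3))
for g, col in [(0, 0), (1, 1)]:
    mask = groups == g
    z = rng.normal(size=mask.sum())
    X[mask, col] += 2 * z
    Y[mask, col] += 2 * z
gaps = correlation_gaps(X, Y, groups)
disparity = max(gaps.values()) - min(gaps.values())
```

In this toy setting, the global pair is dominated by the larger group, so the smaller group's gap is much larger. That imbalance is the kind a fairness-aware CCA aims to reduce.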
Affiliation(s)
- Jia Xu
- University of Pennsylvania
6. Kong W, Xu Y, Wang S, Wei K, Wen G, Yu Y, Zhu Y. A Novel Longitudinal Phenotype-Genotype Association Study Based on Deep Feature Extraction and Hypergraph Models for Alzheimer's Disease. Biomolecules 2023; 13:biom13050728. [PMID: 37238598 DOI: 10.3390/biom13050728] [Received: 02/20/2023] [Revised: 03/30/2023] [Accepted: 04/18/2023]
Abstract
Traditional imaging genetics primarily uses linear models to investigate the relationship between brain imaging data and genetic data in Alzheimer's disease (AD). It does not account for dynamic changes over time in brain phenotypes or in the connectivity between brain regions. In this work, we proposed a novel method that combines Deep Subspace reconstruction with Hypergraph-Based Temporally-constrained Group Sparse Canonical Correlation Analysis (DS-HBTGSCCA) to discover deep associations between longitudinal phenotypes and genotypes. The proposed method makes full use of dynamic high-order correlations between brain regions. Deep subspace reconstruction is applied to recover the nonlinear properties of the original data, and hypergraphs are used to mine high-order correlations between the two types of reconstructed data. Molecular biological analysis of the experimental findings demonstrated that our algorithm could extract more valuable time-series correlations from real data obtained by the AD neuroimaging program and could find AD biomarkers across multiple time points. Additionally, we used regression analysis to verify the close relationship between the extracted top brain areas and top genes. We also found that the deep subspace reconstruction approach with a multi-layer neural network helped to enhance clustering performance.
Affiliation(s)
- Wei Kong
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Yufang Xu
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Shuaiqun Wang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Kai Wei
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Gen Wen
- Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Yaling Yu
- Department of Orthopedic Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Institute of Microsurgery on Extremities, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, China
- Yuemin Zhu
- CREATIS UMR 5220, U1294, CNRS, Inserm, INSA Lyon, University Lyon, 69621 Lyon, France
7. Zhang X, Hao Y, Zhang J, Ji Y, Zou S, Zhao S, Xie S, Du L. A multi-task SCCA method for brain imaging genetics and its application in neurodegenerative diseases. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 232:107450. [PMID: 36905750 DOI: 10.1016/j.cmpb.2023.107450] [Received: 06/09/2022] [Revised: 02/24/2023] [Accepted: 02/24/2023]
Abstract
BACKGROUND AND OBJECTIVES: In brain imaging genetics, multi-task sparse canonical correlation analysis (MTSCCA) is effective for studying bi-multivariate associations between genetic variations, such as single nucleotide polymorphisms (SNPs), and multi-modal imaging quantitative traits (QTs). However, most existing MTSCCA methods are neither supervised nor capable of distinguishing the shared patterns of multi-modal imaging QTs from the specific ones. METHODS: A new diagnosis-guided MTSCCA (DDG-MTSCCA) with parameter decomposition and a graph-guided pairwise group lasso penalty was proposed. Specifically, the multi-task modeling paradigm enables us to comprehensively identify risk genetic loci by jointly incorporating multi-modal imaging QTs. A regression sub-task was introduced to guide the selection of diagnosis-related imaging QTs. To reveal diverse genetic mechanisms, parameter decomposition and different constraints were used to facilitate the identification of modality-consistent and modality-specific genotypic variations. In addition, a network constraint was added to identify meaningful brain networks. The proposed method was applied to synthetic data and to two real neuroimaging data sets, from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Parkinson's Progression Markers Initiative (PPMI) databases, respectively. RESULTS: Compared with competing methods, the proposed method exhibited higher or comparable canonical correlation coefficients (CCCs) and better feature selection results. In the simulation study, DDG-MTSCCA showed the best noise resistance and achieved the highest average hit rate, about 25% higher than MTSCCA. On the real Alzheimer's disease (AD) and Parkinson's disease (PD) data, our method obtained the highest average testing CCCs, about 40% to 50% higher than MTSCCA. In particular, our method selected more comprehensive feature subsets, and the top five SNPs and imaging QTs were all disease-related.
Ablation experiments also demonstrated the importance of each component of the model, i.e., the diagnosis guidance, parameter decomposition, and network constraint. CONCLUSIONS: These results on simulated data and on the ADNI and PPMI cohorts suggest that our method is effective and generalizable in identifying meaningful disease-related markers. DDG-MTSCCA could be a powerful tool in brain imaging genetics and merits in-depth study.
Affiliation(s)
- Xin Zhang
- Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Yipeng Hao
- Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Jin Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Yanuo Ji
- Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Shihong Zou
- Institute of Medical Research, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Shijie Zhao
- School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Songyun Xie
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Lei Du
- School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
8. Song X, Li R, Wang K, Bai Y, Xiao Y, Wang YP. Joint Sparse Collaborative Regression on Imaging Genetics Study of Schizophrenia. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1137-1146. [PMID: 35503837 PMCID: PMC10321021 DOI: 10.1109/tcbb.2022.3172289]
Abstract
The imaging genetics approach generates large amounts of high-dimensional, multi-modal data that provide complementary information for a comprehensive study of schizophrenia, a complex mental disorder. However, the variety of these data in structure, resolution, and format makes their integrative study a challenging task. In this paper, we propose a novel model called Joint Sparse Collaborative Regression (JSCoReg), which can extract class-specific features from different health conditions/disease classes. We first evaluate feature selection performance in simulation experiments, using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). We demonstrate that the JSCoReg model achieves higher accuracy than similar models, including Joint Sparse Canonical Correlation Analysis and Sparse Collaborative Regression. We then apply the JSCoReg model to a schizophrenia dataset collected by the Mind Clinical Imaging Consortium. JSCoReg enables us to better identify biomarkers associated with schizophrenia, which are verified to be both biologically and statistically significant.
Affiliation(s)
- Xueli Song
- School of Sciences, Chang’an University, Xi’an, 710064, China
- Rongpeng Li
- School of Sciences, Chang’an University, Xi’an, 710064, China
- Kaiming Wang
- School of Sciences, Chang’an University, Xi’an, 710064, China
- Yuntong Bai
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
- Yuzhu Xiao
- School of Sciences, Chang’an University, Xi’an, 710064, China
- Yu-ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
9. Chen J, Han G, Xu A, Akutsu T, Cai H. Identifying miRNA-Gene Common and Specific Regulatory Modules for Cancer Subtyping by a High-Order Graph Matching Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:421-431. [PMID: 35320104 DOI: 10.1109/tcbb.2022.3161635]
Abstract
Identifying regulatory modules between miRNAs and genes is crucial in cancer research because it promotes a comprehensive understanding of the molecular mechanisms of cancer. The genomic data collected from subjects usually relate to different cancer statuses, such as different stages of the TNM Classification of Malignant Tumors (TNM) or different histological subtypes. Simple integrated analyses generally identify the core of tumorigenesis (common modules) but miss subtype-specific regulatory mechanisms (specific modules). In contrast, separate analyses can only report the differences and ignore important common modules. Therefore, there is an urgent need for a method that jointly analyzes miRNA and gene data from different cancer statuses to identify common and specific modules. To that end, we developed a High-Order Graph Matching model to identify Common and Specific modules (HOGMCS) between miRNA and gene data of different cancer statuses. We first demonstrate the superiority of HOGMCS through a comparison with four state-of-the-art techniques on a set of simulated data. Then, we apply HOGMCS to stomach adenocarcinoma data with four TNM stages and two histological types, and to breast invasive carcinoma data with four PAM50 subtypes. The experimental results demonstrate that HOGMCS can accurately extract common and subtype-specific miRNA-gene regulatory modules. Many of the identified miRNA-gene interactions have been confirmed in several public databases.
10. Zhang Y, Zhang H, Xiao L, Bai Y, Calhoun VD, Wang YP. Multi-Modal Imaging Genetics Data Fusion via a Hypergraph-Based Manifold Regularization: Application to Schizophrenia Study. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:2263-2272. [PMID: 35320094 PMCID: PMC9661879 DOI: 10.1109/tmi.2022.3161828]
Abstract
Recent studies show that multi-modal data fusion techniques, which combine information from diverse sources for comprehensive diagnosis and prognosis of complex brain disorders, often achieve better accuracy than single-modality approaches. However, many existing data fusion methods extract features from homogeneous networks and ignore heterogeneous structural information among multiple modalities. To this end, we propose a Hypergraph-based Multi-modal data Fusion algorithm, namely HMF. Specifically, we first generate a hypergraph similarity matrix to represent the high-order relationships among subjects. We then impose a regularization term based on both the inter- and intra-modality relationships of the subjects. Finally, we apply HMF to integrate imaging and genetics datasets. The proposed method is validated on both synthetic data and real samples from a schizophrenia study. Results show that our algorithm outperforms several competing methods and reveals significant interactions among risk genes, environmental factors and abnormal brain regions.
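Hypergraph regularizers of this kind are commonly built from the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf (2006), I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). The sketch below makes two assumptions: each hyperedge is a subject plus its k nearest neighbours, and all hyperedge weights are 1. The abstract does not say that HMF uses exactly this construction, so treat the function names and choices as illustrative.

```python
import numpy as np


def knn_incidence(X, k=3):
    """Incidence matrix H (n vertices x n hyperedges): hyperedge j holds vertex j and its k nearest neighbours."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        H[np.argsort(dist[j])[: k + 1], j] = 1.0
        H[j, j] = 1.0  # make sure the centroid vertex is included even with distance ties
    return H


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, plus vertex degrees."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w          # vertex degree: summed weight of incident hyperedges
    de = H.sum(axis=0)  # hyperedge degree: number of vertices in each hyperedge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - theta, dv


def smoothness(F, L):
    """tr(F^T L F): small when degree-normalized rows of F vary little within hyperedges."""
    return float(np.trace(F.T @ L @ F))


rng = np.random.default_rng(2)
subjects = rng.normal(size=(20, 4))  # 20 subjects, 4 features (toy data)
H = knn_incidence(subjects, k=3)
L, dv = hypergraph_laplacian(H)
eigvals = np.linalg.eigvalsh(L)
```

The Laplacian is symmetric positive semidefinite with eigenvalues in [0, 1], and sqrt(dv) lies in its null space. Adding lambda * smoothness(F, L) to an objective therefore pulls subjects that share hyperedges toward similar representations F.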
11. Peng P, Zhang Y, Ju Y, Wang K, Li G, Calhoun VD, Wang YP. Group Sparse Joint Non-Negative Matrix Factorization on Orthogonal Subspace for Multi-Modal Imaging Genetics Data Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:479-490. [PMID: 32750856 PMCID: PMC7758677 DOI: 10.1109/tcbb.2020.2999397]
Abstract
Despite advances in multi-modal neuroimaging and gene detection technologies, efforts to integrate multi-modal imaging genetics data to explore the risk factors of schizophrenia (SZ) remain limited. To address this issue, we propose a novel algorithm called group sparse joint non-negative matrix factorization on orthogonal subspace (GJNMFO). Our algorithm fuses single nucleotide polymorphism (SNP) data, functional magnetic resonance imaging (fMRI) data and epigenetic factors (DNA methylation) by projecting the three modalities onto a common basis matrix and three different coefficient matrices, in order to identify risk genes, epigenetic factors and abnormal brain regions associated with SZ. Specifically, we introduce orthogonality constraints on the basis matrix to discard unimportant features in the rows of the coefficient matrices. Since imaging genetics data have rich group information, we impose group sparsity on the three coefficient matrices to make the extracted features more accurate. Both simulated data and the real Mind Clinical Imaging Consortium (MCIC) dataset are used to validate our approach. Simulation results show that our algorithm outperforms competing methods. On the MCIC dataset, GJNMFO reveals a set of risk genes, epigenetic factors and abnormal brain functional regions, which have been verified to be both statistically and biologically significant.
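The shared-basis idea behind joint NMF can be sketched with standard Lee-Seung multiplicative updates for the sum over modalities of ||X_i - W H_i||_F^2. Here one basis W is common to all modalities, and subjects are assumed to be rows. This is a deliberate simplification: it omits GJNMFO's orthogonality constraint and group sparsity, and the function names and toy data are hypothetical.

```python
import numpy as np


def joint_nmf(Xs, k, n_iter=500, seed=0, eps=1e-10):
    """Factor nonnegative X_i (n x p_i) as W @ H_i with one shared W (n x k).

    Multiplicative updates for sum_i ||X_i - W H_i||_F^2; this is equivalent to
    plain NMF on the column-wise concatenation [X_1, X_2, ...].
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[0]
    W = rng.random((n, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in Xs]
    for _ in range(n_iter):
        # Update each modality-specific coefficient matrix with W held fixed.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(Xs, Hs)]
        # Update the shared basis using all modalities at once.
        numer = sum(X @ H.T for X, H in zip(Xs, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numer / denom
    return W, Hs


def joint_loss(Xs, W, Hs):
    """Total squared Frobenius reconstruction error across modalities."""
    return sum(np.linalg.norm(X - W @ H, "fro") ** 2 for X, H in zip(Xs, Hs))


rng = np.random.default_rng(3)
W_true = rng.random((30, 3))
Xs = [W_true @ rng.random((3, p)) for p in (10, 15, 8)]  # three toy "modalities"
losses = [joint_loss(Xs, *joint_nmf(Xs, k=3, n_iter=t)) for t in (0, 5, 500)]
W, Hs = joint_nmf(Xs, k=3)
```

The same seed gives the same starting point in each call, so `losses` traces the reconstruction error along a single run. Multiplicative updates keep every factor nonnegative without an explicit projection step.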
12. Associating brain imaging phenotypes and genetic in Alzheimer's disease via JSCCA approach with autocorrelation constraints. Med Biol Eng Comput 2021; 60:95-108. [PMID: 34714488 DOI: 10.1007/s11517-021-02439-2] [Received: 06/27/2020] [Accepted: 09/02/2021]
Abstract
Imaging genetics research can explore the potential correlation between imaging and genomics. Most association analysis methods cannot effectively use prior knowledge of the original data, so we incorporate prior knowledge of each data modality to mine more effective biomarkers. Imaging genetics studies based on sparse canonical correlation analysis (SCCA) can help to mine potential biomarkers of neurological diseases. To improve the performance and interpretability of SCCA, we propose a penalty based on the autocorrelation matrix for discovering possible biological mechanisms linking single nucleotide polymorphism (SNP) variations and brain region changes in Alzheimer's disease (AD). The added penalty allows the proposed algorithm to analyze the correlation between features of different modalities. The proposed algorithm obtains more biologically interpretable regions of interest (ROIs) and SNPs that are significantly related to AD, and it shows better noise resistance. Compared with other SCCA-based algorithms (JCB-SCCA, JSNMNMF), the proposed algorithm maintains a stronger correlation with the ground truth even when the noise is larger. We then fed the ROIs selected by the three algorithms into an SVM classifier, and the proposed algorithm achieved higher classification accuracy. We also applied ridge regression with the SNPs selected by the three algorithms and four AD risk ROIs, and the proposed algorithm achieved a smaller root mean square error (RMSE). These results show that the proposed algorithm performs well in association recognition and feature selection. Furthermore, it selects important features more stably, which may help to identify new potential biomarkers for clinical diagnosis.
13. Identifying Biomarkers of Alzheimer's Disease via a Novel Structured Sparse Canonical Correlation Analysis Approach. J Mol Neurosci 2021; 72:323-335. [PMID: 34570360 DOI: 10.1007/s12031-021-01915-6] [Received: 07/28/2021] [Accepted: 09/09/2021]
Abstract
Using correlation analysis to study the potential connection between brain genetics and imaging has become an effective way to understand neurodegenerative diseases. Sparse canonical correlation analysis (SCCA) makes it possible to study high-dimensional genetic information. Traditional SCCA methods can only process single-modal genetic and imaging data, which to some extent weakens the close connections within the brain's biological network. In some recently proposed multimodal SCCA methods, limitations of the penalty terms mean that the pre-processed data must be further filtered to make the dimensions uniform, which may destroy potential associations among data of the same modality. In this research, the aim was to combine data from different modalities while preserving the chain relationships or graph network relationships within each modality. To that end, the original generalized fused lasso penalty was replaced with the fused pairwise group lasso (FGL) and the graph-guided pairwise group lasso (GGL), building on joint sparse canonical correlation analysis (JSCCA). We used prior knowledge to construct a supervised bivariate learning model and used linear regression to select imaging quantitative traits (QTs) that are strongly correlated with Mini-Mental State Examination (MMSE) scores. Compared with FGL-SCCA, our model obtained a higher gene-ROI correlation coefficient and identified more significant biomarkers, providing a theoretical basis for further understanding the complex pathology of neurodegenerative diseases.
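The two structured penalties can be written down directly. The sketch below uses the forms commonly given in the SCCA literature: FGL(u) = sum over consecutive features i of sqrt(u_i^2 + u_(i+1)^2), and GGL(u) = sum over graph edges (i, j) of sqrt(u_i^2 + u_j^2). Any additional weighting used in the cited paper is not reproduced, and the function names are illustrative.

```python
import numpy as np


def fgl_penalty(u):
    """Fused pairwise group lasso: sum_i sqrt(u_i^2 + u_{i+1}^2) over consecutive features."""
    u = np.asarray(u, dtype=float)
    return float(np.sum(np.sqrt(u[:-1] ** 2 + u[1:] ** 2)))


def ggl_penalty(u, edges):
    """Graph-guided pairwise group lasso: sum over edges (i, j) of sqrt(u_i^2 + u_j^2)."""
    u = np.asarray(u, dtype=float)
    i, j = np.asarray(edges).T  # edges given as a list of (i, j) index pairs
    return float(np.sum(np.sqrt(u[i] ** 2 + u[j] ** 2)))


chain_value = fgl_penalty([3.0, 4.0, 0.0])                 # sqrt(9 + 16) + sqrt(16 + 0)
graph_value = ggl_penalty([3.0, 0.0, 4.0], [(0, 2)])       # one edge linking features 0 and 2
sign_flipped = fgl_penalty([3.0, -4.0, 0.0])
```

Each term is an L2 norm over a pair of weights, so neighbouring features tend to enter or leave the model together, and flipping a weight's sign does not change the penalty. This is one reason such pairwise group penalties need no prior knowledge of the sign of each correlation.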
14
Zhang A, Fang J, Hu W, Calhoun VD, Wang YP. A Latent Gaussian Copula Model for Mixed Data Analysis in Brain Imaging Genetics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1350-1360. [PMID: 31689199 PMCID: PMC7756188 DOI: 10.1109/tcbb.2019.2950904] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Recent advances in imaging genetics make it possible to combine different types of data for comprehensive diagnosis of mental disorders. These include medical images such as functional magnetic resonance imaging (fMRI) and genetic data such as single nucleotide polymorphisms (SNPs). Understanding the complex interactions among these heterogeneous data may offer a new perspective, but it also demands statistical models for their integration. Various graphical models have been proposed to study interaction or association networks with continuous, binary, and count data, as well as mixtures of them. However, limited effort has been made for the multinomial case, for instance, SNP data. Our goal is therefore to fill this void by developing a graphical model for the integration of fMRI image and SNP data, which can provide a deeper understanding of the unknown neurogenetic mechanism. In this article, we propose a latent Gaussian copula model for mixed data containing multinomial components. We assume that each discrete variable is obtained by discretizing a latent (unobserved) continuous variable, and we then construct a semi-rank-based estimator of the graph structure. The simulation results demonstrate that the proposed latent correlation detects graph structure more stably and accurately than several existing methods. We applied the method to real schizophrenia data consisting of SNP array and fMRI image data collected by the Mind Clinical Imaging Consortium (MCIC). It reveals a set of distinct SNP-brain associations, which are verified to be biologically significant. The proposed model is statistically promising for handling mixed types of data including multinomial components and may find widespread application. To promote reproducible research, the R code is available at https://github.com/Aiying0512/LGCM.
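The semi-rank idea can be illustrated for the simplest case. Suppose two continuous variables follow a Gaussian copula with latent correlation r. Their Kendall's tau then equals (2/π)·arcsin(r), so r can be estimated from ranks alone as sin(π/2·τ̂). This estimate is unchanged by any monotone transformation of either variable. The sketch below covers only this continuous-continuous bridge. The multinomial (discretized) case that the paper targets requires a different bridge function, which is not reproduced here. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            score += 1  # concordant pair
        elif s < 0:
            score -= 1  # discordant pair
    return score / (n * (n - 1) / 2)


def latent_corr_continuous(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-based estimate of the latent Gaussian correlation via the bridge r = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))


if __name__ == "__main__":
    x = [0.2, 1.5, -0.3, 2.2, 0.9]
    y = [math.exp(v) for v in x]  # strictly monotone transform of x
    print(latent_corr_continuous(x, y))  # tau = 1, so the estimate is 1.0
    print(latent_corr_continuous(x, [-v for v in x]))  # tau = -1, so -1.0
```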
15
Wang M, Shao W, Hao X, Shen L, Zhang D. Identify Consistent Cross-Modality Imaging Genetic Patterns via Discriminant Sparse Canonical Correlation Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1549-1561. [PMID: 31581090 DOI: 10.1109/tcbb.2019.2944825] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Sparse canonical correlation analysis (SCCA) is a bi-multivariate technique used in imaging genetics to identify complex multi-SNP-multi-QT associations. However, the traditional SCCA algorithm was designed to seek a linear correlation between SNP genotypes and brain imaging phenotypes. In doing so, it ignores the discriminant similarity information among within-class subjects in brain imaging genetics association analysis. In addition, multi-modality brain imaging phenotypes are extracted from different perspectives. Imaging markers from the same region that consistently show up across modalities may therefore provide more insight into the mechanisms of diseases. In this paper, a novel multi-modality discriminant SCCA algorithm (MD-SCCA) is proposed to overcome these limitations and to improve learning results by incorporating valuable discriminant similarity information into the SCCA algorithm. Specifically, we first extract the discriminant similarity information among within-class subjects by sparse representation. Second, the discriminant similarity information is enforced within SCCA to construct a discriminant SCCA algorithm (D-SCCA). Finally, the MD-SCCA algorithm is adopted to fully explore the relationships among different modalities of different subjects. In experiments, both a synthetic dataset and real data from the Alzheimer's Disease Neuroimaging Initiative database are used to test the performance of our algorithm. The empirical results demonstrate that the proposed algorithm not only produces improved cross-validation performance but also identifies consistent cross-modality imaging genetic biomarkers.
16
Wang M, Shao W, Hao X, Zhang D. Identify Complex Imaging Genetic Patterns via Fusion Self-Expressive Network Analysis. IEEE Transactions on Medical Imaging 2021; 40:1673-1686. [PMID: 33661732 DOI: 10.1109/tmi.2021.3063785] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/12/2023]
Abstract
In brain imaging genetics studies, estimating the association between quantitative traits (QTs) extracted from neuroimaging data and genetic markers such as single-nucleotide polymorphisms (SNPs) is a challenging task. Most existing association studies build on extensions of sparse canonical correlation analysis (SCCA) to identify complex bi-multivariate associations, and these extensions can take specific structure and group information into consideration. However, they often take the original data as input without considering its underlying complex multi-subspace structure, which degrades the performance of the subsequent integrative analysis. Accordingly, in this paper, the self-expressive property, which can effectively describe the similarity structure, is exploited to reconstruct the original data before the association analysis. Specifically, we first apply within-class similarity information to construct self-expressive networks by sparse representation. Then, we use a fusion method to iteratively fuse the self-expressive networks from multi-modality brain phenotypes into one network. Finally, we calculate the imaging genetic association based on the fused self-expressive network. We conduct experiments on both single-modality and multi-modality phenotype data. The experimental results validate that our method can better estimate the potential association between genetic markers and quantitative traits. It can also identify consistent multi-modality imaging genetic biomarkers to guide the interpretation of Alzheimer's disease.
17
Zhang Y, Xiao L, Zhang G, Cai B, Stephen JM, Wilson TW, Calhoun VD, Wang YP. Multi-Paradigm fMRI Fusion via Sparse Tensor Decomposition in Brain Functional Connectivity Study. IEEE J Biomed Health Inform 2021; 25:1712-1723. [PMID: 32841133 PMCID: PMC7904970 DOI: 10.1109/jbhi.2020.3019421] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
Functional magnetic resonance imaging (fMRI) is a powerful technique with the potential to estimate individual variations in behavioral and cognitive traits. Joint learning of multiple datasets can utilize their complementary information to improve learning performance. It also poses a data fusion challenge: effectively integrating the brain patterns elicited by multiple fMRI datasets. However, most current data fusion methods analyze each dataset separately and then infer the relationships among them. This fails to utilize the multidimensional structure inherent across modalities and may ignore complex but important interactions. To address this issue, we propose a novel sparse tensor decomposition method to integrate multiple task-stimulus (paradigm) fMRI data. Treating each fMRI paradigm as one modality, our method considers the relationships across subjects and modalities simultaneously. Specifically, a third-order tensor is first constructed from the functional network connectivity (FNC) of subjects in multiple fMRI paradigms. A novel sparse tensor decomposition with regularization terms then factorizes the tensor into a series of rank-one components, which extract the components shared across modalities as embedded features. An L2,1-norm regularizer (i.e., group sparsity) is enforced to select a few features common to multiple subjects. The proposed method is validated on real fMRI datasets from three paradigms in the Philadelphia Neurodevelopmental Cohort (PNC) study, to examine the relationship between FNC and human cognitive abilities. Experimental results show that our method outperforms several competing methods in predicting individuals with different cognitive behaviors, as measured by the Wide Range Achievement Test (WRAT). Furthermore, our method discovers FNC related to cognitive behaviors. Examples include the connectivity associated with the default mode network (DMN) for all three paradigms, and the connectivity between the DMN and visual (VIS) domains within the emotion task.
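To make the L2,1-norm concrete, assume a weight matrix whose rows index features and whose columns index subjects (an assumed layout). The norm is the sum of the Euclidean norms of the rows. Minimizing it drives whole rows to zero and keeps only the features shared across subjects. Below is a minimal sketch of the norm and of its proximal operator (row-wise group soft-thresholding), as typically used inside proximal-gradient solvers. It is not the paper's tensor decomposition algorithm, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def l21_norm(W: Matrix) -> float:
    """||W||_{2,1}: sum over rows of the row's Euclidean norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in W)


def prox_l21(W: Matrix, lam: float) -> Matrix:
    """Proximal operator of lam * ||W||_{2,1}.

    Each row is shrunk toward zero by `lam` in Euclidean norm; rows whose norm
    is at most `lam` become exactly zero, which is what produces group sparsity.
    """
    out: Matrix = []
    for row in W:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out


if __name__ == "__main__":
    W = [[3.0, 4.0], [0.5, 0.0], [0.0, 0.0]]  # hypothetical feature-by-subject weights
    print(l21_norm(W))       # 5.0 + 0.5 + 0.0 = 5.5
    print(prox_l21(W, 1.0))  # first row scaled by 0.8; the other two rows become zero
```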
18
Du L, Liu F, Liu K, Yao X, Risacher SL, Han J, Saykin AJ, Shen L. Associating Multi-Modal Brain Imaging Phenotypes and Genetic Risk Factors via a Dirty Multi-Task Learning Method. IEEE Transactions on Medical Imaging 2020; 39:3416-3428. [PMID: 32746095 PMCID: PMC7705646 DOI: 10.1109/tmi.2020.2995510] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 05/03/2023]
Abstract
Brain imaging genetics, which integrates genetic variations and brain structures or functions to study the genetic basis of brain disorders, is becoming increasingly important in brain science. Multi-modal imaging data collected by different technologies measure the same brain in distinct ways and might carry complementary information. Unfortunately, we do not know the extent to which phenotypic variance is shared among multiple imaging modalities, which might further trace back to complex genetic mechanisms. In this paper, we propose a novel dirty multi-task sparse canonical correlation analysis (SCCA) to study imaging genetics problems involving multi-modal brain imaging quantitative traits (QTs). The proposed method takes advantage of multi-task learning and parameter decomposition. It can identify the imaging QTs and genetic loci shared across multiple modalities, as well as the modality-specific ones. This gives it a flexible capability for identifying complex multi-SNP-multi-QT associations. Compared with the state-of-the-art multi-view SCCA and multi-task SCCA, the proposed method shows better or comparable canonical correlation coefficients and canonical weights on both synthetic and real neuroimaging genetic data. In addition, the identified modality-consistent and modality-specific biomarkers provide meaningful and interesting information. This demonstrates that dirty multi-task SCCA could be a powerful alternative method in multi-modal brain imaging genetics.
Affiliation(s)
- Lei Du
- School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Fang Liu
- School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Kefei Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Xiaohui Yao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Shannon L. Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Junwei Han
- School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Andrew J. Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
19
Rodosthenous T, Shahrezaei V, Evangelou M. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study. Bioinformatics 2020; 36:4616-4625. [PMID: 32437529 PMCID: PMC7750936 DOI: 10.1093/bioinformatics/btaa530] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 12/05/2019] [Revised: 04/22/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023]
Abstract
MOTIVATION Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between these datasets and the complex trait of interest would be to analyse each OMIC dataset separately. An alternative is to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) is a promising one: it penalizes the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed. They differ in their objective functions and in the iterative algorithm for obtaining the sparse latent variables, and they make different assumptions about the original datasets. RESULTS Through a comparative study we explored the performance of three methods: the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani, and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding in-between relationships, we recast the problem as a supervised learning one and investigated how the computed latent variables can be used to predict complex traits. The approaches were extended to allow for multiple (more than two) datasets, where the trait was included as one of the input datasets. Both ways showed improvement over conventional predictive models that include one or multiple datasets. AVAILABILITY AND IMPLEMENTATION https://github.com/theorod93/sCCA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
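The penalized matrix decomposition (PMD) route to sparse CCA mentioned above can be sketched as alternating power iterations on the cross-product matrix X^T Y with soft-thresholding. At each step, u is updated from X^T Y v and v from Y^T X u; each is soft-thresholded and rescaled to unit length. The sketch below is a simplification, not the code at the repository above, in three ways:

- it sets the soft-threshold to a fixed fraction of the largest entry, instead of the binary search over an L1 bound used in the original PMD formulation;
- it starts from a uniform v instead of a singular vector;
- it computes only the first pair of canonical vectors.

All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(M: Matrix) -> Matrix:
    """Center each column (feature) and scale it to unit sample variance."""
    n, p = len(M), len(M[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [M[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((c - mean) ** 2 for c in col) / (n - 1)) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def soft_unit(a: Sequence[float], frac: float) -> List[float]:
    """Soft-threshold `a` at frac * max|a_i|, then rescale to unit L2 norm."""
    delta = frac * max(abs(x) for x in a)
    s = [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]
    norm = math.sqrt(sum(x * x for x in s))
    return [x / norm for x in s] if norm > 0 else s


def sparse_cca_rank1(X: Matrix, Y: Matrix, frac_u: float = 0.5, frac_v: float = 0.5,
                     iters: int = 50) -> Tuple[List[float], List[float]]:
    """First pair of sparse canonical vectors (u, v) by alternating thresholded updates."""
    Xs, Ys = standardize(X), standardize(Y)
    n, p, q = len(Xs), len(Xs[0]), len(Ys[0])
    # Cross-product matrix C = Xs^T Ys (p x q).
    C = [[sum(Xs[i][a] * Ys[i][b] for i in range(n)) for b in range(q)] for a in range(p)]
    v = [1.0 / math.sqrt(q)] * q
    u = [0.0] * p
    for _ in range(iters):
        u = soft_unit([sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], frac_u)
        v = soft_unit([sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], frac_v)
    return u, v


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 60
    z = [rng.gauss(0, 1) for _ in range(n)]  # shared latent signal (simulated)
    X = [[z[i] + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(n)]
    Y = [[rng.gauss(0, 1), z[i] + 0.3 * rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(n)]
    u, v = sparse_cca_rank1(X, Y)
    Xs, Ys = standardize(X), standardize(Y)
    xu = [sum(r[a] * u[a] for a in range(3)) for r in Xs]
    yv = [sum(r[b] * v[b] for b in range(3)) for r in Ys]
    print([round(w, 3) for w in u], [round(w, 3) for w in v])
    print(round(pearson(xu, yv), 3))
```

On the simulated data, the weights should concentrate on the first X feature and the second Y feature, the only pair sharing the latent signal. Lowering `frac_u` or `frac_v` weakens the thresholding and lets more features through.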
Affiliation(s)
- Vahid Shahrezaei
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
- Marina Evangelou
- Department of Mathematics, Imperial College London, London SW7 2AZ, UK
20
Zhuang X, Yang Z, Cordes D. A technical review of canonical correlation analysis for neuroscience applications. Hum Brain Mapp 2020; 41:3807-3833. [PMID: 32592530 PMCID: PMC7416047 DOI: 10.1002/hbm.25090] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 04/27/2020] [Accepted: 05/23/2020] [Indexed: 12/11/2022]
Abstract
Collecting comprehensive data sets from the same subject has become standard in neuroscience research, and uncovering multivariate relationships among the collected data sets has gained significant attention in recent years. Canonical correlation analysis (CCA) is a powerful multivariate tool for jointly investigating relationships among multiple data sets. It can uncover disease or environmental effects in various modalities simultaneously and comprehensively characterize changes during development, aging, and disease progression. In the past 10 years, although an increasing number of studies have used CCA in multivariate analysis, simple conventional CCA still dominates these applications. Multiple CCA-variant techniques have been proposed to improve model performance. However, their complicated multivariate formulations and poorly understood capabilities have delayed their wide application. Therefore, in this study, a comprehensive review of CCA and its variant techniques is provided. For each CCA-related technique, we discuss its detailed technical formulation with analytical and numerical solutions, its current applications in neuroscience research, and its advantages and limitations. Finally, general guidelines are provided on how to select the most appropriate CCA-related technique based on the properties of the available data sets and the targeted neuroscience questions.
Affiliation(s)
- Xiaowei Zhuang
- Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
- Zhengshi Yang
- Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
- Dietmar Cordes
- Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, Nevada, USA
- University of Colorado, Boulder, Colorado, USA
- Department of Brain Health, University of Nevada, Las Vegas, Nevada, USA
21
Lee H, Park BY, Byeon K, Won JH, Kim M, Kim SH, Park H. Multivariate association between brain function and eating disorders using sparse canonical correlation analysis. PLoS One 2020; 15:e0237511. [PMID: 32785278 PMCID: PMC7423138 DOI: 10.1371/journal.pone.0237511] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Accepted: 07/28/2020] [Indexed: 12/26/2022]
Abstract
Eating disorders are strongly associated with obesity and are also related to brain dysfunction. Still, the brain's functional substrates associated with the behavioral traits of eating disorders are underexplored. Existing neuroimaging studies have explored the association between eating disorders and brain function using summary factors rather than all the information provided by eating disorder questionnaires. Here, we aimed to investigate the multivariate association between brain function and eating disorders at the fine-grained level of individual questions. Our study is a retrospective secondary analysis that re-analyzed resting-state functional magnetic resonance imaging of 284 participants from the enhanced Nathan Kline Institute-Rockland Sample database. Leveraging sparse canonical correlation analysis, we associated the functional connectivity of all brain regions with all questions in the eating disorder questionnaires. We found that executive- and inhibitory control-related frontoparietal networks showed positive associations with questions on restraint eating, while brain regions involved in the reward system showed negative associations. Notably, inhibitory control-related brain regions showed a positive association with the degree of obesity. Findings were well replicated in the independent validation dataset (n = 34). The results of this study might contribute to a better understanding of brain function with respect to eating disorders.
Affiliation(s)
- Hyebin Lee
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon, Korea
- Bo-yong Park
- McConnell Brain Imaging Centre, Montreal Neurological Institute and Hospital, McGill University, Montreal, Quebec, Canada
- Kyoungseob Byeon
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon, Korea
- Ji Hye Won
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon, Korea
- Mansu Kim
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Se-Hong Kim
- Department of Family Medicine, St. Vincent's Hospital, College of Medicine, The Catholic University of Korea, Suwon, Korea
- Hyunjin Park
- Center for Neuroscience Imaging Research, Institute for Basic Science (IBS), Suwon, Korea
- School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, Korea
22
Deng J, Zeng W, Kong W, Shi Y, Mou X, Guo J. Multi-Constrained Joint Non-Negative Matrix Factorization With Application to Imaging Genomic Study of Lung Metastasis in Soft Tissue Sarcomas. IEEE Trans Biomed Eng 2020; 67:2110-2118. [PMID: 31751222 DOI: 10.1109/tbme.2019.2954989] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Studying pathogenic mechanisms at the genetic level with imaging genetics methods makes it possible to effectively reveal the association between histopathology and genetics. However, there is a lack of effective and accurate tools for establishing association models from the macroscopic to the microscopic level. METHODS The multi-constrained joint non-negative matrix factorization (MCJNMF) was developed for simultaneous integration of genomic data and image data to identify common modules related to disease. The two types of data matrices were projected onto a common feature space, in which heterogeneous variables with large coefficients in the same projected direction form a common module. Meanwhile, the correlation between original data features was integrated through regularization constraints to improve biological relevance. Sparsity and orthogonality constraints were imposed on the decomposition factors to minimize the redundancy between different bases and to reduce algorithm complexity. RESULTS The algorithm was successfully applied to module identification for lung metastasis in soft tissue sarcomas (STSs) by integrating FDG-PET image and DNA methylation data features. Multilevel analysis of the top extracted modules revealed that these modules were closely related to lung metastasis. In particular, several genes with diagnostic potential for lung metastasis could be discovered in high-score modules. CONCLUSION This method can be applied to accurately identify patterns related to the pathogenic mechanisms of diseases, and it also has significant implications for discovering protein biomarkers. SIGNIFICANCE This method provides avenues for further studies that identify complex association patterns of diseases from different types of biological data.
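The core of a joint NMF can be sketched with the classic Lee-Seung multiplicative updates. Each non-negative data matrix X_I is approximated as W H_I with a shared non-negative factor W. Features from different data types that load heavily on the same row of their H_I matrices then fall into the same module. The layout used here is an assumption: each X_I is samples by features, with the same samples in every matrix. This minimal sketch assumes a squared Frobenius loss and omits all of MCJNMF's regularization, sparsity and orthogonality constraints. It is plain joint NMF for illustration, not the authors' algorithm, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
EPS = 1e-12  # guards against division by zero in the multiplicative updates


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def total_error(Xs: Sequence[Matrix], W: Matrix, Hs: Sequence[Matrix]) -> float:
    """Sum over data types of the squared Frobenius error ||X_I - W H_I||^2."""
    err = 0.0
    for X, H in zip(Xs, Hs):
        WH = matmul(W, H)
        err += sum((X[i][j] - WH[i][j]) ** 2 for i in range(len(X)) for j in range(len(X[0])))
    return err


def joint_nmf(Xs: Sequence[Matrix], k: int, iters: int = 200,
              seed: int = 0) -> Tuple[Matrix, List[Matrix]]:
    """Approximate every non-negative X_I (n x p_I) as W (n x k) times H_I (k x p_I)."""
    rng = random.Random(seed)
    n = len(Xs[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    Hs = [[[rng.random() + 0.1 for _ in range(len(X[0]))] for _ in range(k)] for X in Xs]
    for _ in range(iters):
        # H_I <- H_I * (W^T X_I) / (W^T W H_I), one data type at a time.
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        for idx, X in enumerate(Xs):
            H = Hs[idx]
            num, den = matmul(Wt, X), matmul(WtW, H)
            Hs[idx] = [[H[a][j] * num[a][j] / (den[a][j] + EPS) for j in range(len(H[0]))]
                       for a in range(k)]
        # Shared W <- W * (sum_I X_I H_I^T) / (W sum_I H_I H_I^T).
        num = [[0.0] * k for _ in range(n)]
        HHt = [[0.0] * k for _ in range(k)]
        for X, H in zip(Xs, Hs):
            Ht = transpose(H)
            XHt, HHt_I = matmul(X, Ht), matmul(H, Ht)
            for i in range(n):
                for a in range(k):
                    num[i][a] += XHt[i][a]
            for a in range(k):
                for b in range(k):
                    HHt[a][b] += HHt_I[a][b]
        den = matmul(W, HHt)
        W = [[W[i][a] * num[i][a] / (den[i][a] + EPS) for a in range(k)] for i in range(n)]
    return W, Hs


if __name__ == "__main__":
    rng = random.Random(42)
    X1 = [[rng.random() for _ in range(5)] for _ in range(8)]  # e.g. image features (simulated)
    X2 = [[rng.random() for _ in range(4)] for _ in range(8)]  # e.g. methylation features (simulated)
    W0, H0 = joint_nmf([X1, X2], k=2, iters=0)
    W, Hs = joint_nmf([X1, X2], k=2, iters=200)
    print(total_error([X1, X2], W0, H0), ">", total_error([X1, X2], W, Hs))
```

Summing the numerators and denominators over data types makes the W update identical to ordinary NMF on the column-wise concatenation [X_1 X_2]. Because of this, the usual non-increasing-error property of the multiplicative updates carries over to the joint objective.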
23
Du L, Liu F, Liu K, Yao X, Risacher SL, Han J, Guo L, Saykin AJ, Shen L, for the Alzheimer’s Disease Neuroimaging Initiative. Identifying diagnosis-specific genotype-phenotype associations via joint multitask sparse canonical correlation analysis and classification. Bioinformatics 2020; 36:i371-i379. [PMID: 32657360 PMCID: PMC7355274 DOI: 10.1093/bioinformatics/btaa434] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 01/24/2023]
Abstract
MOTIVATION Brain imaging genetics studies the complex associations between genotypic data, such as single nucleotide polymorphisms (SNPs), and imaging quantitative traits (QTs). Neurodegenerative disorders usually exhibit diversity and heterogeneity, so different diagnostic groups might carry distinct imaging QTs, SNPs and interactions between them. Sparse canonical correlation analysis (SCCA) is widely used to identify bi-multivariate genotype-phenotype associations. However, most existing SCCA methods are unsupervised and therefore cannot identify diagnosis-specific genotype-phenotype associations. RESULTS In this article, we propose a new joint multitask learning method, named MT-SCCALR, which combines the merits of both SCCA and logistic regression. MT-SCCALR learns the genotype-phenotype associations of multiple tasks jointly, with each task focusing on identifying one diagnosis-specific genotype-phenotype pattern. Meanwhile, MT-SCCALR can select relevant SNPs and imaging QTs for each diagnostic group alone, and it also allows the selection of those shared by multiple diagnostic groups. We derive an efficient optimization algorithm whose convergence to a local optimum is guaranteed. Compared with two state-of-the-art methods, MT-SCCALR yields better or similar canonical correlation coefficients and classification performance. In addition, it produces much more discriminative canonical weight patterns than its competitors, which are of great interest. This demonstrates the power and capability of MT-SCCALR in identifying diagnostically heterogeneous genotype-phenotype patterns, which would be helpful for understanding the pathophysiology of brain disorders. AVAILABILITY AND IMPLEMENTATION The software is publicly available at https://github.com/dulei323/MTSCCALR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lei Du
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Fang Liu
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Kefei Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Xiaohui Yao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Junwei Han
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Lei Guo
- Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
24
Peng P, Ju Y, Zhang Y, Wang K, Jiang S, Wang Y. Sparse representation and dictionary learning model incorporating group sparsity and incoherence to extract abnormal brain regions associated with schizophrenia. IEEE Access: Practical Innovations, Open Solutions 2020; 8:104396-104406. [PMID: 33747675 PMCID: PMC7971409 DOI: 10.1109/access.2020.2999513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Schizophrenia is a complex mental illness whose mechanism is currently unclear. Using a sparse representation and dictionary learning (SDL) model to analyze functional magnetic resonance imaging (fMRI) datasets of schizophrenia is currently a popular method for exploring the mechanism of the disease. The SDL method decomposes the fMRI data into a sparse coding matrix X and a dictionary matrix D. However, these traditional methods overlook the group structure information in X and the coherence between the atoms in D. To address this problem, we propose a new SDL model incorporating group sparsity and incoherence, namely GS2ISDL, to detect abnormal brain regions. Specifically, GS2ISDL uses the group structure information defined by the AAL anatomical template from the fMRI dataset as a prior to achieve inter-group sparsity in X. At the same time, the L1-norm is enforced on X to achieve intra-group sparsity. In addition, our algorithm imposes an incoherence constraint on the dictionary matrix D to reduce the coherence between its atoms, which can ensure the uniqueness of X and the discriminability of the atoms. To validate GS2ISDL, we compared it with both the IK-SVD and SDL algorithms in analyzing an fMRI dataset collected by the Mind Clinical Imaging Consortium (MCIC). The results show that the accuracy, sensitivity, recall and MCC values of GS2ISDL are 93.75%, 95.23%, 80.50% and 88.19%, respectively, outperforming both IK-SVD and SDL. The ROIs extracted by the GS2ISDL model (such as the precentral gyrus, hippocampus and caudate nucleus) are further verified by a literature review of schizophrenia studies and are biologically significant.
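The coherence that GS2ISDL aims to reduce can be quantified by the mutual coherence of the dictionary. This is the largest absolute cosine similarity between two distinct atoms (columns of D). Values near 0 mean nearly orthogonal, easily distinguished atoms, while values near 1 mean near-duplicates. The sketch below computes this quantity. It also computes the squared Frobenius distance between the normalized Gram matrix and the identity, a common smooth incoherence penalty. Whether GS2ISDL uses exactly this penalty is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_atoms(atoms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each atom (a column of D, given here as a list) to unit L2 norm."""
    out = []
    for a in atoms:
        norm = math.sqrt(sum(x * x for x in a))
        if norm == 0:
            raise ValueError("dictionary atoms must be nonzero")
        out.append([x / norm for x in a])
    return out


def mutual_coherence(atoms: Sequence[Sequence[float]]) -> float:
    """max over i != j of |<d_i, d_j>| for unit-normalized atoms."""
    d = normalize_atoms(atoms)
    return max((abs(sum(x * y for x, y in zip(d[i], d[j])))
                for i in range(len(d)) for j in range(i + 1, len(d))), default=0.0)


def incoherence_penalty(atoms: Sequence[Sequence[float]]) -> float:
    """||D^T D - I||_F^2 for unit-normalized atoms: the sum of squared off-diagonal inner products."""
    d = normalize_atoms(atoms)
    return sum(sum(x * y for x, y in zip(d[i], d[j])) ** 2
               for i in range(len(d)) for j in range(len(d)) if i != j)


if __name__ == "__main__":
    print(mutual_coherence([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal atoms: 0.0
    print(mutual_coherence([[1.0, 0.0], [1.0, 1.0]]))  # about 0.7071 (cosine of 45 degrees)
```

The mutual coherence reports only the worst pair of atoms. The Frobenius penalty aggregates all pairs and is differentiable, which is why penalties of this kind are convenient inside an iterative dictionary update.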
Affiliation(s)
- Peng Peng
- School of Electronics and Control Engineering, Chang’an University, Xi’an, Shaanxi, 710049, China
- Yongfeng Ju
- School of Electronics and Control Engineering, Chang’an University, Xi’an, Shaanxi, 710049, China
- Yipu Zhang
- School of Electronics and Control Engineering, Chang’an University, Xi’an, Shaanxi, 710049, China
- Kaiming Wang
- School of Science, Chang’an University, Xi’an, Shaanxi, 710049, China
- Suying Jiang
- School of Information Engineering, Chang’an University, Xi’an, Shaanxi, 710049, China
- Yuping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA
25
Xiao L, Wang J, Kassani PH, Zhang Y, Bai Y, Stephen JM, Wilson TW, Calhoun VD, Wang YP. Multi-Hypergraph Learning-Based Brain Functional Connectivity Analysis in fMRI Data. IEEE Transactions on Medical Imaging 2020; 39:1746-1758. [PMID: 31796393 PMCID: PMC7376954 DOI: 10.1109/tmi.2019.2957097] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Indexed: 06/10/2023]
Abstract
Recently, a hypergraph constructed from functional magnetic resonance imaging (fMRI) was utilized to explore brain functional connectivity networks (FCNs) for the classification of neurodegenerative diseases. Each edge of a hypergraph (called a hyperedge) can connect any number of brain regions-of-interest (ROIs) instead of only two, and thus characterizes high-order relations among multiple ROIs that cannot be uncovered by the simple graphs used in traditional graph-based FCN construction methods. Unlike existing hypergraph-based methods, where all hyperedges are assumed to have equal weights and only certain topological features are extracted from the hypergraphs, we propose a hypergraph learning-based method for FCN construction in this paper. Specifically, we first generate hyperedges from fMRI time series based on sparse representation, then employ hypergraph learning to adaptively learn hyperedge weights, and finally define a hypergraph similarity matrix to represent the FCN. In our proposed method, weighting hyperedges yields more discriminative FCNs across subjects, and the defined hypergraph similarity matrix reveals the overall structure of the brain network better than hypergraph topological features do. Moreover, we propose a multi-hypergraph learning-based method that integrates multi-paradigm fMRI data: the hyperedge weights associated with each fMRI paradigm are jointly learned, and a unified hypergraph similarity matrix is then computed to represent the FCN. We validate the effectiveness of the proposed method on the Philadelphia Neurodevelopmental Cohort dataset for the classification of individuals' learning ability from three paradigms of fMRI data.
Experimental results demonstrate that our proposed approach outperforms traditional graph-based methods (i.e., Pearson's correlation and partial correlation with the graphical Lasso) and existing unweighted hypergraph-based methods, which sheds light on how to optimize the estimation of FCNs for cognitive and behavioral studies.
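The abstract says hyperedges are generated from fMRI time series "based on sparse representation". One common way to do this, sketched below as an assumption rather than the authors' exact procedure, is to regress each ROI's time series on all other ROIs with an L1 (Lasso) penalty and let the hyperedge centred on that ROI contain every ROI with a nonzero coefficient. The Lasso is solved here by plain proximal gradient (ISTA); the function names and the `lam` value are hypothetical, and the hyperedge-weight learning step is not shown.

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Minimise (1/2n)||A x - b||^2 + lam * ||x||_1 with ISTA (proximal gradient)."""
    n = A.shape[0]
    L = np.linalg.norm(A, 2) ** 2 / n  # Lipschitz constant of the smooth term's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - (A.T @ (A @ x - b) / n) / L                    # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

def build_hyperedges(ts, lam=0.1):
    """Hypothetical helper. ts: (T, R) array, one column per ROI time series.

    Returns an (R, R) incidence matrix H with H[j, e] = 1 if ROI j belongs to
    hyperedge e, where hyperedge e is centred on ROI e.
    """
    T, R = ts.shape
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)  # standardise each ROI
    H = np.zeros((R, R), dtype=int)
    for e in range(R):
        others = [j for j in range(R) if j != e]
        coef = lasso_ista(z[:, others], z[:, e], lam)
        H[e, e] = 1  # the centroid ROI is always in its own hyperedge
        for j, c in zip(others, coef):
            if abs(c) > 1e-8:  # nonzero sparse coefficient -> ROI joins the hyperedge
                H[j, e] = 1
    return H
```

Larger `lam` values give sparser, smaller hyperedges; in practice the penalty would be tuned, and several `lam` values can be combined to produce hyperedges of different sizes.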
|
26
|
Kim M, Won JH, Hong J, Kwon J, Park H, Shen L. DEEP NETWORK-BASED FEATURE SELECTION FOR IMAGING GENETICS: APPLICATION TO IDENTIFYING BIOMARKERS FOR PARKINSON'S DISEASE. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING 2020; 2020. [PMID: 34594479 DOI: 10.1109/isbi45749.2020.9098471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Imaging genetics is a methodology for discovering associations between imaging and genetic variables. Many studies have adopted sparse models such as sparse canonical correlation analysis (SCCA) for imaging genetics. These methods are limited to modeling linear imaging-genetics relationships and cannot capture non-linear, high-level relationships between the explored variables. Deep learning approaches remain underexplored in imaging genetics, despite their great successes in many other biomedical domains such as image segmentation and disease classification. In this work, we propose a deep learning model to select genetic features that explain the imaging features well. Our empirical study on simulated and real datasets demonstrated that our method outperformed the widely used SCCA method and was able to select important genetic features in a robust fashion. These promising results indicate that our deep learning model has the potential to reveal new biomarkers to improve mechanistic understanding of the studied brain disorders.
Affiliation(s)
- Mansu Kim
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Korea; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, USA
- Ji Hye Won
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Korea
- Jisu Hong
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Korea
- Junmo Kwon
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Korea; Center for Neuroscience Imaging Research, Institute for Basic Science, Korea
- Hyunjin Park
- Center for Neuroscience Imaging Research, Institute for Basic Science, Korea; School of Electronic and Electrical Engineering, Sungkyunkwan University, Korea
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, USA
|
27
|
Zhang Y, Peng P, Ju Y, Li G, Calhoun VD, Wang YP. Canonical Correlation Analysis of Imaging Genetics Data Based on Statistical Independence and Structural Sparsity. IEEE J Biomed Health Inform 2020; 24:2621-2629. [PMID: 32071012 DOI: 10.1109/jbhi.2020.2972581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Current developments in neuroimaging and genetics promote an integrative and comprehensive study of schizophrenia. However, it is still difficult to explore how gene mutations are related to brain abnormalities, owing to the high dimensionality but low sample size of these data. Conventional approaches reduce the dimension of each dataset separately and then calculate the correlation, but they ignore the effects of the response variables and the structure of the data. To improve the identification of risk genes and abnormal brain regions in schizophrenia, in this paper we propose a novel method called Independence and Structural sparsity Canonical Correlation Analysis (ISCCA). ISCCA combines independent component analysis (ICA) and canonical correlation analysis (CCA) to reduce collinear effects, and also incorporates the graph structure of the data into the model to improve the accuracy of feature selection. The results from simulation studies demonstrate its higher accuracy in discovering correlations compared with other competing methods. Moreover, by applying ISCCA to a real imaging genetics dataset collected by the Mind Clinical Imaging Consortium (MCIC), we identify a set of distinct gene-ROI interactions that are verified to be both statistically and biologically significant.
|
28
|
Elsheikh SSM, Chimusa ER, Mulder NJ, Crimi A. Genome-Wide Association Study of Brain Connectivity Changes for Alzheimer's Disease. Sci Rep 2020; 10:1433. [PMID: 31996736 PMCID: PMC6989662 DOI: 10.1038/s41598-020-58291-1] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/04/2019] [Accepted: 12/30/2019] [Indexed: 01/09/2023]
Abstract
Variations in the human genome have been found to be an essential factor affecting susceptibility to Alzheimer's disease. Genome-wide association studies (GWAS) have identified genetic loci that significantly contribute to the risk of Alzheimer's disease. The availability of genetic data, coupled with brain imaging technologies, has opened the door to further discoveries through data integration methodologies and new study designs. Although methods have been proposed for integrating image characteristics and genetic information to study Alzheimer's disease, the measurement of disease is often taken at a single time point, which does not allow disease progression to be taken into consideration. In a longitudinal setting, we analyzed neuroimaging and single nucleotide polymorphism datasets obtained from the Alzheimer's Disease Neuroimaging Initiative for three clinical stages of the disease, including healthy control, early mild cognitive impairment and Alzheimer's disease subjects. We conducted a GWAS regressing the absolute change of global connectivity metrics on the genetic variants, and used the GWAS summary statistics to compute gene and pathway scores. We observed significant associations between the change in structural brain connectivity defined by tractography and genes that have previously been reported to biologically modulate the risk and progression of certain neurodegenerative disorders, including Alzheimer's disease.
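The GWAS step above regresses the absolute change of a connectivity metric on each genetic variant. A minimal sketch of that step, assuming one ordinary least-squares regression per SNP with additive 0/1/2 allele coding and no covariates (a real GWAS would normally also adjust for covariates such as age, sex and population structure), is shown below; `marginal_gwas` and its inputs are hypothetical names, not the authors' pipeline.

```python
import numpy as np

def marginal_gwas(genotypes, phenotype):
    """Regress the phenotype on each SNP separately: y = a + b * g + error.

    genotypes: (n, p) allele counts coded 0/1/2 (each column must vary).
    phenotype: (n,) e.g. |metric at follow-up - metric at baseline|.
    Returns (betas, t_stats), each of length p.
    """
    n, p = genotypes.shape
    y = phenotype - phenotype.mean()  # centring absorbs the intercept
    betas = np.empty(p)
    t_stats = np.empty(p)
    for j in range(p):
        g = genotypes[:, j] - genotypes[:, j].mean()
        sxx = g @ g
        b = (g @ y) / sxx                     # OLS slope
        resid = y - b * g
        sigma2 = resid @ resid / (n - 2)      # residual variance, n - 2 d.o.f.
        betas[j] = b
        t_stats[j] = b / np.sqrt(sigma2 / sxx)  # slope / standard error
    return betas, t_stats
```

The resulting per-SNP summary statistics are what downstream gene- and pathway-scoring tools typically take as input.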
Affiliation(s)
- Samar S M Elsheikh
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa.
- Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Nicola J Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Alessandro Crimi
- University Hospital of Zürich, Zürich, 8091, Switzerland
- African Institute for Mathematical Sciences, Biriwa, Ghana
|
29
|
Kim M, Won JH, Youn J, Park H. Joint-Connectivity-Based Sparse Canonical Correlation Analysis of Imaging Genetics for Detecting Biomarkers of Parkinson's Disease. IEEE TRANSACTIONS ON MEDICAL IMAGING 2020; 39:23-34. [PMID: 31144631 DOI: 10.1109/tmi.2019.2918839] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 06/09/2023]
Abstract
Imaging genetics is a method used to detect associations between imaging and genetic variables. Some researchers have used sparse canonical correlation analysis (SCCA) for imaging genetics. This study was conducted to improve the efficiency and interpretability of SCCA. We propose a connectivity-based penalty for incorporating biological prior information. Our proposed approach, named joint connectivity-based SCCA (JCB-SCCA), includes the proposed penalty and can handle multi-modal neuroimaging datasets. Different neuroimaging techniques provide distinct information on the brain and have been used to investigate various neurological disorders, including Parkinson's disease (PD). We applied our algorithm to simulated and real imaging genetics datasets for performance evaluation. Our algorithm was able to select important features in a more robust manner compared with other multivariate methods. The algorithm revealed promising features of single-nucleotide polymorphisms and brain regions related to PD by using a real imaging genetic dataset. The proposed imaging genetics model can be used to improve clinical diagnosis in the form of novel potential biomarkers. We hope to apply our algorithm to cohorts such as Alzheimer's patients or healthy subjects to determine the generalizability of our algorithm.
|
30
|
Shen L, Thompson PM. Brain Imaging Genomics: Integrated Analysis and Machine Learning. PROCEEDINGS OF THE IEEE. INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS 2020; 108:125-162. [PMID: 31902950 PMCID: PMC6941751 DOI: 10.1109/jproc.2019.2947272] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Indexed: 06/01/2023]
Abstract
Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical and environmental data, is performed to gain new insights into the phenotypic, genetic and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications.
Affiliation(s)
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Paul M Thompson
- Imaging Genetics Center, Mark & Mary Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, Los Angeles, CA 90232, USA
|
31
|
Xiao L, Stephen JM, Wilson TW, Calhoun VD, Wang YP. Alternating Diffusion Map Based Fusion of Multimodal Brain Connectivity Networks for IQ Prediction. IEEE Trans Biomed Eng 2019; 66:2140-2151. [PMID: 30507492 PMCID: PMC6541561 DOI: 10.1109/tbme.2018.2884129] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 12/17/2022]
Abstract
OBJECTIVE To explain individual differences in development, behavior, and cognition, most previous studies projected resting-state functional MRI (fMRI) based functional connectivity (FC) data into a low-dimensional space via linear dimensionality reduction techniques and then carried out their analyses in that space. However, linear dimensionality reduction techniques may fail to capture the nonlinearity of brain neuroactivity. Moreover, besides resting-state FC, FC based on task fMRI can be expected to provide complementary information. Motivated by these considerations, in this paper we nonlinearly fuse resting-state and task-based FC networks (FCNs) to seek a better representation. METHODS We propose a framework based on the alternating diffusion map (ADM), which extracts geometry-preserving low-dimensional embeddings that successfully parameterize the intrinsic variables driving the phenomenon of interest. Specifically, we first separately build resting-state and task-based FCNs for each subject as symmetric positive definite matrices using sparse inverse covariance estimation, and then use the ADM to fuse them and extract significant low-dimensional embeddings, which serve as fingerprints to identify individuals. RESULTS The proposed framework is validated on the Philadelphia Neurodevelopmental Cohort data, where we conduct an extensive experimental study on resting-state and fractal n-back task fMRI for the classification of intelligence quotient (IQ). The fusion of resting-state and n-back task fMRI by the proposed framework achieves better classification accuracy than any single fMRI paradigm, and the proposed framework is shown to outperform several other data fusion methods. CONCLUSION AND SIGNIFICANCE To our knowledge, this paper is the first to demonstrate a successful extension of the ADM to fuse resting-state and task-based fMRI data for accurate prediction of IQ.
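The core of an alternating diffusion map is a composite operator: one row-stochastic diffusion kernel is built per view (here, resting-state and task FCNs), and their product is decomposed to obtain a joint low-dimensional embedding. The sketch below is a generic version under stated assumptions: each subject is represented by a plain feature vector (for example a vectorised upper triangle of its FC matrix) compared with Euclidean distance, whereas the paper works with SPD matrices; the Gaussian bandwidths `eps1`, `eps2` and the function names are hypothetical.

```python
import numpy as np

def row_stochastic_kernel(X, eps):
    """Gaussian affinities between rows of X, normalised so each row sums to 1."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    W = np.exp(-np.maximum(d2, 0.0) / eps)
    return W / W.sum(axis=1, keepdims=True)

def alternating_diffusion_embedding(X1, X2, eps1, eps2, dim=2):
    """X1, X2: (n_subjects, n_features) views of the same subjects.

    Returns (embedding, K): an (n_subjects, dim) embedding taken from the leading
    singular vectors of the composite operator K = K2 @ K1, and K itself.
    """
    K1 = row_stochastic_kernel(X1, eps1)  # diffusion operator for view 1
    K2 = row_stochastic_kernel(X2, eps2)  # diffusion operator for view 2
    K = K2 @ K1                           # alternating (composite) diffusion operator
    U, s, _ = np.linalg.svd(K)
    return U[:, :dim] * s[:dim], K
```

Because K is a product of two row-stochastic matrices it is itself row-stochastic but not symmetric, which is why this sketch uses an SVD rather than a symmetric eigendecomposition.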
Affiliation(s)
- Li Xiao
- Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118
- Tony W. Wilson
- Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE 68198
- Vince D. Calhoun
- Mind Research Network, Albuquerque, NM 87106; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131
- Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118
|
32
|
Hu W, Zhang A, Cai B, Calhoun V, Wang YP. Distance canonical correlation analysis with application to an imaging-genetic study. J Med Imaging (Bellingham) 2019; 6:026501. [PMID: 31001569 DOI: 10.1117/1.jmi.6.2.026501] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/02/2018] [Accepted: 03/22/2019] [Indexed: 12/15/2022]
Abstract
Distance correlation is a measure that can detect both linear and nonlinear associations. However, applying distance correlation to imaging genetic studies often requires multiple testing correction because of the large number of inferences, and as a result its detection sensitivity may be low. We propose a new model, distance canonical correlation analysis (DCCA), which overcomes this problem by searching for the combination of features with the highest distance correlation. This is achieved by constructing a distance kernel function and then solving an optimization problem. The ability to detect both linear and nonlinear associations makes DCCA suitable for analyzing complex multimodal and imaging-genetic associations. When applied to a brain imaging-genetic study from the Philadelphia Neurodevelopmental Cohort (PNC), DCCA detected several mental disorder-related gene pathways and brain networks. Experiments on brain connectivity found that the default mode network had strong nonlinear connections with other brain networks. When applied to the study of age effects, DCCA revealed that the connections of brain networks were relatively weak in younger groups but became stronger at older age stages, suggesting that adolescence is a vital stage for brain development. DCCA thus reveals a number of interesting findings and demonstrates a powerful new approach for analyzing multimodal brain imaging data.
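DCCA builds on the sample distance correlation of Székely, Rizzo and Bakirov, which is computed from double-centred pairwise distance matrices. The sketch below computes only that statistic for two one-dimensional variables; it does not implement DCCA's search over feature combinations, and the function names are illustrative.

```python
import numpy as np

def _double_centered(z):
    """Pairwise |z_i - z_j| distances with row, column and grand means removed."""
    d = np.abs(z[:, None] - z[None, :])
    return d - d.mean(axis=0)[None, :] - d.mean(axis=1)[:, None] + d.mean()

def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D samples."""
    A = _double_centered(np.asarray(x, dtype=float))
    B = _double_centered(np.asarray(y, dtype=float))
    dcov2 = max((A * B).mean(), 0.0)  # squared distance covariance, clipped for round-off
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))
```

For a symmetric sample `x` on [-1, 1], `y = x ** 2` has a Pearson correlation of essentially zero but a clearly positive distance correlation, which is the kind of nonlinear dependence the abstract refers to; any exact linear relation `y = a * x + b` with `a != 0` gives a distance correlation of 1.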
Affiliation(s)
- Wenxing Hu
- Tulane University, Department of Biomedical Engineering, New Orleans, Louisiana, United States
- Aiying Zhang
- Tulane University, Department of Biomedical Engineering, New Orleans, Louisiana, United States
- Biao Cai
- Tulane University, Department of Biomedical Engineering, New Orleans, Louisiana, United States
- Vince Calhoun
- University of New Mexico, Mind Research Network and Department of ECE, Albuquerque, New Mexico, United States
- Yu-Ping Wang
- Tulane University, Department of Biomedical Engineering, New Orleans, Louisiana, United States
|
33
|
Zille P, Calhoun VD, Wang YP. Enforcing Co-Expression Within a Brain-Imaging Genomics Regression Framework. IEEE TRANSACTIONS ON MEDICAL IMAGING 2018; 37:2561-2571. [PMID: 28678703 PMCID: PMC6415768 DOI: 10.1109/tmi.2017.2721301] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 05/20/2023]
Abstract
Among the challenges arising in brain imaging genetic studies, estimating the potential links between neurological and genetic variability within a population is key. In this paper, we propose a multivariate, multimodal formulation for variable selection that leverages co-expression patterns across various data modalities. Our approach is based on an intuitive combination of two widely used statistical models: sparse regression and canonical correlation analysis (CCA). While the former seeks multivariate linear relationships between a given phenotype and associated observations, the latter seeks to extract co-expression patterns between sets of variables belonging to different modalities. In the following, we propose to rely on a "CCA-type" formulation in order to regularize the classical multimodal sparse regression problem (essentially incorporating both CCA and regression models within a unified formulation). The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first show that the simplest formulation of such a model can be expressed as a special case of collaborative learning methods. After discussing its limitations, we propose an extended, more flexible formulation, and introduce a simple and efficient alternating minimization algorithm to solve the associated optimization problem. We explore the parameter space and provide some guidelines regarding parameter selection. Both the original and extended versions are then compared on a simple toy data set and a more advanced simulated imaging genomics data set in order to illustrate the benefits of the latter. Finally, we validate the proposed formulation using single nucleotide polymorphism data and functional magnetic resonance imaging data from a population of adolescents ( subjects, age 16.9 ± 1.9 years from the Philadelphia Neurodevelopmental Cohort) for the study of learning ability.
Furthermore, we carry out a significance analysis of the resulting features that allows us to carefully extract brain regions and genes linked to learning and cognitive ability.
|
34
|
Zille P, Calhoun VD, Stephen JM, Wilson TW, Wang YP. Fused Estimation of Sparse Connectivity Patterns From Rest fMRI-Application to Comparison of Children and Adult Brains. IEEE TRANSACTIONS ON MEDICAL IMAGING 2018; 37:2165-2175. [PMID: 28682248 PMCID: PMC5785555 DOI: 10.1109/tmi.2017.2721640] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 05/20/2023]
Abstract
In this paper, we consider the problem of estimating multiple sparse, co-activated brain regions from functional magnetic resonance imaging (fMRI) observations belonging to different classes. More precisely, we propose a method to analyze similarities and differences in functional connectivity between children and young adults. Often, analysis is conducted on each class separately, and differences across classes are identified with an additional postprocessing step using adequate statistical tools. Here, we propose to rely on a generalized fused Lasso penalty, which allows us to make use of the entire data set in order to estimate connectivity patterns that are either shared across classes or specific to a given group. By using the entire population during the estimation, we hope to increase the power of our analysis. The proposed model falls in the category of population-wise matrix decomposition, and a simple and efficient alternating direction method of multipliers algorithm is introduced to solve the associated optimization problem. After validating our approach on simulated data, experiments are performed on resting-state fMRI data from the Philadelphia Neurodevelopmental Cohort data set, comprising normally developing children from ages 8 to 21. Developmental differences were observed in various brain regions, as a total of three class-specific resting-state components were identified. Statistical analysis of the estimated subject-specific features, as well as classification results (based on age groups, up to 81% accuracy, samples) related to these components, demonstrates that the proposed method is able to properly extract meaningful shared and class-specific sub-networks.
|
35
|
Hao X, Li C, Yan J, Yao X, Risacher SL, Saykin AJ, Shen L, Zhang D. Identification of associations between genotypes and longitudinal phenotypes via temporally-constrained group sparse canonical correlation analysis. Bioinformatics 2018; 33:i341-i349. [PMID: 28881979 PMCID: PMC5870577 DOI: 10.1093/bioinformatics/btx245] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Indexed: 01/05/2023]
Abstract
Motivation Neuroimaging genetics identifies the relationships between genetic variants (i.e., single nucleotide polymorphisms) and brain imaging data to reveal the associations from genotypes to phenotypes. So far, most existing machine-learning approaches detect associations between genetic variants and brain imaging data at a single time-point. However, those associations are based on static phenotypes and ignore the temporal dynamics of phenotypic changes. The phenotypes across multiple time-points may exhibit temporal patterns that can be used to facilitate the understanding of the degenerative process. In this article, we propose a novel temporally constrained group sparse canonical correlation analysis (TGSCCA) framework to identify genetic associations with longitudinal phenotypic markers. Results The proposed TGSCCA method is able to capture temporal changes in the brain from longitudinal phenotypes by incorporating a fused penalty, which requires that the differences between two consecutive canonical weight vectors from adjacent time-points be small. A new efficient optimization algorithm is designed to solve the objective function. Furthermore, we demonstrate the effectiveness of our algorithm on both synthetic and real data (i.e., the Alzheimer’s Disease Neuroimaging Initiative cohort, including progressive mild cognitive impairment (MCI), stable MCI and Normal Control participants). In comparison with conventional SCCA, our proposed method can achieve strong associations and discover phenotypic biomarkers across multiple time-points to guide the interpretation of disease progression. Availability and implementation The Matlab code is available at https://sourceforge.net/projects/ibrain-cn/files/.
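The fused penalty described above can be written directly on a matrix of canonical weights whose columns are the time-points. The sketch below assumes an L1 norm for the difference between adjacent weight vectors and an L2 group norm for each imaging feature across time; these norm choices and all function names are assumptions for illustration, not the exact TGSCCA formulation (the authors' Matlab code is linked above).

```python
import numpy as np

def fused_temporal_penalty(V):
    """V: (p, T) canonical weights, column t = time-point t.
    Sum over adjacent time-points of ||v_t - v_{t-1}||_1."""
    return np.abs(np.diff(V, axis=1)).sum()

def group_penalty(V):
    """Sum over features of the L2 norm of each feature's weights across time."""
    return np.linalg.norm(V, axis=1).sum()

def tgscca_style_objective(X, Ys, u, V, lam_fused, lam_group):
    """Hypothetical penalised objective to be maximised.

    X: (n, q) genotypes; Ys: list of T (n, p) phenotype matrices;
    u: (q,) SNP weights; V: (p, T) phenotype weights.
    """
    cross = sum(u @ X.T @ Y @ V[:, t] for t, Y in enumerate(Ys))  # summed covariances
    return cross - lam_fused * fused_temporal_penalty(V) - lam_group * group_penalty(V)
```

Weight vectors that stay identical across time-points incur no fused penalty, so larger `lam_fused` values push the solution toward temporally smooth biomarker profiles.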
Affiliation(s)
- Xiaoke Hao
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Chanxiu Li
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Jingwen Yan
- Department of Radiology and Imaging Sciences, School of Medicine, Indiana University, Indianapolis, IN, USA; School of Informatics and Computing, Indiana University, Indianapolis, IN, USA
- Xiaohui Yao
- Department of Radiology and Imaging Sciences, School of Medicine, Indiana University, Indianapolis, IN, USA; School of Informatics and Computing, Indiana University, Indianapolis, IN, USA
- Shannon L Risacher
- Department of Radiology and Imaging Sciences, School of Medicine, Indiana University, Indianapolis, IN, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, School of Medicine, Indiana University, Indianapolis, IN, USA
- Li Shen
- Department of Radiology and Imaging Sciences, School of Medicine, Indiana University, Indianapolis, IN, USA; School of Informatics and Computing, Indiana University, Indianapolis, IN, USA
- Daoqiang Zhang
- School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
|
36
|
Hu W, Lin D, Cao S, Liu J, Chen J, Calhoun VD, Wang YP. Adaptive Sparse Multiple Canonical Correlation Analysis With Application to Imaging (Epi)Genomics Study of Schizophrenia. IEEE Trans Biomed Eng 2018; 65:390-399. [PMID: 29364120 PMCID: PMC5826588 DOI: 10.1109/tbme.2017.2771483] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 01/05/2023]
Abstract
Finding correlations across multiple data sets in imaging and (epi)genomics is a common challenge. Sparse multiple canonical correlation analysis (SMCCA) is a multivariate model widely used to extract contributing features from each data set while maximizing the cross-modality correlation. The model is formulated as a combination of the pairwise covariances between any two data sets. However, the scales of different pairwise covariances can be quite different, so directly combining them in SMCCA is unfair. This problem of "unfair combination of pairwise covariances" restricts the power of SMCCA for feature selection. In this paper, we propose a novel formulation of SMCCA, called adaptive SMCCA, that overcomes the problem by introducing adaptive weights when combining pairwise covariances. Both simulation and real-data analysis show that adaptive SMCCA outperforms conventional SMCCA and SMCCA with fixed weights in terms of feature selection. Large-scale numerical experiments show that adaptive SMCCA converges as fast as conventional SMCCA. When applying it to an imaging (epi)genetics study of schizophrenia subjects, we detect significant (epi)genetic variants and brain regions, which are consistent with other existing reports. In addition, several significant brain-development related pathways, e.g., neural tube development, are detected by our model, demonstrating that imaging-epigenetic associations may be overlooked by conventional SMCCA. All these results demonstrate that adaptive SMCCA is well suited for detecting three-way or multiway correlations and thus can find widespread applications in multiple omics and imaging data integration.
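The objective being discussed is a sum of pairwise covariances between the projected data sets. The sketch below evaluates that sum for given weight vectors, either unweighted (conventional SMCCA) or with per-pair weights supplied by the caller. The adaptive rule that adaptive SMCCA uses to choose those weights is not described in the abstract, so `pair_weights` here is a hypothetical user-supplied input, and the sparsity penalties and optimisation are omitted.

```python
import numpy as np
from itertools import combinations

def pairwise_covariances(Xs, ws):
    """Xs: list of (n, p_k) column-centred data sets; ws: list of (p_k,) weights.
    Returns {(i, j): w_i^T X_i^T X_j w_j / n} for every pair i < j."""
    n = Xs[0].shape[0]
    return {(i, j): (Xs[i] @ ws[i]) @ (Xs[j] @ ws[j]) / n
            for i, j in combinations(range(len(Xs)), 2)}

def smcca_objective(Xs, ws, pair_weights=None):
    """Weighted sum of pairwise covariances (unweighted if pair_weights is None)."""
    covs = pairwise_covariances(Xs, ws)
    if pair_weights is None:  # conventional SMCCA: every pair counts equally
        pair_weights = {pair: 1.0 for pair in covs}
    return sum(pair_weights[pair] * c for pair, c in covs.items())
```

Inspecting the dictionary returned by `pairwise_covariances` makes the scale problem concrete: a pair whose covariance is an order of magnitude larger than the others dominates the unweighted sum, and per-pair weights are one way to rebalance it.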
Affiliation(s)
- Wenxing Hu
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
- Dongdong Lin
- Mind Research Network and Dept. of ECE, University of New Mexico, Albuquerque, NM, 87106
- Shaolong Cao
- Department of Bioinformatics & Computational Biology, UT MD Anderson Cancer Center, Houston, TX
- Jingyu Liu
- Mind Research Network and Dept. of ECE, University of New Mexico, Albuquerque, NM, 87106
- Jiayu Chen
- Mind Research Network and Dept. of ECE, University of New Mexico, Albuquerque, NM, 87106
- Vince D. Calhoun
- Mind Research Network and Dept. of ECE, University of New Mexico, Albuquerque, NM, 87106
- Yu-Ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA
|
37
|
Fang J, Zhang JG, Deng HW, Wang YP. Joint Detection of Associations between DNA Methylation and Gene Expression from Multiple Cancers. IEEE J Biomed Health Inform 2017; 22:1960-1969. [PMID: 29990049 DOI: 10.1109/jbhi.2017.2784621] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
DNA methylation plays an important role in the development of various cancers, mainly through the regulation of gene expression. Hence, studying the relation between DNA methylation and gene expression is of particular interest for understanding cancers. Recently, an increasing number of datasets have become available from multiple cancers, which makes it possible to study both the similarities and differences of genomic alterations across multiple tumor types. However, most existing pan-cancer analysis methods perform simple aggregations, which may overlook the heterogeneity of the interactions. In this paper, we propose a novel method to jointly detect complex associations between DNA methylation and gene expression levels from multiple cancers. The main idea is to apply joint sparse canonical correlation analysis to detect a small set of methylated sites that are associated with another set of genes, either shared across cancers or specific to a particular group (group-specific) of cancers. These methylated sites and genes form a complex module with strong multivariate correlations. We further introduce a joint sparse precision matrix estimation method to identify driver methylation-gene pairs in the module. These pairs are characterized by significant partial correlations, which may imply high functional impacts and contribute complementary information to the main step. We apply our method to The Cancer Genome Atlas (TCGA) datasets with 1166 samples from four cancers. The results reveal significant shared and group-specific interactions between DNA methylation and gene expression levels. To promote reproducible research, the Matlab code is available at https://sites.google.com/site/jianfang86/jointTCGA.
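The building block of the joint method above is single-data-set sparse CCA. A minimal sketch in the style of the penalized matrix decomposition of Witten, Tibshirani and Hastie is shown below: the within-set covariance matrices are replaced by the identity, and the two weight vectors are updated alternately by soft-thresholding and renormalising. Using fixed thresholds `lam_u`, `lam_v` (instead of tuning them to meet an L1 bound) is a simplifying assumption, and the shared/group-specific structure across cancers is not modelled; function names are illustrative.

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def _unit(a):
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def sparse_cca(X, Y, lam_u, lam_v, n_iter=100):
    """Rank-1 sparse CCA. X: (n, p) e.g. methylation; Y: (n, q) e.g. expression.
    Returns unit-norm sparse weights u (p,) and v (q,)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]                  # (p, q) cross-correlation matrix
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # start from leading right singular vector
    for _ in range(n_iter):
        u = _unit(soft_threshold(C @ v, lam_u))    # sparse update of methylation weights
        v = _unit(soft_threshold(C.T @ u, lam_v))  # sparse update of expression weights
    return u, v
```

The nonzero entries of `u` and `v` play the role of the "small set of methylated sites" and the associated genes; a joint version would couple several such problems across cancers.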
|