1
Mondal S, Maji P. Multi-Task Learning and Sparse Discriminant Canonical Correlation Analysis for Identification of Diagnosis-Specific Genotype-Phenotype Association. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1390-1402. [PMID: 38587960 DOI: 10.1109/tcbb.2024.3386406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
The primary objective of imaging genetics research is to investigate the complex genotype-phenotype association for the disease under study. For example, to understand the impact of genetic variations on brain function and structure, genotypic data such as single nucleotide polymorphisms (SNPs) are integrated with phenotypic data such as imaging quantitative traits. Sparse models based on canonical correlation analysis (CCA) are popular in this area for finding the complex bi-multivariate genotype-phenotype association, as the number of features in the genotypic and/or phenotypic data is significantly higher than the number of samples. However, sparse CCA based methods are, in general, unsupervised in nature, and fail to identify the diagnosis-specific features that play an important role in the diagnosis and prognosis of the disease under study. In this regard, a new supervised model is proposed to study the complex genotype-phenotype association by judiciously integrating the merits of CCA, linear discriminant analysis (LDA) and multi-task learning. The proposed model can identify the diagnosis-specific as well as the diagnosis-consistent features with significantly lower computational complexity. The performance of the proposed method, along with a comparison with state-of-the-art methods, is evaluated on several synthetic data sets and one real imaging genetics data set collected from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. In the current study, SNPs as genetic data and resting state functional MRI (fMRI) as imaging data are integrated to find the complex genotype-phenotype association. An important finding is that the proposed method achieves a higher correlation value, improved noise resistance and stability, and better feature selection ability. All the results illustrate the power and capability of the proposed method to find the diagnostic group-specific imaging genetic association, which may help to understand the neurodegenerative disorder in a more comprehensive way.
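To make the sparse CCA baseline mentioned in this abstract concrete, the following is a minimal sketch of a generic unsupervised sparse CCA using alternating soft-thresholding on the cross-covariance matrix. It is not the supervised method proposed by Mondal and Maji; the function names (`soft_threshold`, `sparse_cca`), the penalty values, and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(a, lam):
    """Element-wise soft-thresholding, the proximal step for an L1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def _normalize(w):
    """Scale to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def sparse_cca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """Hypothetical helper: first pair of sparse canonical weight vectors.

    Assumes every column of X and Y has non-zero variance.
    """
    n = X.shape[0]
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / (n - 1)  # cross-covariance of the standardized blocks

    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(X.shape[1])

    for _ in range(n_iter):
        u_new = _normalize(soft_threshold(C @ v, lam_u))
        v_new = _normalize(soft_threshold(C.T @ u_new, lam_v))
        delta = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if delta < tol:
            break

    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr

# Simulated data (an assumption): one latent factor drives the first three
# "SNP" columns of X and the first two "imaging" columns of Y.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 20
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))

u, v, corr = sparse_cca(X, Y, lam_u=0.3, lam_v=0.3)
print("selected X features:", np.flatnonzero(u))
print("selected Y features:", np.flatnonzero(v))
print("canonical correlation:", round(float(corr), 3))
```

Because nothing in this sketch uses diagnostic labels, it selects features that co-vary across all subjects, which is the limitation of unsupervised sparse CCA that the abstract describes.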
2
Martinez-Garcia M, Olmos PM. Handling Ill-Conditioned Omics Data With Deep Probabilistic Models. IEEE J Biomed Health Inform 2023; 27:4601-4610. [PMID: 37224378 DOI: 10.1109/jbhi.2023.3279493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The advent of high-throughput technologies has increased the dimensionality of omics datasets, which limits the application of machine learning methods because of the severe imbalance between the number of observations and the number of features. In this scenario, dimensionality reduction is essential to extract the relevant information within these datasets and project it into a low-dimensional space, and probabilistic latent space models are becoming popular given their capability to capture the underlying structure of the data as well as the uncertainty in the information. This article aims to provide a general classification and dimensionality reduction method based on deep latent space models that tackles two of the main problems that arise in omics datasets: the presence of missing data and the limited number of observations relative to the number of features. We propose a semi-supervised Bayesian latent space model that infers a low-dimensional embedding driven by the target label: the Deep Bayesian Logistic Regression (DBLR) model. During inference, the model also learns a global vector of weights that allows it to make predictions given the low-dimensional embedding of the observations. Since this kind of dataset is prone to overfitting, we introduce an additional probabilistic regularization method based on the semi-supervised nature of the model. We compared the performance of the DBLR against several state-of-the-art methods for dimensionality reduction, on both synthetic and real datasets with different data types. The proposed model provides more informative low-dimensional representations, outperforms the baseline methods in classification, and can naturally handle missing entries.
3
Markov NT, Lindbergh CA, Staffaroni AM, Perez K, Stevens M, Nguyen K, Murad NF, Fonseca C, Campisi J, Kramer J, Furman D. Age-related brain atrophy is not a homogenous process: Different functional brain networks associate differentially with aging and blood factors. Proc Natl Acad Sci U S A 2022; 119:e2207181119. [PMID: 36459652 PMCID: PMC9894212 DOI: 10.1073/pnas.2207181119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/28/2022] [Accepted: 10/04/2022] [Indexed: 12/04/2022]
Abstract
Aging is characterized by a progressive loss of brain volume at an estimated rate of 5% per decade after age 40. While these morphometric changes, especially those affecting gray matter and atrophy of the temporal lobe, are predictors of cognitive performance, the strong association with aging obscures the potential parallel, but more specific, role of individual subject physiology. Here, we studied a cohort of 554 human subjects who were monitored using structural MRI scans and blood immune protein concentrations. Using machine learning, we derived a cytokine clock (CyClo), which predicted age with good accuracy (Mean Absolute Error = 6 y) based on the expression of a subset of immune proteins. These proteins included, among others, Placenta Growth Factor (PLGF) and Vascular Endothelial Growth Factor (VEGF), both involved in angiogenesis, the chemoattractant vascular cell adhesion molecule 1 (VCAM-1), the canonical inflammatory proteins interleukin-6 (IL-6) and tumor necrosis factor alpha (TNFα), the chemoattractant IP-10 (CXCL10), and eotaxin-1 (CCL11), previously implicated in brain disorders. Age, sex, and the CyClo were independently associated with different functionally defined cortical networks in the brain. While age was mostly correlated with changes in the somatomotor system, sex was associated with variability in the frontoparietal, ventral attention, and visual networks. Significant canonical correlation was observed for the CyClo and the default mode, limbic, and dorsal attention networks, indicating that immune circulating proteins preferentially affect brain processes such as focused attention, emotion, memory, response to social stress, internal evaluation, and access to consciousness. Thus, we identified immune biomarkers of brain aging which could be potential therapeutic targets for the prevention of age-related cognitive decline.
Affiliation(s)
- Nikola T. Markov
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
- Cutter A. Lindbergh
  - Department of Neurology, Memory and Aging Center, University of California San Francisco, Weill Institute for Neurosciences, San Francisco, CA 94158
  - Department of Psychiatry, University of Connecticut School of Medicine, Farmington, CT 06030
- Adam M. Staffaroni
  - Department of Neurology, Memory and Aging Center, University of California San Francisco, Weill Institute for Neurosciences, San Francisco, CA 94158
- Kevin Perez
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
  - University of Lausanne, Lausanne CH-1015, Switzerland
- Michael Stevens
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
- Khiem Nguyen
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
  - Nguyen Tat Thanh Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City 70000, Vietnam
- Natalia F. Murad
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
- Corrina Fonseca
  - Department of Neurology, Memory and Aging Center, University of California San Francisco, Weill Institute for Neurosciences, San Francisco, CA 94158
- Judith Campisi
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
- Joel Kramer
  - Department of Neurology, Memory and Aging Center, University of California San Francisco, Weill Institute for Neurosciences, San Francisco, CA 94158
- David Furman
  - Buck AI Platform, Buck Institute for Research on Aging, Novato, CA 94945
  - Instituto de Investigaciones en Medicina Traslacional, Universidad Austral, Consejo Nacional de Investigaciones Científicas y Técnicas, Pilar 1629, Argentina
  - Stanford 1000 Immunomes Project, Stanford University School of Medicine, Stanford, CA 94305
4
Identifying Biomarkers of Alzheimer's Disease via a Novel Structured Sparse Canonical Correlation Analysis Approach. J Mol Neurosci 2021; 72:323-335. [PMID: 34570360 DOI: 10.1007/s12031-021-01915-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2021] [Accepted: 09/09/2021] [Indexed: 02/05/2023]
Abstract
Using correlation analysis to study the potential connection between brain genetics and imaging has become an effective method to understand neurodegenerative diseases. Sparse canonical correlation analysis (SCCA) makes it possible to study high-dimensional genetic information. Traditional SCCA methods can only process single-modal genetic and image data, which to some extent weakens the close connection of the brain's biological network. In some recently proposed multimodal SCCA methods, due to the limitations of the penalty terms, the pre-processed data needs to be further filtered to make the dimensions uniform, which may destroy the potential association of data within the same modality. In this research, in order to combine data from different modalities and to ensure that the chain relationship or graph network relationship within the same modality is not destroyed, the original generalized fused lasso penalty was replaced with the fused pairwise group lasso (FGL) and the graph-guided pairwise group lasso (GGL), based on the method of joint sparse canonical correlation analysis (JSCCA). We used prior knowledge to construct a supervised bivariate learning model and used linear regression to select quantitative traits (QTs) of images that are strongly correlated with the Mini-Mental State Examination (MMSE) scores. Compared with FGL-SCCA, the model we constructed obtained a higher gene-ROI correlation coefficient and identified more significant biomarkers, providing a theoretical basis for further understanding the complex pathology of neurodegenerative diseases.
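For orientation, the penalties named in this abstract can be written in the form in which they commonly appear in the structured SCCA literature. This is a sketch under stated assumptions and not necessarily the exact formulation used in the paper: the data matrices $\mathbf{X}$ (genetic) and $\mathbf{Y}$ (imaging), the weight vectors $\mathbf{u} \in \mathbb{R}^{p}$ and $\mathbf{v}$, the bounds $c_1, c_2$, the pairwise weights $\omega_{i,j}$, and the edge set $E$ of the prior graph are all notation introduced here.

```latex
% Generic penalized SCCA problem
\begin{aligned}
\max_{\mathbf{u},\,\mathbf{v}} \quad & \mathbf{u}^{\top}\mathbf{X}^{\top}\mathbf{Y}\,\mathbf{v} \\
\text{s.t.} \quad & \|\mathbf{X}\mathbf{u}\|_2^2 \le 1,\quad
                    \|\mathbf{Y}\mathbf{v}\|_2^2 \le 1,\quad
                    \Omega(\mathbf{u}) \le c_1,\quad
                    \Omega(\mathbf{v}) \le c_2 .
\end{aligned}

% Fused pairwise group lasso (chain structure over adjacent features)
\Omega_{\mathrm{FGL}}(\mathbf{u}) = \sum_{i=1}^{p-1} \omega_{i,i+1}\,\sqrt{u_i^{2} + u_{i+1}^{2}}

% Graph-guided pairwise group lasso (pairs given by a prior graph)
\Omega_{\mathrm{GGL}}(\mathbf{u}) = \sum_{(i,j)\in E} \omega_{i,j}\,\sqrt{u_i^{2} + u_j^{2}}
```

In this form, each term groups a pair of coefficients under an $\ell_2$ norm, so neighboring features (FGL) or graph-connected features (GGL) tend to be selected or discarded together, which is consistent with the abstract's aim of preserving chain and graph relationships within a modality.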