1. Aspects of robust canonical correlation analysis, principal components and association. TEST-SPAIN 2023. [DOI: 10.1007/s11749-023-00846-1] [Indexed: 02/25/2023]
2. Lee AJ, Ma Y, Yu L, Dawe RJ, McCabe C, Arfanakis K, Mayeux R, Bennett DA, Klein HU, De Jager PL. Multi-region brain transcriptomes uncover two subtypes of aging individuals with differences in Alzheimer risk and the impact of APOEε4. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.25.524961. [PMID: 36747803 PMCID: PMC9900823 DOI: 10.1101/2023.01.25.524961] [Indexed: 01/27/2023]
Abstract
The heterogeneity of the older population suggests the existence of subsets of individuals that share certain brain molecular features and respond differently to risk factors for Alzheimer's disease, but this population structure remains poorly defined. Here, we performed an unsupervised clustering of individuals with multi-region brain transcriptomes to assess whether a broader approach, simultaneously considering data from multiple regions involved in cognition, would uncover such subsets. We implemented a canonical correlation-based analysis in a Discovery cohort of 459 participants from two longitudinal studies of cognitive aging that have RNA sequence profiles in three brain regions. An additional 690 participants with data in only one or two of these regions were used in the Replication effort. These clustering analyses identified two meta-clusters, MC-1 and MC-2. The two sets of participants differ primarily in their trajectories of cognitive decline, with MC-2 showing a 3-year delay in the median age of incident dementia. This is due, in part, to a greater impact of tau pathology on neuronal chromatin architecture and to broader brain changes, including greater loss of white matter integrity, in MC-1. Further evidence of biological differences includes a significantly larger impact of APOEε4 risk on cognitive decline in MC-1. These findings suggest that our proposed population structure captures an aspect of the more distributed molecular state of the aging brain that either enhances the effect of risk factors in MC-1 or enhances protective effects in MC-2. These observations may inform the design of therapeutic development efforts and of trials as both become increasingly targeted molecularly. One Sentence Summary: There are two types of aging brains, one of which is more vulnerable to APOEε4 and to subsequent neuronal dysfunction and cognitive loss.
3. Zhang X, Wang Y, Zhu L, Chen H, Li H, Wu L. Robust variable structure discovery based on tilted empirical risk minimization. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04409-z] [Indexed: 01/15/2023]
4. Palzer EF, Wendt CH, Bowler RP, Hersh CP, Safo SE, Lock EF. sJIVE: Supervised Joint and Individual Variation Explained. Comput Stat Data Anal 2022; 175:107547. [PMID: 36119152 PMCID: PMC9481062 DOI: 10.1016/j.csda.2022.107547] [Indexed: 11/03/2022]
Abstract
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are presently limited because they either (1) consider only data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without considering the outcome. The proposed method, supervised joint and individual variation explained (sJIVE), can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. These two components are weighted to balance explaining variation in the multi-source data against explaining variation in the outcome. Simulations show that sJIVE outperforms existing methods when large amounts of noise are present in the multi-source data. An application to data from the COPDGene study explores gene expression and proteomic patterns associated with lung function.
Affiliation(s)
- Elise F. Palzer
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Christine H. Wendt
  - Division of Pulmonary, Allergy and Critical Care, University of Minnesota, Minneapolis, 55455, USA
- Russell P. Bowler
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, National Jewish Health, Denver, CO, USA
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Sandra E. Safo
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Eric F. Lock
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
5. Cerón‐Rojas JJ, Crossa J. The statistical theory of linear selection indices from phenotypic to genomic selection. CROP SCIENCE 2022; 62:537-563. [PMID: 35911794 PMCID: PMC9305178 DOI: 10.1002/csc2.20676] [Received: 07/23/2021] [Accepted: 11/27/2021] [Indexed: 06/15/2023]
Abstract
A linear selection index (LSI) can be a linear combination of phenotypic values, marker scores, and genomic estimated breeding values (GEBVs); of phenotypic values and marker scores; or of phenotypic values and GEBVs jointly. The main objective of the LSI is to predict the net genetic merit (H), which is a linear combination of the unobservable breeding values of individual traits, weighted by the traits' economic values; thus, the target of the LSI is not a parameter but the unobserved random values of H. An LSI can be single-stage or multi-stage; multi-stage indices select one or more individual traits that become available at different times or stages of development, in both plants and animals. Likewise, LSIs can be either constrained or unconstrained. A constrained LSI imposes predetermined genetic gains on the expected genetic gain per trait and includes the unconstrained LSI as a particular case. The main LSI parameters are the selection response, the expected genetic gain per trait, and the correlation between the LSI and H. When the population mean is zero, the selection response and the expected genetic gain per trait are, respectively, the conditional means of H and of the genotypic values, given the LSI values. The application of LSI theory is rapidly diversifying; however, because LSIs are based on the best linear predictor and on canonical correlation theory, the theory can be explained in a simple form. We provide a review of the statistical theory of the LSI, from phenotypic to genomic selection, showing the relationships, advantages, and limitations of the indices, which should allow breeders to use LSI theory confidently in breeding programs.
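As an illustration of the classical phenotypic case, here is a minimal sketch of the unconstrained Smith-Hazel index, the usual starting point for phenotypic LSIs. It is not code from the review. The sketch assumes that the phenotypic (P) and breeding-value (G) covariance matrices are known, that genetic and environmental effects are uncorrelated, and that the standardized selection differential is k = 1.755, which corresponds to selecting roughly the top 10%. The function name and that default are illustrative choices.

```python
import numpy as np


def smith_hazel_lsi(P, G, w, k=1.755):
    """Unconstrained Smith-Hazel phenotypic selection index (illustrative sketch).

    P: phenotypic covariance matrix of the t traits (t x t)
    G: covariance matrix of the traits' breeding values (t x t)
    w: economic weights defining the net genetic merit H = w'g
    k: standardized selection differential (1.755 ~ selecting the top 10%)
    """
    P = np.asarray(P, dtype=float)
    G = np.asarray(G, dtype=float)
    w = np.asarray(w, dtype=float)
    b = np.linalg.solve(P, G @ w)           # optimal index weights b = P^{-1} G w
    sigma_I = np.sqrt(b @ P @ b)            # standard deviation of the index I = b'y
    sigma_H = np.sqrt(w @ G @ w)            # standard deviation of H
    # The next two identities hold because b is the optimal (Smith-Hazel) weight vector.
    response = k * sigma_I                  # expected selection response in H
    accuracy = sigma_I / sigma_H            # correlation between I and H
    gain_per_trait = k * (G @ b) / sigma_I  # expected genetic gain per trait
    return b, response, gain_per_trait, accuracy


b, R, E, rho = smith_hazel_lsi(P=[[10.0, 2.0], [2.0, 5.0]],
                               G=[[4.0, 1.0], [1.0, 2.0]],
                               w=[1.0, 1.0])
```

As a quick consistency check, the economically weighted sum of the per-trait gains, sum(w * E), equals the response R. This reflects the fact that the gain in H is built from the per-trait gains.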
Affiliation(s)
- J. Jesus Cerón‐Rojas
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico‐Veracruz, Edo. de Mexico, Mexico DF, CP 52640, Mexico
- Jose Crossa
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico‐Veracruz, Edo. de Mexico, Mexico DF, CP 52640, Mexico
6. Ralph AP, Webb R, Moreland NJ, McGregor R, Bosco A, Broadhurst D, Lassmann T, Barnett TC, Benothman R, Yan J, Remenyi B, Bennett J, Wilson N, Mayo M, Pearson G, Kollmann T, Carapetis JR. Searching for a technology-driven acute rheumatic fever test: the START study protocol. BMJ Open 2021; 11:e053720. [PMID: 34526345 PMCID: PMC8444258 DOI: 10.1136/bmjopen-2021-053720] [Indexed: 12/17/2022]
Abstract
INTRODUCTION: The absence of a diagnostic test for acute rheumatic fever (ARF) is a major impediment to managing this serious childhood condition. ARF is an autoimmune condition triggered by infection with group A Streptococcus. It is the precursor to rheumatic heart disease (RHD), a leading cause of health inequity and premature mortality for Indigenous peoples of Australia and New Zealand and internationally. METHODS AND ANALYSIS: 'Searching for a Technology-Driven Acute Rheumatic Fever Test' (START) is a biomarker discovery study that aims to detect and test a biomarker signature that distinguishes ARF cases from non-ARF cases, and to use systems biology and serology to better understand ARF pathogenesis. Eligible participants aged 5-30 years with ARF diagnosed by an expert clinical panel according to the 2015 Revised Jones Criteria will be recruited from three hospitals in Australia and New Zealand. Age-, sex- and ethnicity-matched individuals who are healthy or have non-ARF acute diagnoses or RHD will be recruited as controls. In the discovery cohort, blood samples collected at baseline, and during convalescence in a subset, will be interrogated by comprehensive profiling to generate candidate diagnostic biomarker signatures. A biomarker validation cohort will subsequently be used to test promising combinations of biomarkers. By defining the first biomarker signatures able to discriminate between ARF and other clinical conditions, the START study has the potential to transform the approach to ARF diagnosis and RHD prevention. ETHICS AND DISSEMINATION: The study has approval from the Northern Territory Department of Health and Menzies School of Health Research ethics committee and the New Zealand Health and Disability Ethics Committee. It will be conducted according to ethical standards for research involving Indigenous Australians and New Zealand Māori and Pacific Peoples.
Indigenous investigators and governance groups will provide oversight of study processes and advise on cultural matters.
Affiliation(s)
- Anna P Ralph
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Rachel Webb
  - KidzFirst Hospital, Counties Manukau District Health Board, Auckland, New Zealand
  - Starship Children's Hospital, Auckland, New Zealand
  - Department of Paediatrics; Child and Youth Health, University of Auckland, Auckland, New Zealand
- Nicole J Moreland
  - School of Medical Sciences and Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Reuben McGregor
  - School of Medical Sciences and Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Anthony Bosco
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- David Broadhurst
  - Centre for Integrative Metabolomics and Computational Biology, Edith Cowan University, Perth, Western Australia, Australia
- Timo Lassmann
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Timothy C Barnett
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Rym Benothman
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Jennifer Yan
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Bo Remenyi
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Julie Bennett
  - Department of Public Health, University of Otago, Wellington, New Zealand
- Nigel Wilson
  - Starship Children's Hospital, Auckland, New Zealand
- Mark Mayo
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
- Glenn Pearson
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Tobias Kollmann
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Jonathan R Carapetis
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
  - Department of Infectious Diseases, Perth Children's Hospital, Perth, Western Australia, Australia
  - School of Medicine, University of Western Australia, Perth, Western Australia, Australia
7. Statistical Integration of 'Omics Data Increases Biological Knowledge Extracted from Metabolomics Data: Application to Intestinal Exposure to the Mycotoxin Deoxynivalenol. Metabolites 2021; 11:metabo11060407. [PMID: 34205708 PMCID: PMC8233929 DOI: 10.3390/metabo11060407] [Received: 05/11/2021] [Revised: 06/07/2021] [Accepted: 06/15/2021] [Indexed: 12/18/2022]
Abstract
The effects of low doses of toxicants are often subtle, and information extracted from metabolomic data alone may not always be sufficient. As end products of enzymatic reactions, metabolites represent the final phenotypic expression of an organism and can also reflect gene expression changes caused by such exposure. Therefore, integrating metabolomic and transcriptomic data could improve the biological knowledge extracted about toxicant-induced disruptions. In the present study, we applied statistical integration tools to metabolomic and transcriptomic data obtained from jejunal explants of pigs exposed to the food contaminant deoxynivalenol (DON). Canonical correlation analysis (CCA) and self-organizing maps (SOMs) were compared for the identification of correlated transcriptomic and metabolomic features, and O2-PLS was used to model the relationship between exposure and the selected features. Integrating both 'omics datasets increased the number of discriminant metabolites discovered from 3 (metabolomic dataset alone) to 39, a more than tenfold increase. Besides the previously reported disturbance of energy metabolism, assessing correlations between both functional levels revealed several other types of damage linked to intestinal exposure to DON, including the alteration of protein synthesis, oxidative stress, and inflammasome activation. This confirms the added value of integration for enriching the biological knowledge extracted from metabolomics.
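For readers unfamiliar with CCA, here is a minimal sketch of the classical method. It is not the study's pipeline, which also used SOMs and O2-PLS. The sketch assumes that both blocks are measured on the same samples and have fewer features than samples. Typical omics data violate this assumption and need a regularized or sparse variant. The sketch uses the QR/SVD formulation. Each centred block is orthonormalized, and the singular values of Qx'Qy are then the canonical correlations. The function name `cca` is illustrative.

```python
import numpy as np


def cca(X, Y):
    """Classical canonical correlation analysis via QR decomposition and SVD.

    X: n x p matrix, Y: n x q matrix; rows are the same n samples.
    Assumes each centred block has full column rank (roughly n > p and n > q).
    Returns the canonical correlations r (descending) and weight matrices A (p x d),
    B (q x d) such that Xc @ A[:, j] and Yc @ B[:, j] are the j-th canonical variates.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each feature
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)                # Xc = Qx @ Rx, Qx has orthonormal columns
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)      # singular values = canonical correlations
    d = min(Xc.shape[1], Yc.shape[1])
    A = np.linalg.solve(Rx, U[:, :d])        # map back to weights on the original features
    B = np.linalg.solve(Ry, Vt.T[:, :d])
    return np.clip(s[:d], 0.0, 1.0), A, B
```

Each pair of canonical variates has unit norm, so the correlation between `Xc @ A[:, 0]` and `Yc @ B[:, 0]` equals `r[0]` exactly. The features with large weights in that pair are the "correlated features" that a CCA-based integration would highlight.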
8. Hawinkel S, Bijnens L, Cao KAL, Thas O. Model-based joint visualization of multiple compositional omics datasets. NAR Genom Bioinform 2020; 2:lqaa050. [PMID: 33575602 PMCID: PMC7671331 DOI: 10.1093/nargab/lqaa050] [Received: 12/01/2019] [Revised: 05/20/2020] [Accepted: 07/05/2020] [Indexed: 12/26/2022]
Abstract
The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R package combi.
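Compositional data carry only relative information, which is why log-ratios appear in this abstract. The sketch below shows the centred log-ratio (clr) transform that underlies classical compositional biplots. It does not implement COMBI's latent variable model and does not use the API of the combi R package. The function name and the pseudocount of 0.5, which is used to avoid log(0) for zero counts, are illustrative assumptions.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's composition (list of counts).

    Each component becomes log(x_i / g(x)), where g is the geometric mean.
    The pseudocount (an arbitrary illustrative choice) keeps zero counts finite.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]
```

With `pseudocount=0`, `clr([1, 2, 4])` and `clr([10, 20, 40])` give the same result. The transform therefore ignores sequencing depth, which is the property that makes it suitable for count compositions. The clr values of a sample always sum to zero.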
Affiliation(s)
- Stijn Hawinkel
  - Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Luc Bijnens
  - Quantitative Sciences, Janssen Pharmaceutical companies of Johnson and Johnson, 2340 Beerse, Belgium
  - Data Science Institute, I-BioStat, Hasselt University, 3500 Hasselt, Belgium
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, 3010 Melbourne, Victoria, Australia
- Olivier Thas
  - Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
  - Data Science Institute, I-BioStat, Hasselt University, 3500 Hasselt, Belgium
  - National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, 2500 Wollongong, New South Wales, Australia
9. Polajnar E. Using elastic net restricted kernel canonical correlation analysis for cross-language information retrieval. COMMUN STAT-SIMUL C 2019. [DOI: 10.1080/03610918.2019.1704420] [Indexed: 10/25/2022]
Affiliation(s)
- Emil Polajnar
  - Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia
10. Skelly DA, Raghupathy N, Robledo RF, Graber JH, Chesler EJ. Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples. Genetics 2019; 212:919-929. [PMID: 31113812 PMCID: PMC6614885 DOI: 10.1534/genetics.118.301865] [Received: 12/11/2018] [Accepted: 05/14/2019] [Indexed: 12/21/2022]
Abstract
Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript-trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative "reference" traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
Affiliation(s)
- Joel H Graber
  - The Jackson Laboratory, Bar Harbor, Maine 04609
  - MDI Biological Laboratory, Bar Harbor, Maine 04609
11. Ye W, Long Y, Ji G, Su Y, Ye P, Fu H, Wu X. Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis. BMC Genomics 2019; 20:75. [PMID: 30669970 PMCID: PMC6343338 DOI: 10.1186/s12864-019-5433-7] [Received: 09/09/2018] [Accepted: 01/03/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to transcriptome complexity and to the dynamics of gene regulation. The current tsunami of genome-wide poly(A) site data from various conditions, generated by 3' end sequencing, provides a valuable data source for studying APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data because they fail to consider the information of poly(A) sites (e.g., location, abundance, number) within each gene or to measure the association among poly(A) sites between two genes. RESULTS: Here we propose a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and the gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways, including abundance and relative usage, which exploits the advantages of 3' end deep sequencing in quantifying APA sites. Cluster analyses of both real and synthetic poly(A) site data sets demonstrate that PASCCA outperforms other widely used distance measures under five performance metrics: connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
CONCLUSIONS: By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3' end sequencing data, helping to address complex biological phenomena.
Affiliation(s)
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yuqi Long
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Yaru Su
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Pengchao Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
- Hongjuan Fu
  - Department of Automation, Xiamen University, Xiamen, 361005, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
12. Liu D, Chu X, Wang H, Dong J, Ge SQ, Zhao ZY, Peng HL, Sun M, Wu LJ, Song MS, Guo XH, Meng Q, Wang YX, Lauc G, Wang W. The changes of immunoglobulin G N-glycosylation in blood lipids and dyslipidaemia. J Transl Med 2018; 16:235. [PMID: 30157878 PMCID: PMC6114873 DOI: 10.1186/s12967-018-1616-2] [Received: 04/18/2018] [Accepted: 08/23/2018] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Alternative N-glycosylation has significant structural and functional consequences for immunoglobulin G (IgG) and can affect immune responses, acting as a switch between pro- and anti-inflammatory IgG functionality. Studies have demonstrated that IgG N-glycosylation is associated with ageing, body mass index, type 2 diabetes and hypertension. METHODS: Herein, we investigated patterns of IgG glycosylation associated with blood lipids in a cross-sectional study of 598 Han Chinese adults aged 20-68 years. The IgG glycome composition was analysed by ultra-performance liquid chromatography. RESULTS: Blood lipids were positively correlated with glycan peak GP6 and negatively correlated with GP18 (P < 0.05/57). The canonical correlation analysis indicated that initial N-glycan structures, including GP4, GP6, GP9-12, GP14, GP17, GP18 and GP23, were significantly correlated with blood lipids, including total cholesterol, total triglycerides (TG) and low-density lipoprotein (r = 0.390, P < 0.001). IgG glycan patterns were able to distinguish patients with dyslipidaemia from controls, with an area under the curve of 0.692 (95% confidence interval 0.644-0.740). CONCLUSIONS: Our findings indicate a possible association between blood lipids and the observed loss of galactose and sialic acid, as well as the addition of bisecting GlcNAc, which might be related to the chronic inflammation accompanying the development and progression of dyslipidaemia.
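The reported area under the ROC curve of 0.692 has a concrete reading. It is the probability that a randomly chosen patient with dyslipidaemia receives a higher score than a randomly chosen control, with ties counting one half. The sketch below computes the AUC directly from that Mann-Whitney interpretation. The scores are hypothetical inputs, and the study's glycan-based classifier is not reproduced.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via its Mann-Whitney (pairwise ranking) interpretation.

    Compares every case score with every control score: a higher case score counts 1,
    a tie counts 0.5. O(n * m), which is fine for a few hundred subjects.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to a classifier that ranks cases and controls no better than chance, and 1.0 to perfect separation. A value of 0.692 therefore indicates moderate discrimination.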
Affiliation(s)
- Di Liu
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Xi Chu
  - Center for Physical Examination, Xuanwu Hospital, Capital Medical University, Beijing, 100050 China
- Hao Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
  - School of Medical Sciences, Edith Cowan University, Perth, WA 6027 Australia
- Jing Dong
  - Center for Physical Examination, Xuanwu Hospital, Capital Medical University, Beijing, 100050 China
- Si-Qi Ge
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
  - School of Medical Sciences, Edith Cowan University, Perth, WA 6027 Australia
- Zhong-Yao Zhao
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Hong-Li Peng
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Ming Sun
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Li-Juan Wu
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Man-Shu Song
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Xiu-Hua Guo
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- Qun Meng
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
- You-Xin Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
  - School of Medical Sciences, Edith Cowan University, Perth, WA 6027 Australia
- Gordan Lauc
  - Genos Glycobiology Research Laboratory, 10000 Zagreb, Croatia
  - Faculty of Pharmacy and Biochemistry, University of Zagreb, 10000 Zagreb, Croatia
- Wei Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen Xitoutiao, Beijing, 100069 China
  - School of Medical Sciences, Edith Cowan University, Perth, WA 6027 Australia
13. Leonenko G, Di Florio A, Allardyce J, Forty L, Knott S, Jones L, Gordon‐Smith K, Owen MJ, Jones I, Walters J, Craddock N, O'Donovan MC, Escott‐Price V. A data-driven investigation of relationships between bipolar psychotic symptoms and schizophrenia genome-wide significant genetic loci. Am J Med Genet B Neuropsychiatr Genet 2018; 177:468-475. [PMID: 29671935 PMCID: PMC6001555 DOI: 10.1002/ajmg.b.32635] [Received: 11/01/2017] [Revised: 02/16/2018] [Accepted: 03/27/2018] [Indexed: 11/11/2022]
Abstract
The etiologies of bipolar disorder (BD) and schizophrenia include a large number of common risk alleles, many of which are shared across the disorders. BD is clinically heterogeneous and it has been postulated that the pattern of symptoms is in part determined by the particular risk alleles carried, and in particular, that risk alleles also confer liability to schizophrenia influence psychotic symptoms in those with BD. To investigate links between psychotic symptoms in BD and schizophrenia risk alleles we employed a data-driven approach in a genotyped and deeply phenotyped sample of subjects with BD. We used sparse canonical correlation analysis (sCCA) (Witten, Tibshirani, & Hastie, ) to analyze 30 psychotic symptoms, assessed with the OPerational CRITeria checklist, and 82 independent genome-wide significant single nucleotide polymorphisms (SNPs) identified by the Schizophrenia Working group of the Psychiatric Genomics Consortium for which we had data in our BD sample (3,903 subjects). As a secondary analysis, we applied sCCA to larger groups of SNPs, and also to groups of symptoms defined according to a published factor analyses of schizophrenia. sCCA analysis based on individual psychotic symptoms revealed a significant association (p = .033), with the largest weights attributed to a variant on chromosome 3 (rs11411529), chr3:180594593, build 37) and delusions of influence, bizarre behavior and grandiose delusions. sCCA analysis using the same set of SNPs supported association with the same SNP and the group of symptoms defined "factor 3" (p = .012). A significant association was also observed to the "factor 3" phenotype group when we included a greater number of SNPs that were less stringently associated with schizophrenia; although other SNPs contributed to the significant multivariate association result, the greatest weight remained assigned to rs11411529. 
Our results suggest that canonical correlation analysis is a useful tool for exploring phenotype-genotype relationships. To the best of our knowledge, this is the first study to apply this approach to complex, polygenic psychiatric traits. The sparse canonical correlation approach offers the potential to include a larger number of fine-grained systematic descriptors, and to include genetic markers associated with other disorders that are genetically correlated with BD.
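The sCCA step described above can be sketched in Python with NumPy. This is a minimal sketch of a rank-1 penalized matrix decomposition in the spirit of Witten, Tibshirani & Hastie, not the authors' code (no software is named in the abstract). It assumes standardized genotype (`X`) and symptom (`Y`) matrices, treats the within-set covariances as identity, and enforces L1 bounds `c_x` and `c_y` on the canonical weight vectors by bisection on a soft-threshold. The abstract does not say how its p-values were obtained; the row-permutation test here is one common choice and is an assumption. All function names are hypothetical.

```python
import numpy as np


def standardize(A):
    """Center each column and scale it to unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_normalized(a, c, n_bisect=40):
    """Unit-L2 vector proportional to soft_threshold(a, delta) with L1 norm <= c.

    delta = 0 is used if it already satisfies the bound; otherwise delta is
    found by bisection. Assumes 1 <= c <= sqrt(len(a)).
    """
    def unit(delta):
        s = soft_threshold(a, delta)
        norm = np.linalg.norm(s)
        return s / norm if norm > 0 else s

    u = unit(0.0)
    if np.abs(u).sum() <= c:
        return u
    lo, hi = 0.0, np.abs(a).max()  # unit(hi) is the zero vector, so hi satisfies the bound
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.abs(unit(mid)).sum() > c:
            lo = mid  # still too dense: threshold harder
        else:
            hi = mid  # bound satisfied: try a smaller threshold
    return unit(hi)


def sparse_cca(X, Y, c_x, c_y, n_iter=50, tol=1e-8):
    """Rank-1 sparse CCA by alternating updates (PMD-style sketch).

    Maximizes u' X'Y v subject to ||u||_2 <= 1, ||u||_1 <= c_x and
    ||v||_2 <= 1, ||v||_1 <= c_y. Returns (u, v, corr(Xu, Yv)).
    """
    C = X.T @ Y
    v = np.linalg.svd(C, full_matrices=False)[2][0]  # start from leading right singular vector
    for _ in range(n_iter):
        u = l1_normalized(C @ v, c_x)
        v_new = l1_normalized(C.T @ u, c_y)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


def permutation_pvalue(X, Y, c_x, c_y, n_perm=200, seed=0):
    """Permutation p-value for the sparse canonical correlation (assumed test)."""
    rng = np.random.default_rng(seed)
    observed = sparse_cca(X, Y, c_x, c_y)[2]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[0])  # break the X-Y row pairing
        if sparse_cca(X, Y[perm], c_x, c_y)[2] >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With `c_x` close to 1 only one or two SNPs keep a nonzero weight; raising it toward `sqrt(p)` removes the sparsity and returns the leading singular vectors of `X'Y`. Permuting the rows of `Y` breaks the genotype-symptom pairing while preserving each set's internal correlation structure.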
Affiliation(s)
- Ganna Leonenko
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Arianna Di Florio
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Judith Allardyce
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Liz Forty
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Sarah Knott
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Lisa Jones
- Department of Psychological Medicine, University of Worcester, Worcester, United Kingdom
- Michael J. Owen
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Ian Jones
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- James Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Nick Craddock
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Michael C. O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
- Valentina Escott‐Price
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University Institute of Psychological Medicine and Clinical Neurosciences, Cardiff, United Kingdom
|
14
|
|
15
|
Ji G, Lin Q, Long Y, Ye C, Ye W, Wu X. PAcluster: Clustering polyadenylation site data using canonical correlation analysis. J Bioinform Comput Biol 2017; 15:1750018. [PMID: 28874086 DOI: 10.1142/s0219720017500184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites is placing new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, as most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each pair of genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution to clustering large APA-specific biological datasets.
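A two-layer, CCA-based distance followed by hierarchical clustering, as outlined in the abstract above, might look like the Python sketch below. It is an illustration under stated assumptions, not PAcluster's algorithm or API (PAcluster is an R package whose internals are not described here). Each gene is assumed to be a conditions × poly(A)-sites usage matrix; layer one takes the first canonical correlation between two genes' matrices, with conditions as observations; layer two keeps a pair only if Bartlett's chi-square test rejects zero canonical correlation. The test, the `alpha` cut-off, the 1 − ρ distance and average linkage are all assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import chi2


def canonical_correlations(A, B):
    """Canonical correlations between the columns of A and B (rows = conditions),
    computed as singular values of Qa' Qb after centering and QR."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))
    return np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)


def bartlett_pvalue(rho, n, p, q):
    """Bartlett's chi-square test of H0: all canonical correlations are zero."""
    rho = np.clip(rho, 0.0, 1.0 - 1e-12)  # avoid log(0) when rho == 1
    stat = -(n - 1 - (p + q + 1) / 2) * np.sum(np.log(1.0 - rho**2))
    return chi2.sf(stat, p * q)


def cca_distance_matrix(genes, alpha=0.05):
    """Layer 1: first canonical correlation for every gene pair.
    Layer 2: pairs that fail the significance test keep the maximum distance 1."""
    k = len(genes)
    D = np.ones((k, k))
    np.fill_diagonal(D, 0.0)
    for i in range(k):
        for j in range(i + 1, k):
            A, B = genes[i], genes[j]
            rho = canonical_correlations(A, B)
            if bartlett_pvalue(rho, A.shape[0], A.shape[1], B.shape[1]) < alpha:
                D[i, j] = D[j, i] = 1.0 - rho[0]
    return D


def cluster_genes(genes, n_clusters, alpha=0.05):
    """Average-linkage hierarchical clustering on the CCA-based distances."""
    D = cca_distance_matrix(genes, alpha)
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

The number of conditions should comfortably exceed the combined number of sites in a gene pair; otherwise the first canonical correlation is trivially close to 1 and the distance carries no information.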
Affiliation(s)
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Qianmin Lin
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Yuqi Long
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Congting Ye
- College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, P. R. China
- Wenbin Ye
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, P. R. China
|