1
Qin X, Hu J, Ma S, Wu M. Estimation of multiple networks with common structures in heterogeneous subgroups. J Multivariate Anal 2024; 202:105298. [PMID: 38433779 PMCID: PMC10907012 DOI: 10.1016/j.jmva.2024.105298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. As a result, the proposed estimator advances significantly beyond existing heterogeneity network analyses that are based directly on the regularized likelihood of the GGM, and it enjoys scale invariance, tuning insensitivity, and convexity of the optimization problem. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identification.
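The abstract does not spell out the reparameterization, so the following is only the standard node-wise regression identity that underlies any decomposition of a GGM into regression problems; the symbols (X, Σ, Ω, β, ε) are introduced here for illustration and are not taken from the article.

```latex
% Standard identity, assuming X = (X_1, ..., X_p) ~ N(0, \Sigma)
% with precision matrix \Omega = \Sigma^{-1} = (\omega_{jk}).
\begin{align*}
X_j &= \sum_{k \neq j} \beta_{jk} X_k + \varepsilon_j,
  & \beta_{jk} &= -\frac{\omega_{jk}}{\omega_{jj}},
  & \operatorname{Var}(\varepsilon_j) &= \frac{1}{\omega_{jj}},
  \qquad j = 1, \dots, p.
\end{align*}
% Hence \beta_{jk} = 0 if and only if \omega_{jk} = 0, so a sparse
% regression for each node j recovers the edge set of the network.
```

Because each of the p regressions involves only one node's neighborhood, they can be fitted separately, which is consistent with the abstract's remark about parallel computing.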
Affiliation(s)
- Xing Qin
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
- Jianhua Hu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
2
Shutta KH, Balzer LB, Scholtens DM, Balasubramanian R. SpiderLearner: An ensemble approach to Gaussian graphical model estimation. Stat Med 2023; 42:2116-2133. [PMID: 37004994 DOI: 10.1002/sim.9714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 12/10/2022] [Accepted: 03/07/2023] [Indexed: 04/04/2023]
Abstract
Gaussian graphical models (GGMs) are a popular form of network model in which nodes represent features in multivariate normal data and edges reflect conditional dependencies between these features. GGM estimation is an active area of research. Currently available tools for GGM estimation require investigators to make several choices regarding algorithms, scoring criteria, and tuning parameters. An estimated GGM may be highly sensitive to these choices, and the accuracy of each method can vary based on structural characteristics of the network such as topology, degree distribution, and density. Because these characteristics are a priori unknown, it is not straightforward to establish universal guidelines for choosing a GGM estimation method. We address this problem by introducing SpiderLearner, an ensemble method that constructs a consensus network from multiple estimated GGMs. Given a set of candidate methods, SpiderLearner estimates the optimal convex combination of results from each method using a likelihood-based loss function. K-fold cross-validation is applied in this process, reducing the risk of overfitting. In simulations, SpiderLearner performs better than or comparably to the best candidate methods according to a variety of metrics, including relative Frobenius norm and out-of-sample likelihood. We apply SpiderLearner to publicly available ovarian cancer gene expression data including 2013 participants from 13 diverse studies, demonstrating our tool's potential to identify biomarkers of complex disease. SpiderLearner is implemented as flexible, extensible, open-source code in the R package ensembleGGM at https://github.com/katehoffshutta/ensembleGGM.
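The idea of a likelihood-scored convex combination can be sketched in a few lines. This is a toy illustration only, not the ensembleGGM API: the function names, the two candidate estimators, the single train/holdout split (in place of K-fold cross-validation), and the grid search are all assumptions made for the example.

```python
import numpy as np

def gaussian_nll(theta, s):
    """Negative Gaussian log-likelihood (up to constants) of precision theta given covariance s."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        return np.inf  # not positive definite, so not a valid precision matrix
    return np.trace(s @ theta) - logdet

def best_convex_weight(theta_a, theta_b, s_holdout, grid_size=101):
    """Grid-search the weight a in [0, 1] of a * theta_a + (1 - a) * theta_b (hypothetical helper)."""
    alphas = np.linspace(0.0, 1.0, grid_size)
    losses = [gaussian_nll(a * theta_a + (1.0 - a) * theta_b, s_holdout) for a in alphas]
    best = int(np.argmin(losses))
    return alphas[best], losses[best]

# Simulated data with a chain-structured covariance (illustrative values).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.5, 0.0],
                     [0.5, 1.0, 0.5],
                     [0.0, 0.5, 1.0]])
x = rng.multivariate_normal(np.zeros(3), true_cov, size=200)
train, holdout = x[:100], x[100:]

# Two stand-in candidate estimates: a dense one and an edge-free (diagonal) one.
theta_dense = np.linalg.inv(np.cov(train, rowvar=False))
theta_diag = np.diag(1.0 / np.var(train, axis=0, ddof=1))

s_holdout = np.cov(holdout, rowvar=False)
alpha, loss = best_convex_weight(theta_dense, theta_diag, s_holdout)
```

Since the grid contains both endpoints, the selected combination can never score worse on the held-out data than either candidate alone, which mirrors the "better than or comparable to the best candidate" behavior described above.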
Affiliation(s)
- Katherine H Shutta
  - Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, USA
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Laura B Balzer
  - Division of Biostatistics, University of California-Berkeley, Berkeley, California, USA
- Denise M Scholtens
  - Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, University of Massachusetts-Amherst, Amherst, Massachusetts, USA
3
Chen Y, Zhang XF, Ou-Yang L. Inferring cancer common and specific gene networks via multi-layer joint graphical model. Comput Struct Biotechnol J 2023; 21:974-990. [PMID: 36733706 PMCID: PMC9873583 DOI: 10.1016/j.csbj.2023.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 01/08/2023] [Accepted: 01/14/2023] [Indexed: 01/19/2023] Open
Abstract
Cancer is a complex disease caused primarily by genetic variants. Reconstructing gene networks within tumors is essential for understanding the functional regulatory mechanisms of carcinogenesis. Advances in high-throughput sequencing technologies have provided tremendous opportunities for inferring gene networks via computational approaches. However, due to the heterogeneity of the same cancer type and the similarities between different cancer types, it remains a challenge to systematically investigate the commonalities and specificities between gene networks of different cancer types, which is a crucial step towards precision cancer diagnosis and treatment. In this study, we propose a new sparse regularized multi-layer decomposition graphical model to jointly estimate the gene networks of multiple cancer types. Our model can handle various types of gene expression data and decomposes each cancer-type-specific network into three components, i.e., globally shared, partially shared and cancer-type-unique components. By identifying the globally and partially shared gene network components, our model can explore the heterogeneous similarities between different cancer types, and our identified cancer-type-unique components can help to reveal the regulatory mechanisms unique to each cancer type. Extensive experiments on synthetic data illustrate the effectiveness of our model in joint estimation of multiple gene networks. We also apply our model to two real data sets to infer the gene networks of multiple cancer subtypes or cell lines. By analyzing our estimated globally shared, partially shared, and cancer-type-unique components, we identified a number of important genes associated with common and specific regulatory mechanisms across different cancer types.
Affiliation(s)
- Yuanxiao Chen
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang (corresponding author)
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
4
Shutta KH, De Vito R, Scholtens DM, Balasubramanian R. Gaussian graphical models with applications to omics analyses. Stat Med 2022; 41:5150-5187. [PMID: 36161666 PMCID: PMC9672860 DOI: 10.1002/sim.9546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2021] [Revised: 06/06/2022] [Accepted: 07/21/2022] [Indexed: 11/06/2022]
Abstract
Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high-dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand-alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial.
Affiliation(s)
- Katherine H. Shutta
  - Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Roberta De Vito
  - Department of Biostatistics and Data Science Initiative, Brown University, Providence, Rhode Island, USA
- Denise M. Scholtens
  - Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Raji Balasubramanian
  - Department of Biostatistics and Epidemiology, University of Massachusetts - Amherst, Amherst, Massachusetts, USA
5
Jahanmiri R, Djafarian K, Janbozorgi N, Dehghani-Firouzabadi F, Shab-Bidar S. Saturated fats network identified using Gaussian graphical models is associated with metabolic syndrome in a sample of Iranian adults. Diabetol Metab Syndr 2022; 14:123. [PMID: 36028917 PMCID: PMC9419308 DOI: 10.1186/s13098-022-00894-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 08/19/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gaussian graphical models (GGMs) are an innovative method for deriving dietary networks, which reflect dietary intake patterns and show how food groups are consumed in relation to each other, independently of other food groups. The aim of this study was to derive dietary networks and assess their association with metabolic syndrome in a sample of the Iranian population. METHODS In this cross-sectional study, 850 apparently healthy adults were selected from referral health care centers. A 168-item food frequency questionnaire was used to assess dietary intakes. Food networks were derived by applying GGMs to 40 food groups. Metabolic syndrome was defined based on the guidelines of the National Cholesterol Education Program Adult Treatment Panel III (ATP III). RESULTS Three GGM networks were identified: healthy, unhealthy and saturated fats. Adherence to the saturated fats network, in which butter was central, was associated with higher odds of having metabolic syndrome after adjusting for potential confounders (OR = 1.81, 95% CI 1.61-2.82; P trend = 0.009) and with higher odds of having hyperglycemia (P trend = 0.04). No significant association was observed between the healthy and unhealthy dietary networks and metabolic syndrome, hypertension, hypertriglyceridemia or central obesity. Furthermore, metabolic syndrome components were not related to the identified networks. CONCLUSION Our findings suggest that greater adherence to the saturated fats network is associated with higher odds of having metabolic syndrome in Iranians. These findings highlight the association between dietary intake patterns and metabolic syndrome.
Affiliation(s)
- Reihaneh Jahanmiri
  - Department of Community Nutrition, School of Nutritional Sciences and Dietetics, Tehran University of Medical Sciences (TUMS), No 44, Hojjat-dost Alley, Naderi St., Keshavarz Blvd, Tehran, Iran
- Kurosh Djafarian
  - Department of Clinical Nutrition, School of Nutritional Sciences and Dietetics, Tehran University of Medical Sciences, Tehran, Iran
- Nasim Janbozorgi
  - Department of Community Nutrition, School of Nutritional Sciences and Dietetics, Tehran University of Medical Sciences (TUMS), No 44, Hojjat-dost Alley, Naderi St., Keshavarz Blvd, Tehran, Iran
- Fatemeh Dehghani-Firouzabadi
  - Department of Community Nutrition, School of Nutritional Sciences and Dietetics, Tehran University of Medical Sciences (TUMS), No 44, Hojjat-dost Alley, Naderi St., Keshavarz Blvd, Tehran, Iran
- Sakineh Shab-Bidar
  - Department of Community Nutrition, School of Nutritional Sciences and Dietetics, Tehran University of Medical Sciences (TUMS), No 44, Hojjat-dost Alley, Naderi St., Keshavarz Blvd, Tehran, Iran
6
Bernal V, Bischoff R, Horvatovich P, Guryev V, Grzegorczyk M. The 'un-shrunk' partial correlation in Gaussian graphical models. BMC Bioinformatics 2021; 22:424. [PMID: 34493207 PMCID: PMC8424921 DOI: 10.1186/s12859-021-04313-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Accepted: 08/02/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (the 'high-dimensional problem'). Shrinkage methods address this issue by learning a regularized GGM. However, how the shrinkage affects the final result and its interpretation remains an open question. RESULTS We show that the shrinkage biases the partial correlation in a non-linear way. This bias not only changes the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as 'un-shrinking' the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus. CONCLUSIONS GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the 'high-dimensional problem'. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring this type of effect caused by the shrinkage can obscure the interpretation of the network and impede the validation of earlier reported results.
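The effect of shrinkage on partial correlations can be seen in a toy example. The sketch below is not the article's 'un-shrunk' estimator; it only computes partial correlations from the inverse covariance, using the standard formula, before and after a linear shrinkage toward the diagonal. The shrinkage target, the value 0.5, the function names, and the 3 x 3 matrix are assumptions chosen for illustration.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def shrunk_covariance(cov, lam):
    """Linear shrinkage toward the diagonal: (1 - lam) * cov + lam * diag(cov)."""
    return (1.0 - lam) * cov + lam * np.diag(np.diag(cov))

# Toy correlation matrix in which variables 0 and 2 are linked only through
# variable 1 (0.3 = 0.6 * 0.5), so their true partial correlation is zero.
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

raw = partial_correlations(cov)
shrunk = partial_correlations(shrunk_covariance(cov, 0.5))
```

In this construction the unshrunk partial correlation between variables 0 and 2 is zero, while the shrunk one is positive, so shrinkage here does more than rescale every entry by a common factor.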
Affiliation(s)
- Victor Bernal
  - Bernoulli Institute, University of Groningen, Groningen, 9747 AG, The Netherlands
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, 9713 AV, The Netherlands
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, 9713 AV, The Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute, University of Groningen, Groningen, 9747 AG, The Netherlands
7
Gunathilake M, Lee JH, Choi IJ, Kim YI, Kim JS. Effect of the Interaction between Dietary Patterns and the Gastric Microbiome on the Risk of Gastric Cancer. Nutrients 2021; 13:2692. [PMID: 34444852 PMCID: PMC8401549 DOI: 10.3390/nu13082692] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/28/2021] [Revised: 07/23/2021] [Accepted: 07/30/2021] [Indexed: 12/13/2022] Open
Abstract
We aimed to observe the combined effects of Gaussian graphical model (GGM)-derived dietary patterns and the gastric microbiome on the risk of gastric cancer (GC) in a Korean population. The study included 268 patients with GC and 288 healthy controls. Food intake was assessed using a 106-item semiquantitative food frequency questionnaire. GGMs were applied to derive dietary pattern networks. 16S rRNA gene sequencing was performed using DNA extracted from gastric biopsy samples. The fruit pattern network was inversely associated with the risk of GC for the highest vs. lowest tertiles in the total population (odds ratio (OR): 0.47; 95% confidence interval (CI): 0.28-0.77; p for trend = 0.003) and in females (OR: 0.38; 95% CI: 0.17-0.83; p for trend = 0.021). Males who had a low microbial dysbiosis index (MDI) and a high vegetable and seafood pattern score showed a significantly reduced risk of GC (OR: 0.44; 95% CI: 0.22-0.91; p-interaction = 0.021). Females who had a low MDI and a high dairy pattern score showed a significantly reduced risk of GC (OR: 0.23; 95% CI: 0.07-0.76; p-interaction = 0.018). Our novel findings revealed that the vegetable and seafood pattern might interact with dysbiosis to attenuate the risk of GC in males, whereas the dairy pattern might interact with dysbiosis to reduce the GC risk in females.
Affiliation(s)
- Madhawa Gunathilake
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
- Jeong-Hee Lee
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
- Il-Ju Choi
  - Center for Gastric Cancer, National Cancer Center Hospital, National Cancer Center, Goyang-si 10408, Gyeonggi-do, Korea
- Young-Il Kim
  - Center for Gastric Cancer, National Cancer Center Hospital, National Cancer Center, Goyang-si 10408, Gyeonggi-do, Korea
- Jeong-Seon Kim
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
8
Schwedhelm C, Lipsky LM, Shearrer GE, Betts GM, Liu A, Iqbal K, Faith MS, Nansel TR. Using food network analysis to understand meal patterns in pregnant women with high and low diet quality. Int J Behav Nutr Phys Act 2021; 18:101. [PMID: 34301273 PMCID: PMC8306349 DOI: 10.1186/s12966-021-01172-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/22/2020] [Accepted: 07/13/2021] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Little is known about how meal-specific food intake contributes to overall diet quality during pregnancy, which is related to numerous maternal and child health outcomes. Food networks are probabilistic graphs using partial correlations to identify relationships among food groups in dietary intake data, and can be analyzed at the meal level. This study investigated food networks across meals in pregnant women and explored differences by overall diet quality classification. METHODS Women were asked to complete three 24-h dietary recalls throughout pregnancy (n = 365) within a prospective cohort study in the US. Pregnancy diet quality was evaluated using the Healthy Eating Index-2015 (HEI, range 0-100), calculated across pregnancy. Networks from 40 food groups were derived for women in the highest and lowest HEI tertiles at each participant-labeled meal (i.e., breakfast, lunch, dinner, snacks) using Gaussian graphical models. Network composition was qualitatively compared across meals and between HEI tertiles. RESULTS In both HEI tertiles, breakfast food combinations comprised ready-to-eat cereals with milk, quick breads with sweets (e.g., pancakes with syrup), and bread with cheese and meat. Vegetables were consumed at breakfast among women in the high HEI tertile only. Combinations at lunch and dinner were more varied, including vegetables with oils (e.g., salads) in the high tertile and sugary foods with nuts, fruits, and milk in the low tertile at lunch; and cooked grains with fats (e.g., pasta with oil) in the high tertile and potatoes with vegetables and meat in the low tertile at dinner. Fried potatoes, sugar-sweetened beverages, and sandwiches were consumed together at all main meals in the low tertile only. Foods were consumed individually at snacks in both tertiles; the most commonly consumed foods were fruits in the high HEI tertile and cakes & cookies in the low tertile.
CONCLUSIONS In this cohort of pregnant women, food network analysis indicated that food combinations differed by meal and between HEI tertiles. Meal-specific patterns that differed between diet quality tertiles suggest potential targets to improve food choices at meals; the impact of meal-based dietary modifications on intake of correlated foods and on overall diet quality should be investigated in simulations and intervention studies. TRIAL REGISTRATION PEAS was registered with number NCT02217462 in Clinicaltrials.gov on August 13, 2014.
Affiliation(s)
- Carolina Schwedhelm
  - Social and Behavioral Sciences Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
  - Present address: Max-Delbrueck-Center for Molecular Medicine in the Helmholtz Association (MDC), Molecular Epidemiology Research Group, Robert-Rössle-Straße 10, 13125, Berlin, Germany
- Leah M Lipsky
  - Social and Behavioral Sciences Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Grace E Shearrer
  - Department of Nutrition, Gillings School of Global Public Health, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Grace M Betts
  - Social and Behavioral Sciences Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Aiyi Liu
  - Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Khalid Iqbal
  - Department of Human Nutrition, Institute of Basic Medical Sciences, Khyber Medical University, Peshawar, Pakistan
- Myles S Faith
  - Department of Counseling, School and Educational Psychology, University at Buffalo Graduate School of Education, Buffalo, NY, USA
- Tonja R Nansel
  - Social and Behavioral Sciences Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
9
Abstract
Background: In spite of increasing evidence highlighting the role of dynamic functional connectivity (FC) in characterizing mental disorders, there is a lack of (a) reliable statistical methods to compute dynamic connectivity and (b) rigorous dynamic FC-based approaches for predicting mental health outcomes in heterogeneous disorders such as post-traumatic stress disorder (PTSD). Methods: In one of the first such efforts, we develop a reliable and accurate approach for estimating dynamic FC guided by brain structural connectivity (SC) computed using diffusion tensor imaging data and investigate the potential of the proposed multimodal dynamic FC to predict continuous mental health outcomes. We develop concrete measures of temporal network variability that are predictive of PTSD resilience, and identify regions whose temporal connectivity fluctuations are significantly related to resilience. Results: Our results illustrate that the multimodal approach is more sensitive to connectivity change points, that it can clearly detect localized brain regions whose dynamic network features (such as small-worldedness, clustering coefficients, and efficiency) are associated with resilience, and that it has superior predictive performance compared with existing static and dynamic network models when modeling PTSD resilience. Discussion: While the majority of resting-state network modeling in psychiatry has focused on static FC, our novel multimodal dynamic network analyses, which are sensitive to network fluctuations, allowed us to provide a model of neural correlates of resilience with high accuracy compared with existing static connectivity approaches or those that do not use brain SC information, and provided us with an expanded understanding of the neurobiological causes for PTSD.
Impact statement The methods developed in this article provide reliable and accurate dynamic functional connectivity (FC) approaches by fusing multimodal imaging data that are highly predictive of continuous clinical phenotypes in heterogeneous mental disorders. Currently, there is very little theoretical work to explain how network dynamics might contribute to individual differences in behavior or psychiatric symptoms. Our analysis conclusively discovers localized brain resting-state networks, regions, and connections where variations in dynamic FC (that is estimated after incorporating brain structural connectivity information) are associated with post-traumatic stress disorder resilience, which could potentially provide valuable tools for the development of neural circuit modeling in psychiatry in the future.
Affiliation(s)
- Suprateek Kundu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
- Jin Ming
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
- Jennifer Stevens
  - Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, USA
10
Dilernia A, Quevedo K, Camchong J, Lim K, Pan W, Zhang L. Penalized model-based clustering of fMRI data. Biostatistics 2021; 23:825-843. [PMID: 33527998 DOI: 10.1093/biostatistics/kxaa061] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/28/2020] [Accepted: 12/21/2020] [Indexed: 11/14/2022] Open
Abstract
Functional magnetic resonance imaging (fMRI) data have become increasingly available and are useful for describing functional connectivity (FC), the relatedness of neuronal activity in regions of the brain. This FC of the brain provides insight into certain neurodegenerative diseases and psychiatric disorders, and thus is of clinical importance. To help inform physicians regarding patient diagnoses, unsupervised clustering of subjects based on FC is desired, allowing the data to inform us of groupings of patients based on shared features of connectivity. Since heterogeneity in FC is present even between patients within the same group, it is important to allow subject-level differences in connectivity, while still pooling information across patients within each group to describe group-level FC. To this end, we propose a random covariance clustering model (RCCM) to concurrently cluster subjects based on their FC networks, estimate the unique FC networks of each subject, and to infer shared network features. Although current methods exist for estimating FC or clustering subjects using fMRI data, our novel contribution is to cluster or group subjects based on similar FC of the brain while simultaneously providing group- and subject-level FC network estimates. The competitive performance of RCCM relative to other methods is demonstrated through simulations in various settings, achieving both improved clustering of subjects and estimation of FC networks. Utility of the proposed method is demonstrated with application to a resting-state fMRI data set collected on 43 healthy controls and 61 participants diagnosed with schizophrenia.
Affiliation(s)
- Andrew Dilernia
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Karina Quevedo
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, USA
- Jazmin Camchong
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, USA
- Kelvin Lim
  - Department of Psychiatry, University of Minnesota, Minneapolis, MN, USA
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Lin Zhang
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
11
Kaiser EE, Poythress JC, Scheulin KM, Jurgielewicz BJ, Lazar NA, Park C, Stice SL, Ahn J, West FD. An integrative multivariate approach for predicting functional recovery using magnetic resonance imaging parameters in a translational pig ischemic stroke model. Neural Regen Res 2021; 16:842-850. [PMID: 33229718 PMCID: PMC8178783 DOI: 10.4103/1673-5374.297079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022] Open
Abstract
Magnetic resonance imaging (MRI) is a clinically relevant, real-time imaging modality that is frequently utilized to assess stroke type and severity. However, specific MRI biomarkers that can be used to predict long-term functional recovery are still a critical need. Consequently, the present study sought to examine the prognostic value of commonly utilized MRI parameters to predict functional outcomes in a porcine model of ischemic stroke. Stroke was induced via permanent middle cerebral artery occlusion. At 24 hours post-stroke, MRI analysis revealed focal ischemic lesions, decreased diffusivity, hemispheric swelling, and white matter degradation. Functional deficits including behavioral abnormalities in open field and novel object exploration as well as spatiotemporal gait impairments were observed at 4 weeks post-stroke. Gaussian graphical models identified specific MRI outputs and functional recovery variables, including white matter integrity and gait performance, that exhibited strong conditional dependencies. Canonical correlation analysis revealed a prognostic relationship between lesion volume and white matter integrity and novel object exploration and gait performance. Consequently, these analyses may also have the potential of predicting patient recovery at chronic time points as pigs and humans share many anatomical similarities (e.g., white matter composition) that have proven to be critical in ischemic stroke pathophysiology. The study was approved by the University of Georgia (UGA) Institutional Animal Care and Use Committee (IACUC; Protocol Number: A2014-07-021-Y3-A11 and 2018-01-029-Y1-A5) on November 22, 2017.
Affiliation(s)
- Erin E Kaiser
  - Regenerative Bioscience Center; Neuroscience, Biomedical and Health Sciences Institute; Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA
- J C Poythress
  - Department of Statistics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Kelly M Scheulin
  - Regenerative Bioscience Center; Neuroscience, Biomedical and Health Sciences Institute; Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA
- Brian J Jurgielewicz
  - Regenerative Bioscience Center; Neuroscience, Biomedical and Health Sciences Institute; Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA
- Nicole A Lazar
  - Department of Statistics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Cheolwoo Park
  - Department of Statistics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Steven L Stice
  - Regenerative Bioscience Center; Neuroscience, Biomedical and Health Sciences Institute; Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA
- Jeongyoun Ahn
  - Department of Statistics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Franklin D West
  - Regenerative Bioscience Center; Neuroscience, Biomedical and Health Sciences Institute; Department of Animal and Dairy Science, College of Agricultural and Environmental Sciences, University of Georgia, Athens, GA, USA
|
12
|
Abstract
Different from traditional intra-subject analysis, the goal of inter-subject analysis (ISA) is to explore the dependency structure between different subjects with the intra-subject dependency as nuisance. ISA has important applications in neuroscience to study the functional connectivity between brain regions under natural stimuli. We propose a modeling framework for ISA that is based on Gaussian graphical models, under which ISA can be converted to the problem of estimation and inference of a partial Gaussian graphical model. The main statistical challenge is that we do not impose sparsity constraints on the whole precision matrix and we only assume the inter-subject part is sparse. For estimation, we propose to estimate an alternative parameter to get around the nonsparse issue and it can achieve asymptotic consistency even if the intra-subject dependency is dense. For inference, we propose an "untangle and chord" procedure to de-bias our estimator. It is valid without the sparsity assumption on the inverse Hessian of the log-likelihood function. This inferential method is general and can be applied to many other statistical problems, thus it is of independent theoretical interest. Numerical experiments on both simulated and brain imaging data validate our methods and theory. Supplementary materials for this article are available online.
Affiliation(s)
- Cong Ma
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ
- Junwei Lu
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Han Liu
  - Department of Computer Science and Department of Statistics, Northwestern University, Evanston, IL
|
13
|
Chatrabgoun O, Hosseinian-Far A, Daneshkhah A. Constructing gene regulatory networks from microarray data using non-Gaussian pair-copula Bayesian networks. J Bioinform Comput Biol 2020; 18:2050023. [PMID: 32706288 DOI: 10.1142/s0219720020500237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Many biological and biomedical research areas, such as drug design, require analyzing gene regulatory networks (GRNs) to provide clear insight into and understanding of cellular processes in live cells. Under a normality assumption for the genes, GRNs can be constructed by assessing the nonzero elements of the inverse covariance matrix. Nevertheless, such techniques are unable to deal with the non-normality, multi-modality and heavy-tailedness that are commonly seen in current massive genetic data. To relax this restrictive constraint, one can apply a copula function, which is a multivariate cumulative distribution function with uniform marginal distributions. However, since the dependency structures of different pairs of genes in a multivariate problem are very different, a regular multivariate copula will not allow for the construction of an appropriate model. The solution to this problem is to use Pair-Copula Constructions (PCCs), which are decompositions of a multivariate density into a cascade of bivariate copulas and therefore assign a different bivariate copula function to each local term. In this paper, we construct the inverse covariance matrix based on PCCs when the normality assumption is moderately or severely violated, in order to capture a wide range of distributional features and complex dependency structures. To learn the non-Gaussian model for the considered GRN with non-Gaussian genomic data, we apply a modified version of the copula-based PC algorithm in which the normality assumption on the marginal densities is dropped. This paper also considers the Dynamic Time Warping (DTW) algorithm to determine the existence of a time-delay relation between two genes. Breast cancer is one of the most common diseases in the world, and GRN analysis of its subtypes is considerably important, since revealing the differences in the GRNs of these subtypes can lead to new therapies and drugs. The findings of our research are used to construct GRNs with high performance for various subtypes of breast cancer, rather than simply using previous models.
Affiliation(s)
- O Chatrabgoun
  - Department of Statistics, Malayer University, Malayer, Iran
- A Hosseinian-Far
  - Department of Business Systems & Operations, University of Northampton, NN1 5PH, UK
- A Daneshkhah
  - Faculty of Engineering, Environment & Computing, Coventry University, CV1 5FB, UK
|
14
|
Gunathilake M, Lee J, Choi IJ, Kim YI, Kim J. Identification of Dietary Pattern Networks Associated with Gastric Cancer Using Gaussian Graphical Models: A Case-Control Study. Cancers (Basel) 2020; 12:E1044. [PMID: 32340406 PMCID: PMC7226381 DOI: 10.3390/cancers12041044] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/25/2020] [Revised: 04/18/2020] [Accepted: 04/21/2020] [Indexed: 12/24/2022]
Abstract
Gaussian graphical models (GGMs) are novel approaches to deriving dietary patterns that assess how foods are consumed in relation to one another. We aimed to apply GGMs to identify dietary patterns and to investigate the associations between dietary patterns and gastric cancer (GC) risk in a Korean population. In this case-control study of 415 GC cases and 830 controls, food intake was assessed using a 106-item semiquantitative food frequency questionnaire that captured 33 food groups. The dietary pattern networks corresponding to the total population contained a main network and four subnetworks. For the vegetable and seafood network, those who were in the highest tertile of the network-specific score showed a significantly reduced risk of GC both in the total population (OR = 0.66, 95% CI = 0.47-0.93, p for trend = 0.018) and in males (OR = 0.55, 95% CI = 0.34-0.89, p for trend = 0.012). Most importantly, the fruit pattern network was inversely associated with the risk of GC for the highest tertile (OR = 0.56, 95% CI = 0.38-0.81, p for trend = 0.002). The identified vegetable and seafood network and the fruit network showed a protective effect against GC development in Koreans.
Affiliation(s)
- Madhawa Gunathilake
  - Department of Cancer Control and Population Health, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
- Jeonghee Lee
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
- Il Ju Choi
  - Center for Gastric Cancer, National Cancer Center Hospital, National Cancer Center, Goyang-si 10408, Gyeonggi-do, Korea
- Young-Il Kim
  - Center for Gastric Cancer, National Cancer Center Hospital, National Cancer Center, Goyang-si 10408, Gyeonggi-do, Korea
- Jeongseon Kim
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, Goyang-si 10408, Gyeonggi-do, Korea
|
15
|
Dyrba M, Mohammadi R, Grothe MJ, Kirste T, Teipel SJ. Gaussian Graphical Models Reveal Inter-Modal and Inter-Regional Conditional Dependencies of Brain Alterations in Alzheimer's Disease. Front Aging Neurosci 2020; 12:99. [PMID: 32372944 PMCID: PMC7186311 DOI: 10.3389/fnagi.2020.00099] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/28/2019] [Accepted: 03/24/2020] [Indexed: 01/14/2023]
Abstract
Alzheimer's disease (AD) is characterized by a sequence of pathological changes, which are commonly assessed in vivo using various brain imaging modalities such as magnetic resonance imaging (MRI) and positron emission tomography (PET). Currently, most approaches to analyzing statistical associations between regions and imaging modalities rely on Pearson correlation or linear regression models. However, these models are prone to spurious correlations arising from uninformative shared variance and multicollinearity. Notably, there are no appropriate multivariate statistical models available that can easily integrate dozens of multicollinear variables derived from such data and utilize the additional information provided by the combination of data sources. Gaussian graphical models (GGMs) can estimate the conditional dependency from given data, which is conceptually expected to closely reflect the underlying causal relationships between various variables. Hence, we applied GGMs to assess multimodal regional brain alterations in AD. We obtained data from N = 972 subjects from the Alzheimer's Disease Neuroimaging Initiative. The mean amyloid load (AV45-PET), glucose metabolism (FDG-PET), and gray matter volume (MRI) were calculated for each of the 108 cortical and subcortical brain regions. GGMs were estimated using a Bayesian framework for the combined multimodal data, and the resulting conditional dependency networks were compared to classical covariance networks based on Pearson correlation. Additionally, graph-theoretical network statistics were calculated to determine network alterations associated with disease status. The resulting conditional dependency matrices were much sparser (≈10% density) than Pearson correlation matrices (≈50% density). Within imaging modalities, conditional dependency networks yielded clusters connecting anatomically adjacent regions. For the associations between different modalities, only a few region-specific connections were detected. Network measures such as the small-world coefficient were significantly altered across diagnostic groups, with a biphasic u-shaped trajectory, i.e., increased small-world coefficient in early mild cognitive impairment (MCI), similar values in late MCI, and decreased values in AD dementia patients compared to cognitively normal controls. In conclusion, GGMs removed commonly shared variance among multimodal measures of regional brain alterations in MCI and AD, and yielded sparser matrices compared to correlation networks based on the Pearson coefficient. Therefore, GGMs may be used as an alternative to the thresholding approaches typically applied to correlation networks to obtain the most informative relations between variables.
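The contrast this abstract draws between Pearson correlation and conditional dependency can be illustrated with a small sketch. It is not the Bayesian estimation procedure used in the study: it simply inverts a sample covariance matrix, which assumes many more samples than variables. The chain-structured toy data and the helper name `partial_correlations` are assumptions made for illustration.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlations from the inverse sample covariance (precision) matrix.

    Hypothetical helper for illustration; valid only when the sample
    covariance is invertible (many more rows than columns).
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain x -> y -> z: x and z are related only through y.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
data = np.column_stack([x, y, z])

pearson = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

# The marginal (Pearson) correlation between x and z is clearly nonzero,
# while their partial correlation given y is close to zero.
print("Pearson  corr(x, z):", round(pearson[0, 2], 3))
print("Partial  corr(x, z | y):", round(pcor[0, 2], 3))
```

In this toy setting the Pearson matrix is dense while the partial correlation matrix has a near-zero entry for the indirectly connected pair, which is the kind of sparsification the abstract reports at a much larger scale.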
Affiliation(s)
- Martin Dyrba
  - German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany
- Reza Mohammadi
  - Department of Operation Management, Amsterdam Business School, University of Amsterdam, Amsterdam, Netherlands
- Michel J Grothe
  - German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany
- Thomas Kirste
  - Mobile Multimedia Information Systems Group (MMIS), University of Rostock, Rostock, Germany
- Stefan J Teipel
  - German Center for Neurodegenerative Diseases (DZNE), Rostock, Germany; Clinic for Psychosomatics and Psychotherapeutic Medicine, Rostock University Medical Center, Rostock, Germany
|
16
|
Grzegorczyk M, Aderhold A, Husmeier D. Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data. Methods Mol Biol 2019; 1883:49-94. [PMID: 30547396 DOI: 10.1007/978-1-4939-8882-2_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/14/2023]
Abstract
A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set can be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and the pros and cons of the various approaches are discussed. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.
Affiliation(s)
- Marco Grzegorczyk
  - Johann Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Andrej Aderhold
  - Center for Computer Science, Universidade Federal do Rio Grande, Rio Grande, Brazil
- Dirk Husmeier
  - School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
|
17
|
Oza VH, Aicher JK, Reed LK. Random Forest Analysis of Untargeted Metabolomics Data Suggests Increased Use of Omega Fatty Acid Oxidation Pathway in Drosophila Melanogaster Larvae Fed a Medium Chain Fatty Acid Rich High-Fat Diet. Metabolites 2018; 9:metabo9010005. [PMID: 30602659 PMCID: PMC6359074 DOI: 10.3390/metabo9010005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/09/2018] [Revised: 12/27/2018] [Accepted: 12/27/2018] [Indexed: 12/27/2022]
Abstract
Obesity is a complex disease, shaped by both genetic and environmental factors such as diet. In this study, we use untargeted metabolomics and Drosophila melanogaster to model how diet and genotype shape the metabolome of obese phenotypes. We used 16 distinct outbred genotypes of Drosophila larvae raised on normal (ND) and high-fat (HFD) diets to produce three distinct phenotypic classes: genotypes that stored more triglycerides on the ND relative to the HFD, genotypes that stored more triglycerides on the HFD relative to the ND, and genotypes that showed no change in triglyceride storage on either of the two diets. Using untargeted metabolomics we characterized 350 metabolites: 270 with definitive chemical IDs and 80 that were chemically unidentified. Using random forests, we determined metabolites that were important in discriminating between the HFD and ND larvae as well as between the triglyceride phenotypic classes. We found that flies fed the HFD showed evidence of increased use of the omega fatty acid oxidation pathway, an alternative to the more commonly used beta fatty acid oxidation pathway. Additionally, we observed no correlation between the triglyceride storage phenotype and free fatty acid levels (laurate, caprate, caprylate, caproate), indicating that the distinct metabolic profile of fatty acids in Drosophila larvae fed a high-fat diet does not propagate into triglyceride storage differences. However, dipeptides did show moderate differences between the phenotypic classes. We fit Gaussian graphical models (GGMs) of the metabolic profiles for HFD and ND flies to characterize changes in metabolic network structure between the two diets, finding the HFD to have a greater number of edges, indicating that the metabolome varies more across samples on the HFD. Taken together, these results show that, in the context of obesity, metabolomic profiles under distinct dietary conditions may not be reliable predictors of phenotypic outcomes in a genetically diverse population.
Affiliation(s)
- Vishal H Oza
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487, USA
- Joseph K Aicher
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487, USA
- Laura K Reed
  - Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487, USA
|
18
|
Abstract
Linear discriminant analysis (LDA) is a well-known classification technique that has enjoyed great success in practical applications. Despite its effectiveness for traditional low-dimensional problems, extensions of LDA are necessary in order to classify high-dimensional data. Many variants of LDA have been proposed in the literature. However, most of these methods do not fully incorporate the structure information among predictors when such information is available. In this paper, we introduce a new high-dimensional LDA technique, namely graph-based sparse LDA (GSLDA), that utilizes the graph structure among the features. In particular, we use the regularized regression formulation for penalized LDA techniques, and propose to impose a structure-based sparse penalty on the discriminant vector β. The graph structure can be either given or estimated from the training data. Moreover, we explore the relationship between the within-class feature structure and the overall feature structure. Based on this relationship, we further propose a variant of GSLDA that effectively utilizes unlabeled data, which can be abundant in the semi-supervised learning setting. With the new regularization, we can obtain a sparse estimate of β and more accurate and interpretable classifiers than many existing methods. Both the selection consistency of the β estimate and the convergence rate of the classifier are established, and the resulting classifier has an asymptotic Bayes error rate. Finally, we demonstrate the competitive performance of the proposed GSLDA on both simulated and real data studies.
Affiliation(s)
- Jianyu Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA
- Guan Yu
  - Department of Biostatistics, University at Buffalo, Buffalo, NY 14214, USA
- Yufeng Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA; Department of Genetics, Department of Biostatistics, and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA
|
19
|
Higgins IA, Kundu S, Guo Y. Integrative Bayesian analysis of brain functional networks incorporating anatomical knowledge. Neuroimage 2018; 181:263-278. [PMID: 30017786 DOI: 10.1016/j.neuroimage.2018.07.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/10/2018] [Revised: 07/04/2018] [Accepted: 07/05/2018] [Indexed: 12/31/2022]
Abstract
Recently, there has been increased interest in fusing multimodal imaging to better understand brain organization by integrating information on both brain structure and function. In particular, incorporating anatomical knowledge leads to desirable outcomes such as increased accuracy in brain network estimates and greater reproducibility of topological features across scanning sessions. Despite the clear advantages, major challenges persist in integrative analyses including an incomplete understanding of the structure-function relationship and inaccuracies in mapping anatomical structures due to inherent deficiencies in existing imaging technology. This calls for the development of advanced network modeling tools that appropriately incorporate anatomical structure in constructing brain functional networks. We propose a hierarchical Bayesian Gaussian graphical modeling approach which models the brain functional networks via sparse precision matrices whose degree of edge-specific shrinkage is a random variable that is modeled using both anatomical structure and an independent baseline component. The proposed approach adaptively shrinks functional connections and flexibly identifies functional connections supported by structural connectivity knowledge. This enables robust brain network estimation even in the presence of misspecified anatomical knowledge, while accommodating heterogeneity in the structure-function relationship. We implement the approach via an efficient optimization algorithm which yields maximum a posteriori estimates. Extensive numerical studies involving multiple functional network structures reveal the clear advantages of the proposed approach over competing methods in accurately estimating brain functional connectivity, even when the anatomical knowledge is misspecified up to a certain degree. An application of the approach to data from the Philadelphia Neurodevelopmental Cohort (PNC) study reveals gender-based connectivity differences across multiple age groups, and higher reproducibility in the estimation of network metrics compared to alternative methods.
Affiliation(s)
- Ixavier A Higgins
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Suprateek Kundu
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
- Ying Guo
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, 30322, USA
|
20
|
Vinci G, Ventura V, Smith MA, Kass RE. Adjusted Regularization in Latent Graphical Models: Application to Multiple-Neuron Spike Count Data. Ann Appl Stat 2018; 12:1068-1095. [PMID: 31772696 PMCID: PMC6879176 DOI: 10.1214/18-aoas1190] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Abstract
A major challenge in contemporary neuroscience is to analyze data from large numbers of neurons recorded simultaneously across many experimental replications (trials), where the data are counts of neural firing events, and one of the basic problems is to characterize the dependence structure among such multivariate counts. Methods of estimating high-dimensional covariation based on ℓ1-regularization are most appropriate when there are a small number of relatively large partial correlations, but in neural data there are often large numbers of relatively small partial correlations. Furthermore, the variation across trials is often confounded by Poisson-like variation within trials. To overcome these problems we introduce a comprehensive methodology that embeds a Gaussian graphical model into a hierarchical structure: the counts are assumed Poisson, conditionally on latent variables that follow a Gaussian graphical model, and the graphical model parameters, in turn, are assumed to depend on physiologically motivated covariates, which can greatly improve correct detection of interactions (non-zero partial correlations). We develop a Bayesian approach to fitting this covariate-adjusted generalized graphical model and we demonstrate its success in simulation studies. We then apply it to data from an experiment on visual attention, where we assess functional interactions between neurons recorded from two brain areas.
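The first two layers of the hierarchy described in this abstract (Poisson counts conditional on latent Gaussian variables) can be simulated in a few lines. This is a minimal sketch under assumed values, not the authors' model: the covariate layer and the Bayesian fitting are omitted, and the two-neuron setup, the latent covariance, the baseline rate of 5 and the log link are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 20000

# Latent Gaussian layer for two neurons (assumed covariance; correlation 0.8).
latent_cov = np.array([[0.25, 0.20],
                       [0.20, 0.25]])
latent = rng.multivariate_normal(mean=np.zeros(2), cov=latent_cov, size=n_trials)

# Observation layer: spike counts are Poisson given the latent variables.
# The log link and baseline rate are assumptions for this sketch.
rates = np.exp(np.log(5.0) + latent)
counts = rng.poisson(rates)

latent_corr = np.corrcoef(latent, rowvar=False)[0, 1]
count_corr = np.corrcoef(counts, rowvar=False)[0, 1]

# Within-trial Poisson variation dilutes the dependence seen in raw counts.
print("latent correlation:", round(latent_corr, 3))
print("count correlation: ", round(count_corr, 3))
```

The correlation between observed counts comes out noticeably weaker than the correlation between the latent variables, which illustrates why the abstract treats within-trial Poisson-like variation as a confound when estimating dependence from counts directly.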
Affiliation(s)
- Giuseppe Vinci
  - Rice University, Department of Statistics, Duncan Hall, 6100 Main St, Houston, 77005, TX, USA
- Valérie Ventura
  - Carnegie Mellon University, Department of Statistics, Baker Hall 132, 5000 Forbes Avenue, Pittsburgh, 15203, PA, USA
  - Center for the Neural Basis of Cognition
- Matthew A. Smith
  - University of Pittsburgh, Department of Ophthalmology, Eye and Ear Institute, Room 914 203 Lothrop St., Pittsburgh, PA 15213, USA
  - Center for the Neural Basis of Cognition
- Robert E. Kass
  - Carnegie Mellon University, Department of Statistics, Baker Hall 132, 5000 Forbes Avenue, Pittsburgh, 15203, PA, USA
  - Center for the Neural Basis of Cognition
|
21
|
Kay JW, Ince RAA. Exact Partial Information Decompositions for Gaussian Systems Based on Dependency Constraints. Entropy (Basel) 2018; 20:e20040240. [PMID: 33265331 PMCID: PMC7512755 DOI: 10.3390/e20040240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/09/2018] [Revised: 03/26/2018] [Accepted: 03/27/2018] [Indexed: 12/03/2022]
Abstract
The Partial Information Decomposition, introduced by Williams P. L. et al. (2010), provides a theoretical framework to characterize and quantify the structure of multivariate information sharing. A new method (Idep) has recently been proposed by James R. G. et al. (2017) for computing a two-predictor partial information decomposition over discrete spaces. A lattice of maximum entropy probability models is constructed based on marginal dependency constraints, and the unique information that a particular predictor has about the target is defined as the minimum increase in joint predictor-target mutual information when that particular predictor-target marginal dependency is constrained. Here, we apply the Idep approach to Gaussian systems, for which the marginally constrained maximum entropy models are Gaussian graphical models. Closed form solutions for the Idep PID are derived for both univariate and multivariate Gaussian systems. Numerical and graphical illustrations are provided, together with practical and theoretical comparisons of the Idep PID with the minimum mutual information partial information decomposition (Immi), which was discussed by Barrett A. B. (2015). The results obtained using Idep appear to be more intuitive than those given with other methods, such as Immi, in which the redundant and unique information components are constrained to depend only on the predictor-target marginal distributions. In particular, it is proved that the Immi method generally produces larger estimates of redundancy and synergy than does the Idep method. In discussion of the practical examples, the PIDs are complemented by the use of tests of deviance for the comparison of Gaussian graphical models.
Affiliation(s)
- Jim W. Kay
  - Department of Statistics, University of Glasgow, Glasgow G12 8QQ, UK
- Robin A. A. Ince
  - Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QQ, UK
|
22
|
Abstract
The goal of gene regulatory network (GRN) inference is to determine the interactions between genes given heterogeneous data capturing spatiotemporal gene expression. Since transcription underlies all cellular processes, the inference of GRNs is the first step in deciphering the determinants of the dynamics of biological systems. Here, we first describe the generic steps of the inference approaches that rely on similarity measures and group the similarity measures based on the computational methodology used. For each group of similarity measures, we not only review the existing approaches but also describe in detail the steps of the existing state-of-the-art algorithms.
Affiliation(s)
- Nooshin Omranian
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, 14476, Germany
- Zoran Nikoloski
  - Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm, 14476, Germany
|
23
|
Vinciotti V, Wit EC, Jansen R, de Geus EJCN, Penninx BWJH, Boomsma DI, ’t Hoen PAC. Consistency of biological networks inferred from microarray and sequencing data. BMC Bioinformatics 2016; 17:254. [PMID: 27342572 PMCID: PMC4919861 DOI: 10.1186/s12859-016-1136-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/05/2016] [Accepted: 06/10/2016] [Indexed: 01/25/2023]
Abstract
BACKGROUND Sparse Gaussian graphical models are popular for inferring biological networks, such as gene regulatory networks. In this paper, we investigate the consistency of these models across different data platforms, such as microarray and next generation sequencing, on the basis of a rich dataset containing samples that are profiled under both techniques as well as a large set of independent samples. RESULTS Our analysis shows that individual node variances can have a remarkable effect on the connectivity of the resulting network. Their inconsistency across platforms, and the fact that the variability level of a node may not be linked to its regulatory role, mean that failing to scale the data prior to the network analysis leads to networks that are not reproducible across different platforms and that may be misleading. Moreover, we show that the reproducibility of networks across different platforms is significantly higher if networks are summarised in terms of enrichment amongst functional groups of interest, such as pathways, rather than at the level of individual edges. CONCLUSIONS Careful pre-processing of transcriptional data and summaries of networks beyond individual edges can improve the consistency of network inference across platforms. However, caution is needed at this stage in the (over)interpretation of gene regulatory networks inferred from biological data.
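The effect of node variances on inferred connectivity can be seen in a toy example. The sketch below is a deliberate simplification and not the analysis in the paper: it replaces sparse (penalized) estimation with hard thresholding of the inverse sample covariance, and the three-node chain, the factor of 10 on the third node, the threshold `tau` and the helper name `thresholded_edges` are all assumptions chosen for illustration.

```python
import numpy as np

def thresholded_edges(data, tau):
    """Edges whose absolute precision-matrix entry exceeds tau.

    Hypothetical stand-in for a sparse GGM estimator: a fixed threshold
    plays the role that a fixed penalty plays in penalized estimation.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    adjacency = np.abs(precision) > tau
    np.fill_diagonal(adjacency, False)
    return adjacency

# Toy chain a - b - c, with node c recorded on a 10x larger scale.
rng = np.random.default_rng(1)
n = 2000
a = rng.normal(size=n)
b = 0.8 * a + rng.normal(size=n)
c = 0.8 * b + rng.normal(size=n)
raw = np.column_stack([a, b, 10.0 * c])

# Standardize each column to zero mean and unit variance.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

edges_raw = thresholded_edges(raw, tau=0.3)
edges_scaled = thresholded_edges(scaled, tau=0.3)

print("raw data edges:\n", edges_raw.astype(int))
print("scaled data edges:\n", edges_scaled.astype(int))
```

On the unscaled data the b-c edge falls below the threshold purely because of the measurement scale of c, whereas after standardization both true edges of the chain are recovered. The dependence structure is identical in both cases; only the node variance differs.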
Affiliation(s)
- Ernst C. Wit
- Johann Bernoulli Institute of Mathematics and Computer Science, University of Groningen, Groningen, The Netherlands
- Rick Jansen
- VU University Medical Center, Amsterdam, The Netherlands
- Peter A. C. ’t Hoen
- Leiden University Medical Center, Leiden University, Leiden, The Netherlands
|
24
|
Narayan M, Allen GI. Mixed Effects Models for Resampled Network Statistics Improves Statistical Power to Find Differences in Multi-Subject Functional Connectivity. Front Neurosci 2016; 10:108. [PMID: 27147940 PMCID: PMC4828454 DOI: 10.3389/fnins.2016.00108] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/23/2015] [Accepted: 03/07/2016] [Indexed: 12/11/2022]
Abstract
Many complex brain disorders, such as autism spectrum disorders, exhibit a wide range of symptoms and disability. To understand how brain communication is impaired in such conditions, functional connectivity studies seek to understand individual differences in brain network structure in terms of covariates that measure symptom severity. In practice, however, functional connectivity is not observed but estimated from complex and noisy neural activity measurements. Imperfect subject network estimates can compromise subsequent efforts to detect covariate effects on network structure. We address this problem in the case of Gaussian graphical models of functional connectivity, by proposing novel two-level models that treat both subject level networks and population level covariate effects as unknown parameters. To account for imperfectly estimated subject level networks when fitting these models, we propose two related approaches: R^2, based on resampling and random effects test statistics, and R^3, which additionally employs random adaptive penalization. Simulation studies using realistic graph structures reveal that R^2 and R^3 have superior statistical power to detect covariate effects compared to existing approaches, particularly when the number of within subject observations is comparable to the size of subject networks. Using our novel models and methods to study parts of the ABIDE dataset, we find evidence of hypoconnectivity associated with symptom severity in autism spectrum disorders, in frontoparietal and limbic systems as well as in anterior and posterior cingulate cortices.
Affiliation(s)
- Manjari Narayan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Genevera I. Allen
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Department of Statistics, Rice University, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute and Department of Pediatrics-Neurology at Baylor College of Medicine, Houston, TX, USA
|
25
|
Iqbal K, Buijsse B, Wirth J, Schulze MB, Floegel A, Boeing H. Gaussian Graphical Models Identify Networks of Dietary Intake in a German Adult Population. J Nutr 2016; 146:646-52. [PMID: 26817715 DOI: 10.3945/jn.115.221135] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 07/29/2015] [Accepted: 12/17/2015] [Indexed: 11/14/2022]
Abstract
BACKGROUND Data-reduction methods such as principal component analysis are often used to derive dietary patterns. However, such methods do not assess how foods are consumed in relation to each other. Gaussian graphical models (GGMs) are a set of novel methods that can address this issue. OBJECTIVE We sought to apply GGMs to derive sex-specific dietary intake networks representing consumption patterns in a German adult population. METHODS Dietary intake data from 10,780 men and 16,340 women of the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort were cross-sectionally analyzed to construct dietary intake networks. Food intake for each participant was estimated using a 148-item food-frequency questionnaire that captured the intake of 49 food groups. GGMs were applied to log-transformed intakes (grams per day) of 49 food groups to construct sex-specific food networks. Semiparametric Gaussian copula graphical models (SGCGMs) were used to confirm GGM results. RESULTS In men, GGMs identified 1 major dietary network that consisted of intakes of red meat, processed meat, cooked vegetables, sauces, potatoes, cabbage, poultry, legumes, mushrooms, soup, and whole-grain and refined breads. For women, a similar network was identified with the addition of fried potatoes. Other identified networks consisted of dairy products and sweet food groups. SGCGMs yielded results comparable to those of GGMs. CONCLUSIONS GGMs are a powerful exploratory method that can be used to construct dietary networks representing dietary intake patterns that reveal how foods are consumed in relation to each other. GGMs indicated an apparent major role of red meat intake in a consumption pattern in the studied population. In the future, identified networks might be transformed into pattern scores for investigating their associations with health outcomes.
Affiliation(s)
- Matthias B Schulze
- Molecular Epidemiology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; and German Center for Diabetes Research, Neuherberg, Germany
|
26
|
Ortiz A, Munilla J, Álvarez-Illán I, Górriz JM, Ramírez J. Exploratory graphical models of functional and structural connectivity patterns for Alzheimer's Disease diagnosis. Front Comput Neurosci 2015; 9:132. [PMID: 26578945 PMCID: PMC4630314 DOI: 10.3389/fncom.2015.00132] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 07/30/2015] [Accepted: 10/12/2015] [Indexed: 11/21/2022]
Abstract
Alzheimer's Disease (AD) is the most common neurodegenerative disease in elderly people. Its development has been shown to be closely related to changes in the brain connectivity network and in the brain activation patterns along with structural changes caused by the neurodegenerative process. Methods to infer dependence between brain regions are usually derived from the analysis of covariance between activation levels in the different areas. However, these covariance-based methods are not able to estimate conditional independence between variables to factor out the influence of other regions. Conversely, models based on the inverse covariance, or precision matrix, such as Sparse Gaussian Graphical Models allow revealing conditional independence between regions by estimating the covariance between two variables given the rest as constant. This paper uses Sparse Inverse Covariance Estimation (SICE) methods to learn undirected graphs in order to derive functional and structural connectivity patterns from Fludeoxyglucose (18F-FDG) Positron Emission Tomography (PET) data and segmented Magnetic Resonance images (MRI), drawn from the ADNI database, for Control, MCI (Mild Cognitive Impairment), and AD subjects. Sparse computation fits perfectly here as brain regions usually only interact with a few other areas. The models clearly show different metabolic covariation patterns between subject groups, revealing the loss of strong connections in AD and MCI subjects when compared to Controls. Similarly, the variance between GM (Gray Matter) densities of different regions reveals different structural covariation patterns between the different groups. Thus, the different connectivity patterns for controls and AD are used in this paper to select regions of interest in PET and GM images with discriminative power for early AD diagnosis. Finally, functional and structural models are combined to improve the classification accuracy.
The results obtained in this work show the usefulness of the Sparse Gaussian Graphical models to reveal functional and structural connectivity patterns. This information provided by the sparse inverse covariance matrices is not only used in an exploratory way but we also propose a method to use it in a discriminative way. Regression coefficients are used to compute reconstruction errors for the different classes that are then introduced in a SVM for classification. Classification experiments performed using 68 Controls, 70 AD, and 111 MCI images and assessed by cross-validation show the effectiveness of the proposed method.
Affiliation(s)
- Andrés Ortiz
- Department Communications Engineering, Universidad de Málaga, Málaga, Spain
- Jorge Munilla
- Department Communications Engineering, Universidad de Málaga, Málaga, Spain
- Juan M Górriz
- Department Signal Theory, Networking and Communications, University of Granada, Granada, Spain
- Javier Ramírez
- Department Signal Theory, Networking and Communications, University of Granada, Granada, Spain
|
27
|
Yu D, Son W, Lim J, Xiao G. Statistical completion of a partially identified graph with applications for the estimation of gene regulatory networks. Biostatistics 2015; 16:670-85. [PMID: 25837438 DOI: 10.1093/biostatistics/kxv013] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/29/2014] [Accepted: 03/03/2015] [Indexed: 12/15/2022]
Abstract
We study the estimation of a Gaussian graphical model whose dependent structures are partially identified. In a Gaussian graphical model, an off-diagonal zero entry in the concentration matrix (the inverse covariance matrix) implies the conditional independence of two corresponding variables, given all other variables. A number of methods have been proposed to estimate a sparse large-scale Gaussian graphical model or, equivalently, a sparse large-scale concentration matrix. In practice, the graph structure to be estimated is often partially identified by other sources or a pre-screening. In this paper, we propose a simple modification of existing methods to take into account this information in the estimation. We show that the partially identified dependent structure reduces the error in estimating the dependent structure. We apply the proposed method to estimating the gene regulatory network from lung cancer data, where protein-protein interactions are partially identified from the human protein reference database. The application shows that the proposed method identified many important cancer genes as hub genes in the constructed lung cancer network. In addition, we validated the prognostic importance of a newly identified cancer gene, PTPN13, in four independent lung cancer datasets. The results indicate that the proposed method could facilitate studying underlying lung cancer mechanisms and identifying reliable biomarkers for lung cancer prognosis.
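The link between zeros in the concentration matrix and conditional independence can be illustrated with a small sketch. It is not the estimation method of the paper above: it only reads the graph off a concentration matrix that is assumed to be already known, using the standard identity that the partial correlation of variables i and j given the rest is -omega_ij / sqrt(omega_ii * omega_jj). The function names and the 3x3 example matrix are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def partial_correlations(omega: Matrix) -> Matrix:
    """Partial correlations implied by a concentration (precision) matrix.

    Entry (i, j), i != j, is -omega[i][j] / sqrt(omega[i][i] * omega[j][j]).
    """
    p = len(omega)
    return [
        [
            1.0 if i == j
            else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def graph_edges(omega: Matrix, tol: float = 1e-12) -> List[Tuple[int, int]]:
    """Edges of the graph: pairs whose off-diagonal entry is non-zero."""
    p = len(omega)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(omega[i][j]) > tol
    ]


# Hypothetical concentration matrix of a chain 0 - 1 - 2:
# the (0, 2) entry is zero, so variables 0 and 2 are
# conditionally independent given variable 1.
omega_example = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
example_edges = graph_edges(omega_example)
example_pcor = partial_correlations(omega_example)
```

In this example the only missing edge is (0, 2), which matches the zero entry of the matrix.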
Affiliation(s)
- Donghyeon Yu
- Department of Statistics, Keimyung University, Daegu, Korea
- Won Son
- Department of Statistics, Seoul National University, Seoul, Korea
- Johan Lim
- Department of Statistics, Seoul National University, Seoul, Korea
- Guanghua Xiao
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, TX 75390, USA
|
28
|
Abstract
This paper proposes a new method for estimating sparse precision matrices in the high dimensional setting. It has been popular to study fast computation and adaptive procedures for this problem. We propose a novel approach, called Sparse Column-wise Inverse Operator, to address these two issues. We analyze an adaptive procedure based on cross validation, and establish its convergence rate under the Frobenius norm. The convergence rates under other matrix norms are also established. This method also enjoys the advantage of fast computation for large-scale problems, via a coordinate descent algorithm. Numerical merits are illustrated using both simulated and real datasets. In particular, it performs favorably on an HIV brain tissue dataset and an ADHD resting-state fMRI dataset.
Affiliation(s)
- Weidong Liu
- Department of Mathematics and Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
- Xi Luo
- Department of Biostatistics and Center for Statistical Sciences, Brown University, Providence, Rhode Island, USA
- Brown Institute for Brain Science, Brown University, Providence, Rhode Island, USA
- Initiative for Computation in Brain and Mind, Brown University, Providence, Rhode Island, USA
|
29
|
Ha MJ, Baladandayuthapani V, Do KA. Prognostic gene signature identification using causal structure learning: applications in kidney cancer. Cancer Inform 2015; 14:23-35. [PMID: 25861215 PMCID: PMC4362630 DOI: 10.4137/cin.s14873] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/14/2014] [Revised: 07/21/2014] [Accepted: 07/21/2014] [Indexed: 12/21/2022]
Abstract
Identification of molecular-based signatures is one of the critical steps toward finding therapeutic targets in cancer. In this paper, we propose methods to discover prognostic gene signatures under a causal structure learning framework across the whole genome. The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules, each consisting of a gene and its corresponding regulators. The modules are then correlated with survival times, thus allowing for a network-oriented approach to gene selection that adjusts for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA), where we find several prognostic genes associated with cancer progression, some of which are novel while others confirm existing findings.
Affiliation(s)
- Min Jin Ha
- Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA
- Kim-Anh Do
- Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX, USA
|
30
|
Abstract
The identification of predefined groups of genes ("gene-sets") which are differentially expressed between two conditions ("gene-set analysis", or GSA) is a very popular analysis in bioinformatics. GSA incorporates biological knowledge by aggregating over genes that are believed to be functionally related. This can enhance statistical power over analyses that consider only one gene at a time. However, currently available GSA approaches are based on univariate two-sample comparison of single genes. This means that they cannot test for multivariate hypotheses such as differences in covariance structure between the two conditions. Yet interplay between genes is a central aspect of biological investigation and it is likely that such interplay may differ between conditions. This paper proposes a novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions. Testing hypotheses concerning networks is challenging due to the nature of the underlying estimation problem. Our starting point is a recent, general approach for high-dimensional two-sample testing. We refine the approach and show how it can be used to perform multivariate, network-based gene-set testing. We validate the approach in simulated examples and show results using high-throughput data from several studies in cancer biology.
Affiliation(s)
- Nicolas Städler
- The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, Netherlands
- Sach Mukherjee
- The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, Netherlands
|
31
|
Do KT, Kastenmüller G, Mook-Kanamori DO, Yousri NA, Theis FJ, Suhre K, Krumsiek J. Network-based approach for analyzing intra- and interfluid metabolite associations in human blood, urine, and saliva. J Proteome Res 2014; 14:1183-94. [PMID: 25434815 DOI: 10.1021/pr501130a] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/14/2022]
Abstract
Most studies investigating human metabolomics measurements are limited to a single biofluid, most often blood or urine. An organism's biochemical pool, however, comprises complex transboundary relationships, which can only be understood by investigating metabolic interactions and physiological processes spanning multiple parts of the human body. Therefore, we here propose a data-driven network-based approach to generate an integrated picture of metabolomics associations over multiple fluids. We performed an analysis of 2251 metabolites measured in plasma, urine, and saliva, from 374 participants of the Qatar Metabolomics Study on Diabetes (QMDiab). Gaussian graphical models (GGMs) were used to estimate metabolite-metabolite interactions on different subsets of the data set. First, we compared similarities and differences of the metabolome and the association networks between the three fluids. Second, we investigated the cross-talk between the fluids by analyzing correlations occurring between them. Third, we propose a framework for the analysis of medically relevant phenotypes by integrating type 2 diabetes, sex, age, and body mass index into our networks. In conclusion, we present a generic, data-driven network-based approach for structuring and visualizing metabolite correlations within and between multiple body fluids, enabling unbiased interpretation of metabolomics multifluid data.
Affiliation(s)
- Kieu Trinh Do
- Institute of Computational Biology and Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum München, D-85764 Neuherberg, Germany
|
32
|
Rosa MJ, Portugal L, Hahn T, Fallgatter AJ, Garrido MI, Shawe-Taylor J, Mourao-Miranda J. Sparse network-based models for patient classification using fMRI. Neuroimage 2014; 105:493-506. [PMID: 25463459 PMCID: PMC4275574 DOI: 10.1016/j.neuroimage.2014.11.021] [Citation(s) in RCA: 103] [Impact Index Per Article: 10.3] [Received: 08/30/2014] [Revised: 10/21/2014] [Accepted: 11/08/2014] [Indexed: 12/20/2022]
Abstract
Pattern recognition applied to whole-brain neuroimaging data, such as functional Magnetic Resonance Imaging (fMRI), has proved successful at discriminating psychiatric patients from healthy participants. However, predictive patterns obtained from whole-brain voxel-based features are difficult to interpret in terms of the underlying neurobiology. Many psychiatric disorders, such as depression and schizophrenia, are thought to be brain connectivity disorders. Therefore, pattern recognition based on network models might provide deeper insights and potentially more powerful predictions than whole-brain voxel-based approaches. Here, we build a novel sparse network-based discriminative modeling framework, based on Gaussian graphical models and L1-norm regularized linear Support Vector Machines (SVM). In addition, the proposed framework is optimized in terms of both predictive power and reproducibility/stability of the patterns. Our approach aims to provide better pattern interpretation than voxel-based whole-brain approaches by yielding stable brain connectivity patterns that underlie discriminative changes in brain function between the groups. We illustrate our technique by classifying patients with major depressive disorder (MDD) and healthy participants, in two (event- and block-related) fMRI datasets acquired while participants performed a gender discrimination and emotional task, respectively, during the visualization of emotionally valent faces. Connectivity-based predictive models can potentially improve psychiatric diagnosis. We present a novel sparse network-based discriminative model for fMRI data. This model jointly optimizes predictive power and stability of the output patterns, aiming at better pattern interpretation than voxel-based/whole-brain models. The model is applied to classify patients with major depression from two fMRI datasets.
Affiliation(s)
- Maria J Rosa
- Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, London, UK; Centre for Neuroimaging Sciences, Department of Neuroimaging, Institute of Psychiatry, King's College London, London, UK
- Liana Portugal
- Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, London, UK; LABNEC, Universidade Federal Fluminense, Rio de Janeiro, Brazil
- Tim Hahn
- Department of Cognitive Psychology II, Johann Wolfgang Goethe University Frankfurt am Main, Germany
- Andreas J Fallgatter
- University of Tuebingen, Department of Psychiatry and Psychotherapy, Tuebingen, Germany
- Marta I Garrido
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia; Centre for Advanced Imaging, The University of Queensland, Brisbane, Australia; Australian Research Council Centre of Excellence for Integrative Brain Function, Australia
- John Shawe-Taylor
- Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, London, UK
- Janaina Mourao-Miranda
- Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, London, UK
|
33
|
Abstract
Revealing biological networks is one key objective in systems biology. With microarrays, researchers now routinely measure expression profiles at the genome level under various conditions, and such data may be utilized to statistically infer gene regulation networks. Gaussian graphical models (GGMs) have proven useful for this purpose by modeling the Markovian dependence among genes. However, a single GGM may not be adequate to describe the potentially differing networks across various conditions, and hence it is more natural to infer multiple GGMs from such data. In the present study, we propose a class of nonconvex penalty functions aiming at the estimation of multiple GGMs with a flexible joint sparsity constraint. We illustrate the properties of our proposed nonconvex penalty functions through a simulation study. We then apply the method to a gene expression data set from the GenCord Project, and show that our method can identify prominent pathways across different conditions.
|
34
|
Zuo Y, Yu G, Tadesse MG, Ressom HW. Biological network inference using low order partial correlation. Methods 2014; 69:266-73. [PMID: 25003577 DOI: 10.1016/j.ymeth.2014.06.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 04/03/2014] [Revised: 06/14/2014] [Accepted: 06/19/2014] [Indexed: 11/21/2022]
Abstract
Biological network inference is a major challenge in systems biology. Traditional correlation-based network analysis results in too many spurious edges since correlation cannot distinguish between direct and indirect associations. To address this issue, Gaussian graphical models (GGM) were proposed and have been widely used. Though they can significantly reduce the number of spurious edges, GGM are insufficient to uncover a network structure faithfully due to the fact that they only consider the full order partial correlation. Moreover, when the number of samples is smaller than the number of variables, a further technique based on sparse regularization needs to be incorporated into GGM to solve the singular covariance inversion problem. In this paper, we propose an efficient and mathematically solid algorithm that infers biological networks by computing low order partial correlation (LOPC) up to the second order. The bias introduced by the low order constraint is minimal compared to the more reliable approximation of the network structure achieved. In addition, the algorithm is suitable for a dataset with small sample size but large number of variables. Simulation results show that LOPC yields far fewer spurious edges and works well under various conditions commonly seen in practice. The application to a real metabolomics dataset further validates the performance of LOPC and suggests its potential power in detecting novel biomarkers for complex disease.
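To make "partial correlation up to the second order" concrete, the sketch below applies the textbook recursion for first- and second-order partial correlations to a correlation matrix. It is not the LOPC algorithm of the paper above, which is not reproduced in this abstract: the function names and the example values are hypothetical, and no edge-selection or testing step is included.

```python
import math
from typing import List

Matrix = List[List[float]]


def first_order(r_ij: float, r_ik: float, r_jk: float) -> float:
    """Partial correlation of i and j given one variable k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def second_order(R: Matrix, i: int, j: int, k: int, l: int) -> float:
    """Partial correlation of i and j given two variables k and l.

    Uses the recursion: condition every pair on k first, then apply
    the same first-order formula to remove the effect of l.
    """
    r_ij_k = first_order(R[i][j], R[i][k], R[j][k])
    r_il_k = first_order(R[i][l], R[i][k], R[l][k])
    r_jl_k = first_order(R[j][l], R[j][k], R[l][k])
    return first_order(r_ij_k, r_il_k, r_jl_k)


# Hypothetical indirect association: variables 0 and 1 are correlated
# (0.4) only through variable 2, since 0.4 = 0.8 * 0.5. Variable 3 is
# uncorrelated with all the others.
R_example = [
    [1.0, 0.4, 0.8, 0.0],
    [0.4, 1.0, 0.5, 0.0],
    [0.8, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
r01_given_2 = first_order(R_example[0][1], R_example[0][2], R_example[1][2])
r01_given_23 = second_order(R_example, 0, 1, 2, 3)
```

Here the marginal correlation of 0.4 between variables 0 and 1 vanishes once variable 2 is conditioned on, which is the kind of spurious edge that plain correlation networks retain.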
|
35
|
Abstract
It is challenging to identify meaningful gene networks because biological interactions are often condition-specific and confounded with external factors. It is necessary to integrate multiple sources of genomic data to facilitate network inference. For example, one can jointly model expression datasets measured from multiple tissues with molecular marker data in so-called genetical genomic studies. In this paper, we propose a joint conditional Gaussian graphical model (JCGGM) that aims to model biological processes based on multiple sources of data. This approach is able to integrate multiple sources of information by adopting conditional models combined with joint sparsity regularization. We apply our approach to a real dataset measuring gene expression in four tissues (kidney, liver, heart, and fat) from recombinant inbred rats. Our approach reveals that the liver tissue has the highest level of tissue-specific gene regulations among genes involved in insulin responsive facilitative sugar transporter mediated glucose transport pathway, followed by heart and fat tissues, and this finding can only be attained from our JCGGM approach.
Affiliation(s)
- Hyonho Chun
- Department of Statistics, Purdue University, West Lafayette, IN, USA
- Min Chen
- Department of Mathematical Sciences, University of Texas at Dallas, Dallas, TX, USA
- Bing Li
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
|
36
|
Mazumder R, Hastie T. Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso. J Mach Learn Res 2012; 13:781-794. [PMID: 25392704 PMCID: PMC4225650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples.
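The screening rule described in this abstract can be sketched in a few lines. The code below only performs the thresholding and the split into connected components. It does not solve the graphical lasso itself, which would then be run separately on each component by whatever solver is at hand. The function name is hypothetical, and the strict inequality |S_ij| > λ for placing an edge is an assumed convention.

```python
from typing import Dict, List

Matrix = List[List[float]]


def thresholded_components(S: Matrix, lam: float) -> List[List[int]]:
    """Vertex partition of the sample covariance graph thresholded at lam.

    Nodes i != j are joined when abs(S[i][j]) > lam (assumed convention).
    Components are tracked with a small union-find structure.
    """
    p = len(S)
    parent = list(range(p))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(S[i][j]) > lam:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 4-variable sample covariance matrix.
S_example = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
]
blocks_mid = thresholded_components(S_example, 0.3)
blocks_high = thresholded_components(S_example, 0.6)
blocks_low = thresholded_components(S_example, 0.05)
```

With λ = 0.3 the example splits into the blocks {0, 1} and {2, 3}, so two 2x2 problems replace one 4x4 problem. A larger λ yields a finer partition, and a small enough λ leaves a single block.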
|
37
|
Abstract
We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying data collected over a period of time, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that estimates the structure of a graphical model by minimizing the temporally smoothed L1 penalized regression, which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. A highly scalable proximal gradient method is proposed to solve the resultant convex optimization problem; and the conditions for sparsistent estimation and the convergence rate of both the partition boundaries and the network structure are established for the first time for such estimators.
Affiliation(s)
- Mladen Kolar
- Machine Learning Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
- Eric P Xing
- Machine Learning Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
|
38
|
Abstract
In many applications the graph structure in a network arises from two sources: intrinsic connections and connections due to external effects. We introduce a sparse estimation procedure for graphical models that is capable of isolating the intrinsic connections by removing the external effects. Technically, this is formulated as a conditional graphical model, in which the external effects are modeled as predictors, and the graph is determined by the conditional precision matrix. We introduce two sparse estimators of this matrix using the reproducing kernel Hilbert space combined with lasso and adaptive lasso. We establish the sparsity, variable selection consistency, oracle property, and the asymptotic distributions of the proposed estimators. We also develop their convergence rate when the dimension of the conditional precision matrix goes to infinity. The methods are compared with sparse estimators for unconditional graphical models, and with the constrained maximum likelihood estimate that assumes a known graph structure. The methods are applied to a genetic data set to construct a gene network conditioning on single-nucleotide polymorphisms.
Affiliation(s)
- Bing Li
- Professor of Statistics, The Pennsylvania State University, 326 Thomas Building, University Park, PA 16802
- Hyonho Chun
- Assistant Professor of Statistics, Purdue University, 250 N. University Street, West Lafayette, IN 47907
- Hongyu Zhao
- Professor of Biostatistics, Yale University, Suite 503, 300 George Street, New Haven, CT 06510
|