1
|
Feng C, Jia H, Wang H, Wang J, Lin M, Hu X, Yu C, Song H, Wang L. MicroNet-MIMRF: a microbial network inference approach based on mutual information and Markov random fields. Bioinformatics Advances 2024; 4:vbae167. [PMID: 39526038 PMCID: PMC11549015 DOI: 10.1093/bioadv/vbae167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 10/19/2024] [Accepted: 10/25/2024] [Indexed: 11/16/2024]
Abstract
Motivation: The human microbiome comprises complex associations and communication networks among microbial communities, which are crucial for maintaining health. The construction of microbial networks is vital for elucidating these associations. However, existing microbial network inference methods cannot solve the issues of zero-inflation and non-linear associations. Novel methods are therefore needed to improve the accuracy of microbial network inference. Results: In this study, we introduce the Microbial Network based on Mutual Information and Markov Random Fields (MicroNet-MIMRF) as a novel approach for inferring microbial networks. Abundance data of microbes are modeled through the zero-inflated Poisson distribution, and the discrete matrix is estimated for further calculation. Markov random fields based on mutual information are used to construct accurate microbial networks. MicroNet-MIMRF excels at estimating pairwise associations between microbes, effectively addressing zero-inflation and non-linear associations in microbial abundance data. It outperforms commonly used techniques in simulation experiments, achieving area under the curve values exceeding 0.75 for all parameters. A case study on inflammatory bowel disease data further demonstrates the method's ability to identify insightful associations. In conclusion, MicroNet-MIMRF is a powerful tool for microbial network inference that handles the biases caused by zero-inflation and overestimation of associations. Availability and implementation: MicroNet-MIMRF is available at https://github.com/Fionabiostats/MicroNet-MIMRF.
Affiliation(s)
- Chenqionglu Feng
- Department of Epidemiology and Health Statistics, School of Public Health, China Medical University, Shenyang 110122, China
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Huiqun Jia
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Hui Wang
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Jiaojiao Wang
- The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation Chinese Academy of Sciences, Beijing 100190, China
| | - Mengxuan Lin
- The Academy of Military Medical Sciences, Academy of Military Science of Chinese People’s Liberation Army, Beijing 100071, China
| | - Xiaoyan Hu
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Chenjing Yu
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Hongbin Song
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
| | - Ligui Wang
- Department of Infectious Disease Prevention and Control, Chinese PLA Center for Disease Control and Prevention, Beijing 100071, China
|
2
|
Xiao G, Guan R, Cao Y, Huang Z, Xu Y. KISL: knowledge-injected semi-supervised learning for biological co-expression network modules. Front Genet 2023; 14:1151962. [PMID: 37205122 PMCID: PMC10185879 DOI: 10.3389/fgene.2023.1151962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/11/2023] [Indexed: 05/21/2023] Open
Abstract
The exploration of important biomarkers associated with cancer development is crucial for diagnosing cancer, designing therapeutic interventions, and predicting prognoses. The analysis of gene co-expression provides a systemic perspective on gene networks and can be a valuable tool for mining biomarkers. The main objective of co-expression network analysis is to discover highly synergistic sets of genes, and the most widely used method is weighted gene co-expression network analysis (WGCNA). WGCNA measures gene correlation with the Pearson correlation coefficient and uses hierarchical clustering to identify gene modules. The Pearson correlation coefficient reflects only the linear dependence between variables, and the main drawback of hierarchical clustering is that once two objects are clustered together, the process cannot be reversed. Hence, readjusting inappropriate cluster divisions is not possible. Existing co-expression network analysis methods rely on unsupervised methods that do not utilize prior biological knowledge for module delineation. Here we present a method for identifying outstanding modules in a co-expression network using a knowledge-injected semi-supervised learning approach (KISL), which utilizes a priori biological knowledge and a semi-supervised clustering method to address these issues in current GCN-based clustering methods. To measure both linear and non-linear dependence between genes, we use distance correlation, given the complexity of gene-gene relationships. Eight RNA-seq datasets of cancer samples are used to validate its effectiveness. In all eight datasets, the KISL algorithm outperformed WGCNA on the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index evaluation metrics. According to the results, KISL clusters had better cluster evaluation values and better gene module aggregation.
Enrichment analysis of the recognition modules demonstrated their effectiveness in discovering modular structures in biological co-expression networks. In addition, as a general method, KISL can be applied to various co-expression network analyses based on similarity metrics. Source codes for the KISL and the related scripts are available online at https://github.com/Mowonhoo/KISL.git.
Affiliation(s)
- Gangyi Xiao
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Renchu Guan
- College of Computer Science and Technology, Jilin University, Changchun, China
| | - Yangkun Cao
- School of Artificial Intelligence Jilin University, Changchun, China
| | - Zhenyu Huang
- College of Computer Science and Technology, Jilin University, Changchun, China
- *Correspondence: Ying Xu; Zhenyu Huang
| | - Ying Xu
- School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong, China
- *Correspondence: Ying Xu; Zhenyu Huang
|
3
|
Identification of Key Modules and Genes Associated with Major Depressive Disorder in Adolescents. Genes (Basel) 2022; 13:genes13030464. [PMID: 35328018 PMCID: PMC8949287 DOI: 10.3390/genes13030464] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/30/2022] [Revised: 02/26/2022] [Accepted: 03/02/2022] [Indexed: 12/25/2022] Open
Abstract
Major depressive disorder (MDD) is a leading cause of disability worldwide. Adolescence is a crucial period for the occurrence and development of depression. There are essential distinctions between adolescent and adult depression patients, and the etiology of depressive disorder is unclear. The interactions of multiple genes in a co-expression network are likely to be involved in the physiopathology of MDD. In the present study, mRNA RNA-Seq data were acquired from the peripheral blood of adolescents with MDD and healthy control (HC) subjects. Co-expression modules were constructed via weighted gene co-expression network analysis (WGCNA) to investigate the relationships between the underlying modules and MDD in adolescents. In the combined MDD and HC groups, the dynamic tree cutting method was utilized to assign genes to modules through hierarchical clustering. Moreover, functional enrichment analysis was conducted on the co-expressed genes from modules of interest. The results showed that eight modules were constructed by WGCNA. The blue module was significantly associated with MDD after multiple comparison adjustment. Several Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with stress and inflammation were identified in this module, including histone methylation, apoptosis, the NF-kappa B signaling pathway, and the TNF signaling pathway. Five genes related to inflammation, immunity, and the nervous system were identified as hub genes: CNTNAP3, IL1RAP, MEGF9, UBE2W, and UBE2D1. All of these findings support an association of MDD with stress, inflammation, and immune responses, helping us to obtain a better understanding of the internal molecular mechanism and to explore biomarkers for the diagnosis or treatment of depression in adolescents.
|
4
|
Hou J, Ye X, Feng W, Zhang Q, Han Y, Liu Y, Li Y, Wei Y. Distance correlation application to gene co-expression network analysis. BMC Bioinformatics 2022; 23:81. [PMID: 35193539 PMCID: PMC8862277 DOI: 10.1186/s12859-022-04609-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2019] [Accepted: 02/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: To construct gene co-expression networks, it is necessary to evaluate the correlation between different gene expression profiles. However, commonly used correlation metrics, including both linear (such as Pearson's correlation) and monotonic (such as Spearman's correlation) dependence metrics, are not enough to capture the nature of real biological systems. Hence, introducing a more informative correlation metric when constructing gene co-expression networks is still an interesting topic. RESULTS: In this paper, we compare distance correlation, a correlation metric integrating both linear and non-linear dependence, with three other typical metrics (Pearson's correlation, Spearman's correlation, and maximal information coefficient) on four different array (macrophage and liver) and RNA-seq (cervical cancer and pancreatic cancer) datasets. Among all the metrics, distance correlation is distribution free and provides better performance on complex relationships and better robustness to outliers. Furthermore, distance correlation is applied to Weighted Gene Co-expression Network Analysis (WGCNA) to construct a gene co-expression network analysis method, which we named Distance Correlation-based Weighted Gene Co-expression Network Analysis (DC-WGCNA). Compared with traditional WGCNA, DC-WGCNA can enhance the result of enrichment analysis and improve the module stability. CONCLUSIONS: Distance correlation is better at revealing complex biological relationships between gene profiles compared with other correlation metrics, which contributes to more meaningful modules when analyzing gene co-expression networks. However, due to the high time complexity of distance correlation, the implementation requires more computer memory.
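To make the metric in this abstract concrete, the sample distance correlation can be sketched in a few lines of NumPy. This is a minimal illustration using the standard double-centering estimator for two one-dimensional profiles; it is not the DC-WGCNA implementation, and the function name `distance_correlation` is an assumption introduced here.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D profiles (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    # Pairwise absolute distances within each profile (n x n matrices).
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x = (A * A).mean()   # squared sample distance variance of x
    dvar_y = (B * B).mean()   # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0            # a constant profile carries no dependence signal
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    # A symmetric parabola: Pearson's r is near zero, distance correlation is not.
    x = np.linspace(-1.0, 1.0, 201)
    print("Pearson:", np.corrcoef(x, x ** 2)[0, 1])
    print("Distance correlation:", distance_correlation(x, x ** 2))
```

Each call holds two full n x n distance matrices for n samples, which is one plausible reading of the memory cost mentioned in the conclusions.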
Affiliation(s)
- Jie Hou
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China; College of Science, Heilongjiang Bayi Agricultural University, Xinfeng Road, Daqing, China
| | - Xiufen Ye
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China.
| | - Weixing Feng
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
| | - Qiaosheng Zhang
- School of Computer Engineering, Jiangsu Ocean University, Cangwu Road, Lianyungang, China
| | - Yatong Han
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
| | - Yusong Liu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
| | - Yu Li
- College of Science, Northeast Forestry University, Hexing Road, Harbin, China
| | - Yufen Wei
- College of Science, Heilongjiang Bayi Agricultural University, Xinfeng Road, Daqing, China
|
5
|
Cao D, Xu N, Chen Y, Zhang H, Li Y, Yuan Z. Construction of a Pearson- and MIC-Based Co-expression Network to Identify Potential Cancer Genes. Interdiscip Sci 2021; 14:245-257. [PMID: 34694561 DOI: 10.1007/s12539-021-00485-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/23/2021] [Revised: 09/29/2021] [Accepted: 09/30/2021] [Indexed: 11/26/2022]
Abstract
The weighted gene co-expression network analysis (WGCNA) method constructs co-expressed gene modules based on the linear similarity between paired gene expressions. Linear correlations are the main form of similarity between genes; however, nonlinear correlations also exist and have largely been ignored. We proposed a modified network analysis method, WGCNA-P + M, which combines Pearson's correlation coefficient and the maximum information coefficient (MIC) as the similarity measures to assess the linear and nonlinear correlations between genes, respectively. Taking two real datasets, GSE44861 and liver hepatocellular carcinoma (TCGA-LIHC), as examples, we compared the gene modules constructed by WGCNA-P + M and WGCNA from four perspectives: the "Usefulness" score, GO enrichment analysis on genes in the gray module, prediction performance of the top hub gene, survival analysis and literature reports on different hub genes. The results showed that the modules obtained by WGCNA-P + M are more biologically meaningful and that the hub genes obtained from WGCNA-P + M include more potential cancer genes.
Affiliation(s)
- Dan Cao
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China
- College of Science, Central South University of Forestry and Technology, Changsha, 410004, Hunan, China
| | - Na Xu
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China
| | - Yuan Chen
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China
| | - Hongyan Zhang
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China
| | - Yuting Li
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China
| | - Zheming Yuan
- Hunan Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, Hunan, China.
|
6
|
Hou J, Ye X, Li C, Wang Y. K-Module Algorithm: An Additional Step to Improve the Clustering Results of WGCNA Co-Expression Networks. Genes (Basel) 2021; 12:genes12010087. [PMID: 33445666 PMCID: PMC7828115 DOI: 10.3390/genes12010087] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/25/2020] [Revised: 12/23/2020] [Accepted: 01/08/2021] [Indexed: 12/14/2022] Open
Abstract
Among biological networks, co-expression networks have been widely studied. One of the most commonly used pipelines for the construction of co-expression networks is weighted gene co-expression network analysis (WGCNA), which can identify highly co-expressed clusters of genes (modules). WGCNA identifies gene modules using hierarchical clustering. The major drawback of hierarchical clustering is that once two objects are clustered together, it cannot be reversed; thus, re-adjustment of the unbefitting decision is impossible. In this paper, we calculate the similarity matrix with the distance correlation for WGCNA to construct a gene co-expression network, and present a new approach called the k-module algorithm to improve the WGCNA clustering results. This method can assign all genes to the module with the highest mean connectivity with these genes. This algorithm re-adjusts the results of hierarchical clustering while retaining the advantages of the dynamic tree cut method. The validity of the algorithm is verified using six datasets from microarray and RNA-seq data. The k-module algorithm has fewer iterations, which leads to lower complexity. We verify that the gene modules obtained by the k-module algorithm have high enrichment scores and strong stability. Our method improves upon hierarchical clustering, and can be applied to general clustering algorithms based on the similarity matrix, not limited to gene co-expression network analysis.
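The reassignment idea described in this abstract, moving each gene to the module with which it has the highest mean connectivity, can be sketched as follows. This is a rough illustration only: the synchronous update, the exclusion of a gene's self-connection, the stopping rule, and the name `k_module_reassign` are all assumptions made here, not details taken from the paper.

```python
import numpy as np


def k_module_reassign(adjacency, labels, max_iter=50):
    """Iteratively move each gene to its best-connected module (illustrative sketch).

    adjacency: (n, n) symmetric similarity matrix between genes.
    labels:    length-n initial module labels, e.g. from hierarchical clustering.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels).copy()
    modules = np.unique(labels)

    for _ in range(max_iter):
        # member[i, k] is 1.0 when gene i currently belongs to module k.
        member = (labels[:, None] == modules[None, :]).astype(float)

        # Total connectivity of every gene to every module, without self-connection.
        sums = adjacency @ member
        sums -= np.diag(adjacency)[:, None] * member

        # Module sizes as seen by each gene (the gene itself is not counted).
        sizes = member.sum(axis=0)[None, :] - member

        # Mean connectivity; modules with no other members can never be chosen.
        mean_conn = np.divide(
            sums, sizes, out=np.full_like(sums, -np.inf), where=sizes > 0
        )

        new_labels = modules[np.argmax(mean_conn, axis=1)]
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels

    return labels


if __name__ == "__main__":
    # Two tight blocks of three genes each; gene 2 starts in the wrong module.
    adj = np.full((6, 6), 0.1)
    adj[:3, :3] = 0.9
    adj[3:, 3:] = 0.9
    np.fill_diagonal(adj, 1.0)
    print(k_module_reassign(adj, [0, 0, 1, 1, 1, 1]))
```

Because the update only needs a similarity matrix and starting labels, the same loop would apply to any similarity-based clustering, in line with the abstract's closing remark.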
|
7
|
Abstract
Cardiovascular diseases are the leading cause of death worldwide. As complex diseases with highly heterogeneous progression among patient populations, cardiovascular diseases feature multifactorial contributions from both genetic and environmental stressors. Despite significant effort utilizing multiple approaches from molecular biology to genome-wide association studies, the genetic landscape of cardiovascular diseases, particularly for the nonfamilial forms of heart failure, is still poorly understood. In the past decade, systems-level approaches based on omics technologies have become an important approach for the study of complex traits in large populations. These advances create opportunities to integrate genetic variation with other biological layers to identify and prioritize candidate genes, understand pathogenic pathways, and elucidate gene-gene and gene-environment interactions. In this review, we will highlight some of the recent progress made using systems genetics approaches to uncover novel mechanisms and molecular bases of cardiovascular pathophysiological manifestations. The key technology and data analysis platforms necessary to implement systems genetics will be described, and the current major challenges and future directions will also be discussed. For complex cardiovascular diseases, such as heart failure, systems genetics represents a powerful strategy to obtain mechanistic insights and to develop individualized diagnostic and therapeutic regimens, paving the way for precision cardiovascular medicine.
Affiliation(s)
- Christoph D. Rau
- Departments of Anesthesiology, Medicine, Physiology
- Current address: Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
| | - Aldons J. Lusis
- Department of Human Genetics and Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA 90095
| | - Yibin Wang
- Departments of Anesthesiology, Medicine, Physiology
|
8
|
Wang Q, Zhang H, Liang Y, Jiang H, Tan S, Luo F, Yuan Z, Chen Y. A Novel Method to Efficiently Highlight Nonlinearly Expressed Genes. Front Genet 2020; 10:1410. [PMID: 32082366 PMCID: PMC7006292 DOI: 10.3389/fgene.2019.01410] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/30/2019] [Accepted: 12/27/2019] [Indexed: 12/15/2022] Open
Abstract
For precision medicine, there is a need to identify genes that accurately distinguish the physiological state or response to a particular therapy, but this can be challenging. Many methods of analyzing differential expression have been established and applied to this problem, such as t-test, edgeR, and DEseq2. A common feature of these methods is their focus on a linear relationship (differential expression) between gene expression and phenotype. However, they may overlook nonlinear relationships due to various factors, such as the degree of disease progression, sex, age, ethnicity, and environmental factors. Maximal information coefficient (MIC) was proposed to capture a wide range of associations of two variables in both linear and nonlinear relationships. However, with MIC it is difficult to highlight genes with nonlinear expression patterns as the genes giving the most strongly supported hits are linearly expressed, especially for noisy data. It is thus important to also efficiently identify nonlinearly expressed genes in order to unravel the molecular basis of disease and to reveal new therapeutic targets. We propose a novel nonlinearity measure called normalized differential correlation (NDC) to efficiently highlight nonlinearly expressed genes in transcriptome datasets. Validation using six real-world cancer datasets revealed that the NDC method could highlight nonlinearly expressed genes that could not be highlighted by t-test, MIC, edgeR, and DEseq2, although MIC could capture nonlinear correlations. The classification accuracy indicated that analysis of these genes could adequately distinguish cancer and paracarcinoma tissue samples. Furthermore, the results of biological interpretation of the identified genes suggested that some of them were involved in key functional pathways associated with cancer progression and metastasis. All of this evidence suggests that these nonlinearly expressed genes may play a central role in regulating cancer progression.
Affiliation(s)
- Qifei Wang
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
| | - Haojian Zhang
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
| | - Yuqing Liang
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
| | - Heling Jiang
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
| | - Siqiao Tan
- School of Information Science and Technology, Hunan Agricultural University, Changsha, China
| | - Feng Luo
- School of Computing, Clemson University, Clemson, SC, United States
| | - Zheming Yuan
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
| | - Yuan Chen
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha, China
|
9
|
Transcription Factor T-bet in B Cells Modulates Germinal Center Polarization and Antibody Affinity Maturation in Response to Malaria. Cell Rep 2019; 29:2257-2269.e6. [DOI: 10.1016/j.celrep.2019.10.087] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 10/09/2018] [Revised: 06/06/2019] [Accepted: 10/22/2019] [Indexed: 12/14/2022] Open
|
10
|
Lau E, Paik DT, Wu JC. Systems-Wide Approaches in Induced Pluripotent Stem Cell Models. Annual Review of Pathology: Mechanisms of Disease 2018; 14:395-419. [PMID: 30379619 DOI: 10.1146/annurev-pathmechdis-012418-013046] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/11/2022]
Abstract
Human induced pluripotent stem cells (iPSCs) provide a renewable supply of patient-specific and tissue-specific cells for cellular and molecular studies of disease mechanisms. Combined with advances in various omics technologies, iPSC models can be used to profile the expression of genes, transcripts, proteins, and metabolites in relevant tissues. In the past 2 years, large panels of iPSC lines have been derived from hundreds of genetically heterogeneous individuals, further enabling genome-wide mapping to identify coexpression networks and elucidate gene regulatory networks. Here, we review recent developments in omics profiling of various molecular phenotypes and the emergence of human iPSCs as a systems biology model of human diseases.
Affiliation(s)
- Edward Lau
- Stanford Cardiovascular Institute, and Department of Medicine, Division of Cardiology, Stanford University, Stanford, California 94305, USA
| | - David T Paik
- Stanford Cardiovascular Institute, and Department of Medicine, Division of Cardiology, Stanford University, Stanford, California 94305, USA
| | - Joseph C Wu
- Stanford Cardiovascular Institute, and Department of Medicine, Division of Cardiology, Stanford University, Stanford, California 94305, USA; Department of Radiology, Stanford University, Stanford, California 94305, USA
|
11
|
Wang L, Ahsan MA, Chen M. A Generalized Approach for Measuring Relationships Among Genes. J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2017-0026/jib-2017-0026.xml. [PMID: 28731858 PMCID: PMC6042818 DOI: 10.1515/jib-2017-0026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/04/2017] [Accepted: 05/04/2017] [Indexed: 11/15/2022] Open
Abstract
Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pair of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to the methods based on correlation, and NNP approximates one popular method, the maximal information coefficient (MIC), according to its performance in simulations and on a real dataset. Moreover, the approach based on statistical prediction can be extended from two-gene relationships to multi-gene relationships. This application would help to identify relationships among multiple genes.
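One familiar instance of a link between least squares prediction and correlation is that, for a straight-line fit of one variable on another, the fraction of variance explained equals the squared Pearson correlation. The check below illustrates that textbook identity numerically; it is not claimed to be the paper's formulation of LSE, and the function name is an assumption introduced here.

```python
import numpy as np


def r_squared_from_least_squares(x, y):
    """Fraction of the variance of y explained by a least-squares line in x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # ordinary least-squares line
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.5 * x + rng.normal(size=200)       # noisy linear relationship
    r = np.corrcoef(x, y)[0, 1]
    # The two printed values agree up to floating-point error.
    print(r_squared_from_least_squares(x, y), r ** 2)
```

A prediction-based score of this kind generalizes naturally: replacing the straight-line predictor with a nonparametric one gives a measure that can also respond to non-linear relationships.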
|
12
|
Rau CD, Romay MC, Tuteryan M, Wang JJC, Santolini M, Ren S, Karma A, Weiss JN, Wang Y, Lusis AJ. Systems Genetics Approach Identifies Gene Pathways and Adamts2 as Drivers of Isoproterenol-Induced Cardiac Hypertrophy and Cardiomyopathy in Mice. Cell Syst 2017; 4:121-128.e4. [PMID: 27866946 PMCID: PMC5338604 DOI: 10.1016/j.cels.2016.10.016] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 04/26/2016] [Revised: 09/09/2016] [Accepted: 10/19/2016] [Indexed: 10/20/2022]
Abstract
We previously reported a genetic analysis of heart failure traits in a population of inbred mouse strains treated with isoproterenol to mimic catecholamine-driven cardiac hypertrophy. Here, we apply a co-expression network algorithm, wMICA, to perform a systems-level analysis of left ventricular transcriptomes from these mice. We describe the features of the overall network but focus on a module identified in treated hearts that is strongly related to cardiac hypertrophy and pathological remodeling. Using the causal modeling algorithm NEO, we identified the gene Adamts2 as a putative regulator of this module and validated the predictive value of NEO using small interfering RNA-mediated knockdown in neonatal rat ventricular myocytes. Adamts2 silencing regulated the expression of the genes residing within the module and impaired isoproterenol-induced cellular hypertrophy. Our results provide a view of higher order interactions in heart failure with potential for diagnostic and therapeutic insights.
Affiliation(s)
- Christoph D Rau
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Departments of Anesthesiology, Physiology, and Medicine, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Milagros C Romay
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Mary Tuteryan
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jessica J-C Wang
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Marc Santolini
- Center for Interdisciplinary Research on Complex Systems, Department of Physics, Northeastern University, Boston, MA 02115, USA
| | - Shuxun Ren
- Departments of Anesthesiology, Physiology, and Medicine, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Alain Karma
- Center for Interdisciplinary Research on Complex Systems, Department of Physics, Northeastern University, Boston, MA 02115, USA
| | - James N Weiss
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Yibin Wang
- Departments of Anesthesiology, Physiology, and Medicine, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Aldons J Lusis
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA.
|
13
|
Monte E, Rosa-Garrido M, Vondriska TM, Wang J. Undiscovered Physiology of Transcript and Protein Networks. Compr Physiol 2016; 6:1851-1872. [PMID: 27783861 PMCID: PMC10751805 DOI: 10.1002/cphy.c160003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The past two decades have witnessed a rapid evolution in our ability to measure RNA and protein from biological systems. As a result, new principles have arisen regarding how information is processed in cells, how decisions are made, and the role of networks in biology. This essay examines this technological evolution, reviewing (and critiquing) the conceptual framework that has emerged to explain how RNA and protein networks control cellular function. We identify how future investigations into transcriptomes, proteomes, and other cellular networks will enable development of more robust, quantitative models of cellular behavior whilst also providing new avenues to use knowledge of biological networks to improve human health. © 2016 American Physiological Society. Compr Physiol 6:1851-1872, 2016.
Affiliation(s)
- Emma Monte
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
- Manuel Rosa-Garrido
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
- Thomas M. Vondriska
- Department of Anesthesiology & Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, USA
- Department of Physiology, David Geffen School of Medicine, University of California, Los Angeles, USA
- Jessica Wang
- Department of Medicine/Cardiology, David Geffen School of Medicine, University of California, Los Angeles, USA
14
Karbassi E, Monte E, Chapski DJ, Lopez R, Rosa Garrido M, Kim J, Wisniewski N, Rau CD, Wang JJ, Weiss JN, Wang Y, Lusis AJ, Vondriska TM. Relationship of disease-associated gene expression to cardiac phenotype is buffered by genetic diversity and chromatin regulation. Physiol Genomics 2016; 48:601-15. [PMID: 27287924 DOI: 10.1152/physiolgenomics.00035.2016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/22/2016] [Accepted: 06/04/2016] [Indexed: 12/11/2022]
Abstract
Expression of a cohort of disease-associated genes, some of which are active in fetal myocardium, is considered a hallmark of transcriptional change in cardiac hypertrophy models. How this transcriptome remodeling is affected by the common genetic variation present in populations is unknown. We examined the role of genetics, as well as contributions of chromatin proteins, to regulate cardiac gene expression and heart failure susceptibility. We examined gene expression in 84 genetically distinct inbred strains of control and isoproterenol-treated mice, which exhibited varying degrees of disease. Unexpectedly, fetal gene expression was not correlated with hypertrophic phenotypes. Unbiased modeling identified 74 predictors of heart mass after isoproterenol-induced stress, but these predictors did not enrich for any cardiac pathways. However, expanded analysis of fetal genes and chromatin remodelers as groups correlated significantly with individual systemic phenotypes. Yet, cardiac transcription factors and genes shown by gain-/loss-of-function studies to contribute to hypertrophic signaling did not correlate with cardiac mass or function in disease. Because the relationship between gene expression and phenotype was strain specific, we examined genetic contribution to expression. Strikingly, strains with similar transcriptomes in the basal heart did not cluster together in the isoproterenol state, providing comprehensive evidence that there are different genetic contributors to physiological and pathological gene expression. Furthermore, the divergence in transcriptome similarity versus genetic similarity between strains is organ specific and genome-wide, suggesting chromatin is a critical buffer between genetics and gene expression.
Affiliation(s)
- Elaheh Karbassi
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Emma Monte
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Douglas J Chapski
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Rachel Lopez
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Manuel Rosa Garrido
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Joseph Kim
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Nicholas Wisniewski
- Department of Integrative Biology and Physiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Christoph D Rau
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Jessica J Wang
- Department of Medicine/Cardiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- James N Weiss
- Department of Medicine/Cardiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Physiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Yibin Wang
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Medicine/Cardiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Physiology, David Geffen School of Medicine at UCLA, Los Angeles, California
- Aldons J Lusis
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Medicine/Cardiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine at UCLA, Los Angeles, California
- Thomas M Vondriska
- Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Medicine/Cardiology, David Geffen School of Medicine at UCLA, Los Angeles, California; Department of Physiology, David Geffen School of Medicine at UCLA, Los Angeles, California
15
Lusis AJ, Seldin MM, Allayee H, Bennett BJ, Civelek M, Davis RC, Eskin E, Farber CR, Hui S, Mehrabian M, Norheim F, Pan C, Parks B, Rau CD, Smith DJ, Vallim T, Wang Y, Wang J. The Hybrid Mouse Diversity Panel: a resource for systems genetics analyses of metabolic and cardiovascular traits. J Lipid Res 2016; 57:925-42. [PMID: 27099397 PMCID: PMC4878195 DOI: 10.1194/jlr.r066944] [Citation(s) in RCA: 118] [Impact Index Per Article: 13.1] [Received: 02/02/2016] [Revised: 04/12/2016] [Indexed: 02/07/2023]
Abstract
The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human genome-wide association studies, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated. Thus far, the HMDP has been studied for traits relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, fatty liver disease, and host-gut microbiota interactions. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of the mice under various environmental conditions. All of the published data are available and can be readily used to formulate hypotheses about genes, pathways and interactions.
Affiliation(s)
- Aldons J Lusis
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA; Microbiology, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA; Human Genetics, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Marcus M Seldin
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Hooman Allayee
- Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, CA
- Brian J Bennett
- Department of Genetics, University of North Carolina, Chapel Hill, NC
- Mete Civelek
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA
- Richard C Davis
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Eleazar Eskin
- Departments of Computer Science, University of California-Los Angeles, Los Angeles, CA
- Charles R Farber
- Public Health Sciences, University of Virginia, Charlottesville, VA
- Simon Hui
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Margarete Mehrabian
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Frode Norheim
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Calvin Pan
- Human Genetics, University of California-Los Angeles, Los Angeles, CA
- Brian Parks
- Department of Nutritional Sciences, University of Wisconsin-Madison, Madison, WI
- Christoph D Rau
- Anesthesiology, University of California-Los Angeles, Los Angeles, CA
- Desmond J Smith
- Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Thomas Vallim
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Yibin Wang
- Anesthesiology, University of California-Los Angeles, Los Angeles, CA
- Jessica Wang
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
16
Riccadonna S, Jurman G, Visintainer R, Filosi M, Furlanello C. DTW-MIC Coexpression Networks from Time-Course Data. PLoS One 2016; 11:e0152648. [PMID: 27031641 PMCID: PMC4816347 DOI: 10.1371/journal.pone.0152648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/30/2014] [Accepted: 03/17/2016] [Indexed: 01/01/2023]
Abstract
When modeling coexpression networks from high-throughput time course data, Pearson Correlation Coefficient (PCC) is one of the most effective and popular similarity functions. However, its reliability is limited since it cannot capture non-linear interactions and time shifts. Here we propose to overcome these two issues by employing a novel similarity function, Dynamic Time Warping Maximal Information Coefficient (DTW-MIC), combining a measure taking care of functional interactions of signals (MIC) and a measure identifying time lag (DTW). By using the Hamming-Ipsen-Mikhailov (HIM) metric to quantify network differences, the effectiveness of the DTW-MIC approach is demonstrated on a set of four synthetic and one transcriptomic datasets, also in comparison to TimeDelay ARACNE and Transfer Entropy.
Affiliation(s)
- Giuseppe Jurman
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Roberto Visintainer
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Michele Filosi
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Cesare Furlanello
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
17
Quantitative assessment of gene expression network module-validation methods. Sci Rep 2015; 5:15258. [PMID: 26470848 PMCID: PMC4607977 DOI: 10.1038/srep15258] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 03/20/2015] [Accepted: 09/21/2015] [Indexed: 02/01/2023]
Abstract
Validation of pluripotent modules in diverse networks holds enormous potential for systems biology and network pharmacology. An arising challenge is how to assess the accuracy of discovering all potential modules from multi-omic networks and validating their architectural characteristics based on innovative computational methods beyond function enrichment and biological validation. To display the framework progress in this domain, we systematically divided the existing Computational Validation Approaches based on Modular Architecture (CVAMA) into topology-based approaches (TBA) and statistics-based approaches (SBA). We compared the available module validation methods based on 11 gene expression datasets, and partially consistent results in the form of homogeneous models were obtained with each individual approach, whereas discrepant contradictory results were found between TBA and SBA. The TBA of the Zsummary value had a higher Validation Success Ratio (VSR) (51%) and a higher Fluctuation Ratio (FR) (80.92%), whereas the SBA of the approximately unbiased (AU) p-value had a lower VSR (12.3%) and a lower FR (45.84%). The Gray area simulated study revealed a consistent result for these two models and indicated a lower Variation Ratio (VR) (8.10%) of TBA at 6 simulated levels. Despite facing many novel challenges and evidence limitations, CVAMA may offer novel insights into modular networks.
18
Goldstein P, Korol AB, Reiner-Benaim A. Two-stage genome-wide search for epistasis with implementation to Recombinant Inbred Lines (RIL) populations. PLoS One 2014; 9:e115680. [PMID: 25536193 PMCID: PMC4275240 DOI: 10.1371/journal.pone.0115680] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/21/2014] [Accepted: 11/07/2014] [Indexed: 11/18/2022]
Abstract
OBJECTIVE AND METHODS: This paper proposes an integrative two-stage genome-wide search for pairwise epistasis on expression quantitative trait loci (eQTL). The traits are clustered into multi-trait complexes that account for correlations between them that may result from common epistasis effects. The search is done by first screening for epistatic regions and then using dense markers within the identified regions, resulting in a substantial reduction in the number of tests for epistasis. The FDR is controlled using a hierarchical procedure that accounts for the search structure. Each combination of trait and marker-pair is tested using a model that accounts for both statistical and functional interpretations of epistasis and considers orthogonal effects, such that their contributions to heritability can be estimated individually. We examine the impact of using multi-trait complexes rather than single traits, and of using a hierarchical search for epistasis rather than skipping the initial screen for epistatic regions. We apply the proposed algorithm to Arabidopsis transcription data. PRINCIPAL FINDINGS: Both epistasis detection power and heritability contributed by epistasis increased when using multi-trait complexes rather than single traits. Epistatic effects common to the eQTLs included in the complexes have a higher chance of being identified by analysis of multi-trait complexes, particularly when epistatic effects on individual traits are small. Compared to direct testing for all potential epistatic effects, the hierarchical search was substantially more powerful in detecting epistasis, while controlling the FDR at the desired level. Association in functional roles within genomic regions was observed, supporting an initial screen for epistatic QTLs.
Affiliation(s)
- Pavel Goldstein
- Department of Statistics, University of Haifa, Haifa, 3498838, Israel
- Abraham B. Korol
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Haifa, 3498838, Israel
- Anat Reiner-Benaim
- Department of Statistics, University of Haifa, Haifa, 3498838, Israel
19
Shaham G, Tuller T. Most associations between transcript features and gene expression are monotonic. Molecular BioSystems 2014; 10:1426-40. [PMID: 24675795 DOI: 10.1039/c3mb70617f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
Abstract
Dozens of previous studies in the field have dealt with the relations between transcript features and their expression. Indeed, understanding the way gene expression is encoded in transcripts should contribute not only to disciplines such as functional genomics and molecular evolution, but also to biotechnology and human health. Previous studies in the field mainly aimed at predicting protein levels of genes based on their transcript features. Most of the models employed in this context assume that the effect of each transcript feature on gene expression is monotonic. In the current study we aim to understand, for the first time, whether the relations between transcript features (i.e., the UTRs and ORF) and measurements related to the different stages of gene expression are indeed monotonic. To this end, we analyze 5432 transcript features and perform gene expression measurements (mRNA levels, ribosomal densities, protein levels, etc.) of 4367 S. cerevisiae genes. We use the Maximal Information Coefficient (MIC) in order to identify potential relations that are not necessarily linear or monotonic. Our analyses demonstrate that the relation between most transcript features and the examined gene expression measurements is monotonic (only up to 1-5% of the variables, with significance levels of 0.001, are non-monotonic); in addition, in the cases of deviation from monotonicity the relation/deviation is very weak. These results should help in guiding the development of computational gene expression modeling and engineering, and improve the understanding of this process. Furthermore, the relatively simple relations between a transcript's nucleotide composition and its expression should contribute towards better understanding of transcript evolution at the molecular level.
Affiliation(s)
- Gilad Shaham
- Department of Biomedical Engineering, the Engineering Faculty, Tel Aviv University, Israel.