1
Wu Y, Wu PH, Chambliss A, Wirtz D, Sun SX. Unifying fragmented perspectives with additive deep learning for high-dimensional models from partial faceted datasets. NPJ Biol Phys Mech 2025; 2:5. [PMID: 40012561 PMCID: PMC11850287 DOI: 10.1038/s44341-025-00009-3] [Received: 08/22/2024] [Accepted: 01/08/2025] [Indexed: 02/28/2025]
Abstract
Biological systems are complex networks where measurable functions emerge from interactions among thousands of components. Many studies aim to link biological function with molecular elements, yet quantifying their contributions simultaneously remains challenging, especially at the single-cell level. We propose a machine-learning approach that integrates faceted data subsets to reconstruct a complete view of the system using conditional distributions. We develop both polynomial regression and neural network models, validated with two examples: a mechanical spring network under external forces and an 8-dimensional biological network involving the senescence marker P53, using single-cell data. Our results demonstrate successful system reconstruction from partial datasets, with predictive accuracy improving as more variables are measured. This approach offers a systematic method to integrate fragmented experimental data, enabling unbiased and holistic modeling of complex biological functions.
Affiliation(s)
- Yufei Wu
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD USA
- Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD USA
- Pei-Hsun Wu
- Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD USA
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD USA
- Allison Chambliss
- Department of Pathology & Laboratory Medicine, University of California Los Angeles, Los Angeles, CA USA
- Denis Wirtz
- Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD USA
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD USA
- Sean X. Sun
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD USA
- Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, MD USA
- Center for Cell Dynamics, Johns Hopkins School of Medicine, Baltimore, MD USA

2
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2025; 43:247-257. [PMID: 38609714 PMCID: PMC11825371 DOI: 10.1038/s41587-024-02182-7] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower-resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points remains a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
Affiliation(s)
- Qiuyue Yuan
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
- Zhana Duren
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA

3
Chatrabgoun O, Daneshkhah A, Torkaman P, Johnston M, Sohrabi Safa N, Kashif Bashir A. Covariate-adjusted construction of gene regulatory networks using a combination of generalized linear model and penalized maximum likelihood. PLoS One 2025; 20:e0309556. [PMID: 39879184 PMCID: PMC11778759 DOI: 10.1371/journal.pone.0309556] [Received: 12/24/2023] [Accepted: 08/03/2024] [Indexed: 01/31/2025]
Abstract
Many machine learning techniques have been used to construct gene regulatory networks (GRNs) through the precision matrix, which encodes conditional independence among genes, ultimately producing sparse GRNs. This construction can be improved using auxiliary information, such as the gene expression profiles of related species or gene markers. To this end, we first fit a generalized linear model (GLM) and then apply penalized maximum likelihood, using the graphical lasso (Glasso), to the residuals of a multi-level multivariate GLM in which the gene expression of one species is a multi-level response variable and the gene expression of related species provides the multivariate covariates. Because gene data intrinsically have far more variables than available samples, a bootstrap version of the multi-response multivariate GLM is used. To find the most appropriate related species, cross-validation is used to compute the minimum squared error of the fitted GLM under different levels of regularization. Penalized maximum likelihood under a lasso or elastic net penalty is then applied to the residuals of the fitted GLM to find the sparse precision matrix. Finally, we show that the presented algorithm, which combines the fitted GLM with penalized maximum likelihood on the model residuals, is extremely fast and can exploit sparsity in the constructed GRNs. We also demonstrate the flexibility of the proposed method and its superior validity through comparisons with other methods.
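The two-step construction described above (fit a GLM, then run penalized maximum likelihood on its residuals) can be illustrated with a minimal Python sketch. It assumes NumPy and scikit-learn: ordinary least squares (a Gaussian GLM with identity link) stands in for the paper's bootstrapped multi-level multivariate GLM, scikit-learn's `GraphicalLasso` provides the lasso-penalized step, and the function names `residual_glasso` and `adjacency_from_precision` are hypothetical. The cross-validated choice of related species is omitted.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.linear_model import LinearRegression


def residual_glasso(Y, X, alpha=0.1):
    """Sparse precision matrix of Y after adjusting for covariates X.

    Y: (n_samples, n_genes) expression of the target species.
    X: (n_samples, n_covariates), e.g. expression of a related species.
    """
    # Step 1: Gaussian GLM (ordinary least squares) fitted to all gene columns at once.
    glm = LinearRegression().fit(X, Y)
    residuals = Y - glm.predict(X)
    # Step 2: L1-penalized maximum likelihood (graphical lasso) on the residuals.
    model = GraphicalLasso(alpha=alpha).fit(residuals)
    return model.precision_


def adjacency_from_precision(P, tol=1e-8):
    """Edges are nonzero off-diagonal precision entries (conditional dependence)."""
    A = (np.abs(P) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A
```

Larger `alpha` values give sparser networks; in this sketch the penalty is fixed by hand rather than selected by cross-validation.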
Affiliation(s)
- Omid Chatrabgoun
- School of Computing, Electronics and Mathematics, Coventry University, Coventry, United Kingdom
- Department of Statistics, Faculty of Mathematical Sciences and Statistics, Malayer University, Malayer, Iran
- Alireza Daneshkhah
- Faculty of Mathematics and Data Science, Emirates Aviation University, Dubai, UAE
- Parisa Torkaman
- Department of Statistics, Faculty of Mathematical Sciences and Statistics, Malayer University, Malayer, Iran
- Mark Johnston
- School of Computing, Electronics and Mathematics, Coventry University, Coventry, United Kingdom
- Nader Sohrabi Safa
- Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
- Ali Kashif Bashir
- Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom

4
Kuismin M, Sillanpää MJ. Network hub gene detection using the entire solution path information. Genetics 2025; 229:1-33. [PMID: 39535861 PMCID: PMC11708912 DOI: 10.1093/genetics/iyae187] [Received: 09/10/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Gene co-expression networks typically comprise modules and their associated hub genes, which regulate numerous downstream interactions within the network. Methods for hub screening, as well as data-driven estimation of hub co-expression networks using graphical models, can serve as useful tools for identifying these hubs. Graphical model-based penalization methods typically have one or more regularization terms, each of which encourages a favorable characteristic (e.g., sparsity, hubs, or a power-law degree distribution) in the estimated gene network. It is common practice to select a single optimal graphical model corresponding to a specific value of the regularization parameter(s). Alternatively, during hub gene detection one could aggregate information across the several graphical models along the solution path, all of which depend on the same data set. We propose a novel method for detecting hub genes that uses the information available in the solution path. Our procedure is related to stability selection, but we replace resampling with a simple statistic. This procedure combines information from each node of the data-driven graphical models into a single influence statistic, similar to Cook's distance. We call this statistic the Mean Degree Squared Distance (MDSD). Our simulation and empirical studies demonstrate that the MDSD statistic maintains a good balance between false positive and true positive hubs. An R package, MDSD, is publicly available on GitHub under the General Public License: https://github.com/markkukuismin/MDSD.
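To make "information along the solution path" concrete, the sketch below fits graphical-lasso networks over a grid of penalties and records each node's degree at every step. It is not the MDSD statistic, whose formula is not given in this abstract; ranking nodes by mean degree across the path is a hypothetical, illustrative summary only. It assumes NumPy and scikit-learn's `GraphicalLasso`; `degree_path` and `rank_hubs` are names introduced for illustration.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def degree_path(X, alphas, tol=1e-8):
    """Node degrees of graphical-lasso networks, one row per penalty in alphas.

    X: (n_samples, n_genes). Returns an array of shape (len(alphas), n_genes).
    """
    degrees = []
    for alpha in alphas:
        P = GraphicalLasso(alpha=alpha, max_iter=200).fit(X).precision_
        A = np.abs(P) > tol  # nonzero partial dependence = edge
        np.fill_diagonal(A, False)
        degrees.append(A.sum(axis=1))
    return np.array(degrees)


def rank_hubs(X, alphas):
    """Hypothetical path summary: rank nodes by mean degree over all penalties."""
    D = degree_path(X, alphas)
    score = D.mean(axis=0)
    return np.argsort(-score), score
```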
Affiliation(s)
- Markku Kuismin
- Research Unit of Mathematical Sciences, University of Oulu, P.O. BOX 8000, Oulu FI-90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, P.O. BOX 8000, Oulu FI-90014, Finland

5
Li R, Xu S, Li Y, Tang Z, Feng D, Cai J, Ma S. Incorporating prior information in gene expression network-based cancer heterogeneity analysis. Biostatistics 2024; 26:kxae028. [PMID: 39074174 DOI: 10.1093/biostatistics/kxae028] [Received: 01/26/2024] [Revised: 07/04/2024] [Accepted: 07/08/2024] [Indexed: 07/31/2024]
Abstract
Cancer is molecularly heterogeneous: seemingly similar patients can have different molecular landscapes and, accordingly, different clinical behaviors. Recent studies have shown gene expression networks to be more effective and informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" or "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and by other mechanisms. It has been suggested that incorporating the regulators of gene expression in network analysis and focusing on direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (arising jointly from network analysis, the incorporation of regulators, and heterogeneity) and by often weak signals. To tackle this problem, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulations demonstrate the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, the approach yields findings that differ from those of the alternatives, and the identified sample subgroups show important clinical differences.
Affiliation(s)
- Rong Li
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, 06511, CT, United States
- Shaodong Xu
- Center for Applied Statistics and School of Statistics, Renmin University of China, 59 Zhongguancun Street, 100872, Beijing, China
- Yang Li
- Center for Applied Statistics and School of Statistics, Renmin University of China, 59 Zhongguancun Street, 100872, Beijing, China
- Zuojian Tang
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., 900 Ridgebury Road, Ridgefield, 06877, CT, United States
- Di Feng
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., 900 Ridgebury Road, Ridgefield, 06877, CT, United States
- James Cai
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., 900 Ridgebury Road, Ridgefield, 06877, CT, United States
- Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, 06511, CT, United States

6
Lei L, Deng X, Liu F, Gao H, Duan Y, Li J, Fu S, Li H, Zhou Y, Liao R, Liu H, Zhou C. Exploitation of Key Regulatory Modules and Genes for High-Salt Adaptation in Schizothoracine by Weighted Gene Co-Expression Network Analysis. Animals (Basel) 2024; 15:56. [PMID: 39794999 PMCID: PMC11718949 DOI: 10.3390/ani15010056] [Received: 11/15/2024] [Revised: 12/23/2024] [Accepted: 12/27/2024] [Indexed: 01/13/2025]
Abstract
Schizothoracine fishes in saltwater lakes of the Tibetan Plateau are important models for studying the evolution and uplift of the Tibetan Plateau, and their adaptation to high-salt environments is of particular interest. In this study, we first assembled RNA-Seq data, previously obtained by our group, from each tissue of G. przewalskii, G. selincuoensis, and G. namensis from Qinghai Lake, Selincuo Lake, and Namtso Lake, respectively. After obtaining reliable assembly results, we assessed the adaptation of the gills, kidneys, and livers of the three species to the high-salinity environment by weighted gene co-expression network analysis (WGCNA). Using module eigengenes (MEs), 21, 22, and 22 gene modules were identified for G. przewalskii, G. selincuoensis, and G. namensis, respectively. Functional clustering analysis of genes in the significantly associated modules identified several potential osmoregulation-related KEGG pathways in the gills of the three Schizothoracine species. The Th17 cell differentiation pathway was up-regulated in the gills of all three species, and histocompatibility 2, class II antigen E alpha (h2-ea) was an up-regulated gene in this pathway. Functional clustering analysis of genes in the significantly associated kidney modules revealed several differential KEGG pathways. The pentose phosphate pathway was up-regulated in the three Schizothoracine fishes, and glucose-6-phosphate dehydrogenase (g6pd) was an up-regulated gene in this pathway. In the livers of the three Schizothorax species, the propanoate metabolism pathway was up-regulated, and succinate-CoA ligase GDP-forming subunit beta (suclg2) was an up-regulated gene in this pathway. These analyses provide reference data on the adaptation of Schizothorax to high-salt environments and lay the foundation for future studies of its adaptive mechanisms on the plateau. The results partly fill the knowledge gap regarding the survival adaptations of Schizothoracine fishes to highland saline lakes.
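WGCNA builds a network by soft-thresholding gene-gene correlations and summarizes each module by its eigengene, the first principal component of the module's standardized expression. A minimal NumPy sketch of those two steps follows. It assumes an unsigned network and an illustrative soft-threshold power `beta=6`, omits the topological-overlap and hierarchical-clustering steps of the full WGCNA workflow (usually run with the WGCNA R package), and uses hypothetical function names.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(gene_i, gene_j)| ** beta.

    expr: (n_samples, n_genes) expression matrix.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def module_eigengene(expr_module):
    """First principal component of one module's standardized genes.

    expr_module: (n_samples, n_module_genes). Returns one value per sample.
    """
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]  # first-PC scores of the samples
    # Orient the eigengene to correlate positively with the module's average profile.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me
```

Module-trait association, as used to pick significant modules, would then correlate each eigengene with the trait of interest.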
Affiliation(s)
- Luo Lei
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Xingxing Deng
- College of Fisheries, Southwest University, Chongqing 402460, China
- Livestock and Aquatic Products Affairs Center of Lengshuitan District, Yongzhou 425000, China
- Fei Liu
- Institute of Aquatic Sciences, Tibet Autonomous Region Academy of Agricultural and Animal Husbandry Sciences, Lhasa 851418, China
- He Gao
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Yuting Duan
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Junting Li
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Suxing Fu
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Hejiao Li
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Yinhua Zhou
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Rongrong Liao
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Haiping Liu
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China
- Chaowei Zhou
- College of Fisheries, Southwest University, Chongqing 402460, China
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Southwest University, Chongqing 400715, China

7
Yang Y, Lorincz-Comi N, Zhu X. Estimation of a genetic Gaussian network using GWAS summary data. Biometrics 2024; 80:ujae148. [PMID: 39656744 PMCID: PMC11639901 DOI: 10.1093/biomtc/ujae148] [Received: 01/11/2024] [Revised: 11/02/2024] [Accepted: 11/14/2024] [Indexed: 12/16/2024]
Abstract
A genetic Gaussian network of multiple phenotypes, constructed from the inverse of the genetic correlation matrix, is informative for understanding the biological dependencies among the phenotypes. However, its estimation can be challenging because genetic correlation estimates are biased by estimation errors and by the idiosyncratic pleiotropy inherent in GWAS summary statistics. Here, we introduce a novel approach called estimation of genetic graph (EGG), which eliminates both the estimation-error bias and the idiosyncratic-pleiotropy bias using the same techniques as multivariable Mendelian randomization. The genetic network estimated by EGG can be interpreted as the shared biological contributions between phenotypes, conditional on the others. We use both simulations and real data to demonstrate the superior efficacy of our method compared with traditional network estimators.
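The "inverse matrix" construction rests on the standard identity linking the precision matrix P = inv(R) of a correlation matrix R to partial correlations: rho_ij = -P_ij / sqrt(P_ii * P_jj), the correlation of phenotypes i and j conditional on all others. The NumPy sketch below applies that identity to a given correlation matrix. It does not implement EGG's corrections for estimation-error or pleiotropy bias, and `partial_correlation_network` is a hypothetical name.

```python
import numpy as np


def partial_correlation_network(R):
    """Partial correlations implied by a (genetic) correlation matrix R.

    Off-diagonal entry (i, j) is -P_ij / sqrt(P_ii * P_jj) with P = inv(R);
    a zero means i and j are conditionally independent given the rest.
    """
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    rho = -P / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho
```

For a chain with pairwise correlations 0.5 (1 with 2, 2 with 3) and 0.25 (1 with 3), the 1-3 partial correlation is zero: the marginal 1-3 correlation is fully explained by phenotype 2.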
Affiliation(s)
- Yihe Yang
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States
- Noah Lorincz-Comi
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States

8
Taylor MA, Kandyba E, Halliwill K, Delrosario R, Khoroshkin M, Goodarzi H, Quigley D, Li YR, Wu D, Bollam SR, Mirzoeva OK, Akhurst RJ, Balmain A. Stem-cell states converge in multistage cutaneous squamous cell carcinoma development. Science 2024; 384:eadi7453. [PMID: 38815020 DOI: 10.1126/science.adi7453] [Received: 05/15/2023] [Accepted: 04/05/2024] [Indexed: 06/01/2024]
Abstract
Stem cells play a critical role in cancer development by contributing to cell heterogeneity, lineage plasticity, and drug resistance. We created gene expression networks from hundreds of mouse tissue samples (both normal and tumor) and integrated these with lineage tracing and single-cell RNA-seq to identify convergence of cell states in premalignant tumor cells expressing markers of lineage plasticity and drug resistance. Two of these cell states, one representing multilineage plasticity and the other proliferation, were inversely correlated, suggesting a mutually exclusive relationship. Treatment of carcinomas in vivo with chemotherapy repressed the proliferative state and activated multilineage plasticity, whereas inhibition of differentiation repressed plasticity and potentiated responses to cell cycle inhibitors. Manipulation of this cell state transition point may provide a source of potential combinatorial targets for cancer therapy.
Affiliation(s)
- Mark A Taylor
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Clinical Research Centre, Medical University of Bialystok, Bialystok 15-089, Poland
- Eve Kandyba
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Kyle Halliwill
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- AbbVie, South San Francisco, CA 94080, USA
- Reyno Delrosario
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Matvei Khoroshkin
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Hani Goodarzi
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA
- Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
- Arc Institute, Palo Alto, CA 94304, USA
- David Quigley
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Urology, University of California San Francisco, San Francisco, CA 94518, USA
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA 94518, USA
- Yun Rose Li
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Radiation Oncology, City of Hope National Medical Center, Duarte, CA 91010, USA
- Department of Cancer Genetics & Epigenetics, City of Hope National Medical Center, Duarte, CA 91010, USA
- Division of Quantitative Medicine & Systems Biology, Translational Genomics Research Institute, Phoenix, AZ 85004, USA
- Di Wu
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Saumya R Bollam
- Biomedical Sciences Graduate Program, University of California San Francisco, San Francisco, CA 94518, USA
- Olga K Mirzoeva
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Rosemary J Akhurst
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Anatomy, University of California San Francisco, San Francisco, CA 94518, USA
- Allan Balmain
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94518, USA

9
Singh V, Singh V. Inferring Interaction Networks from Transcriptomic Data: Methods and Applications. Methods Mol Biol 2024; 2812:11-37. [PMID: 39068355 DOI: 10.1007/978-1-0716-3886-6_2] [Indexed: 07/30/2024]
Abstract
Transcriptomic data are a treasure trove in modern molecular biology, offering a comprehensive view of the intricate gene expression dynamics underlying biological systems. This information can be used to infer biomolecular interaction networks that provide insights into the complex regulatory mechanisms underpinning dynamic cellular processes. Gene regulatory networks and protein-protein interaction networks are two major classes of such networks. This chapter examines the wide range of methodologies used to extract insights from transcriptomic data, including association-based methods (based on correlation among expression vectors), probabilistic models (using Bayesian and Gaussian models), and interologous methods. We review different approaches for evaluating the significance of interactions based on the network topology and the biological functions of the interacting molecules, and we discuss various strategies for identifying functional modules. The chapter concludes by highlighting network-based techniques for prioritizing key genes, outlining centrality-based, diffusion-based, and subgraph-based methods. Overall, the chapter provides a detailed framework for investigating transcriptomic data to uncover the assembly of complex molecular networks and to adapt their analysis across a broad spectrum of biological domains.
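The simplest association-based method mentioned above, a correlation network among expression vectors, followed by centrality-based prioritization, can be sketched in a few lines of NumPy. The fixed cutoff `threshold=0.8` and the function names are illustrative assumptions; practical pipelines usually pick the cutoff by significance testing or permutation, and degree centrality is only one of the centrality measures the chapter covers.

```python
import numpy as np


def correlation_network(expr, threshold=0.8):
    """Association network: edge (i, j) if |Pearson r| >= threshold.

    expr: (n_samples, n_genes). Returns (adjacency matrix, correlation matrix).
    """
    r = np.corrcoef(expr, rowvar=False)
    adj = (np.abs(r) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)
    return adj, r


def degree_centrality(adj):
    """Fraction of the other genes that each gene is connected to."""
    n = adj.shape[0]
    return adj.sum(axis=1) / (n - 1)
```

Genes with the highest centrality would be the first candidates for prioritization under this scheme.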
Affiliation(s)
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India
- Vikram Singh
- Centre for Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, Himachal Pradesh, India

10
Zito F, Cutello V, Pavone M. A Machine Learning Approach to Simulate Gene Expression and Infer Gene Regulatory Networks. Entropy (Basel) 2023; 25:1214. [PMID: 37628244 PMCID: PMC10453511 DOI: 10.3390/e25081214] [Received: 04/10/2023] [Revised: 07/20/2023] [Accepted: 08/10/2023] [Indexed: 08/27/2023]
Abstract
The ability to simulate gene expression and infer gene regulatory networks has vast potential applications in fields including medicine, agriculture, and environmental science. In recent years, machine learning approaches for simulating gene expression and inferring gene regulatory networks have gained significant attention as a promising area of research. Simulating gene expression can provide insights into the complex mechanisms that control it and into how those mechanisms are affected by environmental factors. This knowledge can be used to develop new treatments for genetic diseases, improve crop yields, and better understand the evolution of species. In this article, we focus on a novel method capable of simulating the expression regulation of a group of genes and their mutual interactions. Our framework simulates the regulation of gene expression in response to alterations or perturbations that can affect the expression of a gene. We use both artificial and real benchmarks to empirically evaluate the effectiveness of our methodology, and we compare our method with existing ones to understand its advantages and disadvantages. We also present ideas for future improvements to enhance its effectiveness. Overall, our approach has the potential to substantially improve gene expression simulation and gene regulatory network inference, possibly leading to significant advances in genetics.
Affiliation(s)
- Mario Pavone
- Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy

11
Yang L, Lin W, Leng S. Conditional cross-map-based technique: From pairwise dynamical causality to causal network reconstruction. Chaos 2023; 33:2894465. [PMID: 37276551 DOI: 10.1063/5.0144310] [Received: 01/29/2023] [Accepted: 05/08/2023] [Indexed: 06/07/2023]
Abstract
Causality detection methods based on mutual cross mapping have been fruitfully developed and applied to data originating from nonlinear dynamical systems, where causes and effects are non-separable. However, when extended to causal network reconstruction, these pairwise methods still struggle to discriminate typical network structures, such as common drivers and indirect dependencies, and they face the curse of dimensionality. A few efforts have been devoted to overcoming these shortcomings, and here we propose a novel method in this line of work. Our method, named the conditional cross-map-based technique, can eliminate third-party information and successfully detect direct dynamical causality, and the designed criterion categorizes the detection results exactly into four standard normal forms. To demonstrate the practical usefulness of this model-free, data-driven method, we investigate data generated from different representative models covering all kinds of network motifs, as well as data measured from real-world systems. Because correct identification of direct causal links is essential to successfully modeling, predicting, and controlling the underlying complex systems, our method sheds light on uncovering the inner working mechanisms of real-world systems in a variety of disciplines using only experimentally obtained data.
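The pairwise cross-mapping idea that this technique builds on can be sketched as basic convergent cross mapping (CCM): embed the putative effect y in delay coordinates, find nearest neighbours on that shadow manifold, and measure how well they reconstruct the putative cause x. This NumPy sketch is the pairwise baseline, not the authors' conditional technique; the embedding dimension `E`, lag `tau`, and exponential neighbour weights follow common CCM practice and are assumptions, and the sketch omits the library-size convergence check and any exclusion of temporally adjacent neighbours. Both series must have the same length.

```python
import numpy as np


def delay_embed(x, E, tau):
    """Delay vectors (x[t], x[t - tau], ..., x[t - (E - 1) * tau]) as rows."""
    start = (E - 1) * tau
    return np.column_stack([x[start - k * tau: len(x) - k * tau] for k in range(E)])


def cross_map_skill(x, y, E=2, tau=1):
    """Correlation between x and its estimate from y's shadow manifold.

    In CCM, high skill when estimating x from y's embedding is read as
    evidence that x causally influences y.
    """
    My = delay_embed(y, E, tau)
    x_target = x[(E - 1) * tau:]  # x values aligned with the rows of My
    preds = np.empty(len(My))
    for i in range(len(My)):
        dist = np.linalg.norm(My - My[i], axis=1)
        dist[i] = np.inf  # leave the query point out
        nn = np.argsort(dist)[: E + 1]  # E + 1 nearest neighbours (simplex)
        d = dist[nn]
        w = np.exp(-d / max(d[0], 1e-12))  # exponential distance weights
        preds[i] = np.dot(w / w.sum(), x_target[nn])
    return np.corrcoef(preds, x_target)[0, 1]
```

Comparing `cross_map_skill(x, y)` with `cross_map_skill(y, x)` gives the pairwise directional test; the common-driver and indirect-link ambiguities described in the abstract arise because this comparison never conditions on third variables.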
Affiliation(s)
- Liufei Yang
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Wei Lin
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- School of Mathematical Sciences and Shanghai Centre for Mathematical Sciences, Fudan University, Shanghai 200433, China
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Siyang Leng
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Institute of AI and Robotics, Academy for Engineering and Technology, Fudan University, Shanghai 200433, China
12
Liu Q, Li J, Dong M, Liu M, Chai Y. Identification of Gene Regulatory Networks Using Variational Bayesian Inference in the Presence of Missing Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:399-409. [PMID: 35061589 DOI: 10.1109/tcbb.2022.3144418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The identification of gene regulatory networks (GRN) from gene expression time series data is a challenging open problem in systems biology. This paper considers the structure inference of GRN from incomplete and noisy gene expression data, an issue that has not been well studied in GRN inference. In this paper, the dynamical behavior of the gene expression process is described by a stochastic nonlinear state-space model with unknown noise information. A variational Bayesian (VB) framework is proposed to estimate the parameters and gene expression levels simultaneously. One of the advantages of this method is that it can easily handle missing observations by generating predicted values. Considering the sparsity of GRN, the smoothed gene data are modeled by extreme gradient boosting trees, and the regulatory interactions among genes are identified by importance scores based on the tree model. The proposed method is tested on the artificial DREAM4 datasets and one real gene expression dataset of yeast. The comparative results show that the proposed method can effectively recover the regulatory interactions of GRN in the presence of missing observations and outperforms existing methods for GRN identification.
13
Xu X, Zhu X, Zhu C. GAN-based deep learning framework of network reconstruction. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00893-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Inferring the topology of a network from network dynamics is a problem of both theoretical significance and practical value. This paper considers how to reconstruct the network topology from continuous-time data on the network. Inspired by the generative adversarial network (GAN), we design a deep learning framework based on continuous-time network data. The framework predicts the edge connection probability between network nodes by learning the correlation between network node state vectors. To verify the accuracy and adaptability of our method, we conducted extensive experiments on scale-free networks and small-world networks at different network scales using three different dynamics: heat diffusion dynamics, mutualistic interaction dynamics, and gene regulation dynamics. Experimental results show that our method significantly outperforms five traditional correlation indices, which demonstrates that our method can reconstruct the topology of networks of different scales well under different network dynamics.
14
Ismail A, Gajjar P, Park M, Mahboob A, Tsolova V, Subramanian J, Darwish AG, El-Sharkawy I. A recessive mutation in muscadine grapes causes berry color-loss without influencing anthocyanin pathway. Commun Biol 2022; 5:1012. [PMID: 36153380 PMCID: PMC9509324 DOI: 10.1038/s42003-022-04001-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2022] [Accepted: 09/13/2022] [Indexed: 11/10/2022]
Abstract
Anthocyanins, a major class of flavonoids, are important pigments of grape berries. Despite the recent discovery of the genetic cause underlying the loss of color, the metabolomic and molecular responses are unknown. Anthocyanin quantification among muscadines with diverse berry colors suggests that all genotypes could produce adequate anthocyanin quantities, irrespective of berry color. Transcriptome profiling of muscadine genotypes with contrasting colors points to a potential deficiency within the anthocyanin transport and/or degradation mechanisms that might cause unpigmented berries. Genome-wide association studies highlighted a region on chromosome 4 comprising several genes encoding glutathione S-transferases involved in anthocyanin transport. Sequence comparison among genotypes reveals the presence of two GST4b alleles that differ by a Pro171-to-Leu substitution at a conserved amino acid residue. Molecular dynamics simulations demonstrate that GST4b2–Leu171 encodes an inactive protein due to modifications within the H-binding site. Population genotyping suggests recessive inheritance of the unpigmented trait, associated with the homozygous GST4b2/2 genotype. A model is established that defines how colorless muscadines respond to the mutation stimulus so as to avoid the impact of anthocyanins trapped within the cytoplasm. Transcriptome profiling and mutational analysis suggest a potential deficiency in anthocyanin transport by glutathione S-transferases and/or degradation mechanisms that might cause unpigmented berries.
15
Wang MG, Ou-Yang L, Yan H, Zhang XF. Inferring Gene Co-Expression Networks by Incorporating Prior Protein-Protein Interaction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2894-2906. [PMID: 34383650 DOI: 10.1109/tcbb.2021.3103407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
Inferring gene co-expression networks from high-throughput gene expression data is an important task in bioinformatics. Many gene networks exhibit modular structures. Although several Gaussian graphical model-based methods have been developed to estimate gene co-expression networks by incorporating a modular structural prior, none of them takes into account the modular structures captured by prior networks (e.g., protein interaction networks). In this study, we propose a novel prior network-dependent gene network inference (pGNI) method to estimate gene co-expression networks by integrating gene expression data and prior protein interaction network data. The underlying modular structure is learned from both sets of data. Through simulation studies, we demonstrate the feasibility and effectiveness of our method. We also apply our method to two real datasets. The modular structures in the networks estimated by our method are biologically significant.
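Gaussian graphical models, which pGNI extends, read conditional dependence off the precision (inverse covariance) matrix Θ. The partial correlation between genes i and j is -Θ_ij / sqrt(Θ_ii Θ_jj), and a zero entry means no edge. The Python sketch below shows only that baseline step, using an unpenalized sample estimate, and it assumes more samples than genes. It does not implement the sparse, prior-network-guided estimator of pGNI. The function name and the simulated three-gene chain are illustrative assumptions.

```python
import numpy as np

def partial_correlations(X):
    """Partial-correlation matrix from the inverse sample covariance.

    X is a samples x genes array. Requires n_samples > n_genes so the sample
    covariance is invertible; no sparsity penalty is applied here.
    """
    theta = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Simulated chain g1 -> g2 -> g3: g1 and g3 are marginally correlated,
# but conditionally independent given g2 (true precision entry is 0).
rng = np.random.default_rng(0)
g1 = rng.normal(size=5000)
g2 = g1 + rng.normal(size=5000)
g3 = g2 + rng.normal(size=5000)
X = np.column_stack([g1, g2, g3])

print(np.round(np.corrcoef(X, rowvar=False), 2))   # marginal correlations
print(np.round(partial_correlations(X), 2))        # g1-g3 entry close to 0
```

The contrast between the two matrices is the reason graphical models are preferred over raw co-expression: the indirect g1-g3 association disappears once g2 is conditioned on.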
16
Saikia M, Bhattacharyya DK, Kalita JK. CBDCEM: An effective centrality based differential co-expression method for critical gene finding. GENE REPORTS 2022. [DOI: 10.1016/j.genrep.2022.101688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/09/2022]
17
Griessenberger F, Trutschnig W, Junker RR. qad: An R‐package to detect asymmetric and directed dependence in bivariate samples. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Robert R. Junker
- Department of Environment and Biodiversity, University of Salzburg, Salzburg, Austria
- Evolutionary Ecology of Plants, Department of Biology, Philipps‐University Marburg, Marburg, Germany
18
Lv Y, Amanullah S, Liu S, Zhang C, Liu H, Zhu Z, Zhang X, Gao P, Luan F. Comparative Transcriptome Analysis Identified Key Pathways and Genes Regulating Differentiated Stigma Color in Melon (Cucumis melo L.). Int J Mol Sci 2022; 23:ijms23126721. [PMID: 35743161 PMCID: PMC9224399 DOI: 10.3390/ijms23126721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2022] [Revised: 06/08/2022] [Accepted: 06/14/2022] [Indexed: 11/27/2022]
Abstract
Stigma color is an important morphological trait in many flowering plants. Visual observations in different field experiments have shown that a green stigma in melons is more attractive to natural pollinators than a yellow one. In the current study, we characterized two contrasting melon lines (MR-1 with a green stigma and M4-7 with a yellow stigma). Endogenous quantification showed that the chlorophyll and carotenoid contents of the MR-1 stigmas were higher than those of the M4-7 stigmas. Examination of the chloroplast ultrastructure at different developmental stages showed that the stigmas of both melon lines were mainly enriched with grana, plastoglobuli, and starch grains. Further, comparative transcriptomic analysis was performed to identify the candidate pathways and genes regulating melon stigma color during key developmental stages (S1–S3). The results indicated similar biological processes in the three stages, but major differences were observed in light reactions and chloroplast pathways. The weighted gene co-expression network analysis (WGCNA) of differentially expressed genes (DEGs) uncovered a “black” network module (655 out of 5302 genes) that mainly corresponded to light reactions, light harvesting, the chlorophyll metabolic process, and the chlorophyll biosynthetic process, and that contributed significantly to stigma color. Overall, the expression of five key genes of the chlorophyll synthesis pathway, namely CAO (MELO03C010624), CHLH (MELO03C007233), CRD (MELO03C026802), HEMA (MELO03C011113), and POR (MELO03C016714), was checked at different stages of stigma development in both melon lines using quantitative real-time polymerase chain reaction (qRT-PCR). The results showed that the expression of these genes gradually increased during stigma development in the MR-1 line but decreased in the M4-7 line at S2. In addition, the expression trends at different stages matched the RNA-seq data, indicating data accuracy. To sum up, our research reveals an in-depth molecular mechanism of stigma coloration and suggests that chlorophyll and related biological activity play an important role in differentiating melon stigma color.
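The WGCNA step mentioned above starts from a soft-thresholded co-expression adjacency. For an unsigned network this is commonly a_ij = |cor(x_i, x_j)|^β, which shrinks weak correlations far more than strong ones before modules are detected. A minimal NumPy sketch of that adjacency and the resulting per-gene connectivity follows. Several choices here are assumptions rather than the study's settings:

- β = 6 is a frequently used default, not the value used in the study.
- The diagonal is zeroed so that connectivity excludes self-connections.
- Module detection (topological overlap and hierarchical clustering) is not shown.
- The simulated expression matrix is illustrative.

```python
import numpy as np

def wgcna_adjacency(expr, beta=6):
    """Unsigned soft-threshold adjacency |Pearson r| ** beta (expr: samples x genes)."""
    a = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(a, 0.0)   # drop self-connections (a choice made in this sketch)
    return a

def connectivity(adj):
    """Whole-network connectivity k_i = sum_j a_ij."""
    return adj.sum(axis=1)

rng = np.random.default_rng(1)
hub = rng.normal(size=50)
expr = np.column_stack([
    hub,
    hub + 0.3 * rng.normal(size=50),   # strongly co-expressed with gene 0
    hub + 0.3 * rng.normal(size=50),   # strongly co-expressed with gene 0
    rng.normal(size=50),               # unrelated gene
])
adj = wgcna_adjacency(expr)
print(np.round(connectivity(adj), 3))
```

Raising |r| to a power keeps the strongly co-expressed genes well connected while pushing the unrelated gene's connections close to zero. This is the effect that lets modules such as the “black” module stand out.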
Affiliation(s)
- Yuanzuo Lv
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Sikandar Amanullah
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Shi Liu
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Chen Zhang
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Hongyu Liu
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Zicheng Zhu
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Xian Zhang
- Horticulture College of Northwest A&F University, Yangling, Xianyang 712100, China
- Peng Gao
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Corresponding author
- Feishi Luan
- Key Laboratory of Biology and Genetic Improvement of Horticulture Crops (Northeast Region), Ministry of Agriculture and Rural Affairs, Northeast Agricultural University, Harbin 150030, China
- College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
- Corresponding author
19
Mekedem M, Ravel P, Colinge J. Application of modular response analysis to medium- to large-size biological systems. PLoS Comput Biol 2022; 18:e1009312. [PMID: 35442961 PMCID: PMC9060349 DOI: 10.1371/journal.pcbi.1009312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2021] [Revised: 05/02/2022] [Accepted: 03/31/2022] [Indexed: 11/18/2022]
Abstract
The development of high-throughput genomic technologies associated with recent genetic perturbation techniques such as short hairpin RNA (shRNA), gene trapping, or gene editing (CRISPR/Cas9) has made it possible to obtain large perturbation data sets. These data sets are invaluable sources of information regarding the function of genes, and they offer unique opportunities to reverse engineer gene regulatory networks in specific cell types. Modular response analysis (MRA) is a well-accepted mathematical modeling method that is precisely aimed at such network inference tasks, but its use has been limited to rather small biological systems so far. In this study, we show that MRA can be employed on large systems with almost 1,000 network components. In particular, we show that MRA performance surpasses general-purpose mutual information-based algorithms. Part of these competitive results was obtained by the application of a novel heuristic that pruned MRA-inferred interactions a posteriori. We also exploited a block structure in MRA linear algebra to parallelize large system resolutions.
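The core MRA computation is compact enough to sketch. In one standard formulation, R is the n x n matrix of global steady-state responses of n modules to n perturbations, one perturbation per module. The local response matrix is then r = -[dg(R^-1)]^-1 R^-1, where dg(.) keeps only the diagonal, and this normalizes the diagonal of r to -1. The NumPy sketch below assumes a noise-free, square, invertible R. It checks the identity on a synthetic 3-module network whose perturbation strengths are hypothetical. It does not include the a posteriori pruning heuristic or the block-parallel resolution described in the abstract.

```python
import numpy as np

def mra_local_responses(R):
    """Local response matrix r = -[dg(R^-1)]^-1 R^-1 (diagonal fixed at -1).

    R[i, j] is the global response of module i to a perturbation of module j.
    """
    R_inv = np.linalg.inv(R)
    return -R_inv / np.diag(R_inv)[:, None]   # divide row i by R_inv[i, i]

# Synthetic check: a perturbation of module j with strength c_j produces
# global responses R = -r^-1 D with D = diag(c), so r should be recovered
# regardless of the (unknown) strengths.
r_true = np.array([[-1.0, 0.5, 0.0],
                   [0.0, -1.0, -0.3],
                   [0.2, 0.0, -1.0]])
strengths = np.diag([0.7, 1.3, 2.0])          # hypothetical perturbation strengths
R = -np.linalg.inv(r_true) @ strengths
print(np.round(mra_local_responses(R), 3))
```

The row normalization is what makes MRA insensitive to how strongly each module was perturbed. The dense inverse used here is also the step whose cost motivates exploiting block structure for systems with around 1,000 components.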
Affiliation(s)
- Meriem Mekedem
- Université de Montpellier, Montpellier, France
- Institut de Recherche en Cancérologie de Montpellier, Inserm U1194, Montpellier, France
- Institut régional du Cancer Montpellier, Montpellier, France
- Patrice Ravel
- Université de Montpellier, Montpellier, France
- Institut de Recherche en Cancérologie de Montpellier, Inserm U1194, Montpellier, France
- Institut régional du Cancer Montpellier, Montpellier, France
- Faculté de Pharmacie, Université de Montpellier, Montpellier, France
- Jacques Colinge
- Université de Montpellier, Montpellier, France
- Institut de Recherche en Cancérologie de Montpellier, Inserm U1194, Montpellier, France
- Institut régional du Cancer Montpellier, Montpellier, France
- Faculté de Médecine, Université de Montpellier, Montpellier, France
20
Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022; 6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022]
Abstract
Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data, learn an interaction network for each cluster, and merge these networks using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ∼15-fold and conventional methods by ∼5–17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step, which tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
21
A graph model of combination therapies. Drug Discov Today 2022; 27:1210-1217. [PMID: 35143962 DOI: 10.1016/j.drudis.2022.02.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Revised: 12/31/2021] [Accepted: 02/02/2022] [Indexed: 11/24/2022]
Abstract
The simultaneous use of multiple medications causes drug-drug interactions (DDI) that impact therapeutic efficacy. Here, we argue that graph theory, in conjunction with game theory and ecosystem theory, can address this issue. We treat the coexistence of multiple drugs as a system in which DDI is modeled by game theory. We develop an ordinary differential equation model to characterize how the concentration of a drug changes as a result of its independent capacity and the dependent influence of other drugs through the metabolic response of the host. We coalesce all drugs into personalized and context-specific networks, which can reveal key DDI determinants of therapeutic efficacy. Our model can quantify drug synergy and antagonism and test how successfully combination therapies translate to the clinic.
22
Kuismin M, Dodangeh F, Sillanpää MJ. Gap-com: general model selection criterion for sparse undirected gene networks with nontrivial community structure. G3 (BETHESDA, MD.) 2022; 12:jkab437. [PMID: 35100338 PMCID: PMC9210289 DOI: 10.1093/g3journal/jkab437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 12/06/2021] [Indexed: 06/14/2023]
Abstract
We introduce a new model selection criterion for sparse complex gene network modeling where gene co-expression relationships are estimated from data. This is a novel formulation of the gap statistic and it can be used for the optimal choice of a regularization parameter in graphical models. Our criterion favors gene network structure which differs from a trivial gene interaction structure obtained totally at random. We call the criterion the gap-com statistic (gap community statistic). The idea of the gap-com statistic is to examine the difference between the observed and the expected counts of communities (clusters) where the expected counts are evaluated using either data permutations or reference graph (the Erdős-Rényi graph) resampling. The latter represents a trivial gene network structure determined by chance. We put emphasis on complex network inference because the structure of gene networks is usually nontrivial. For example, some of the genes can be clustered together or some genes can be hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare its performance to some existing methods using simulated and real biological data examples.
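The idea behind gap-com is to compare an observed community count with its expectation under Erdős–Rényi reference graphs, and it can be sketched as follows. This is a simplified illustration, not the authors' implementation, and its function names and defaults are hypothetical:

- Connected components stand in for a proper community-detection step.
- Reference graphs are drawn with the same edge density as the observed graph; data permutation, the other option named in the abstract, is not shown.
- Only the statistic for a single graph is computed. For model selection it would be evaluated on the graph estimated at each candidate regularization parameter.

```python
import numpy as np

def count_components(adj):
    """Number of connected components of an undirected graph (union-find).

    Used here as a crude stand-in for community detection.
    """
    n = adj.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(np.triu(adj, k=1))):
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


def gap_com_sketch(adj, n_ref=200, seed=0):
    """Observed community count minus its mean over Erdős–Rényi reference graphs."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    n_edges = int(np.triu(adj, k=1).sum())
    p = n_edges / (n * (n - 1) / 2)          # match the observed edge density
    ref = []
    for _ in range(n_ref):
        upper = np.triu(rng.random((n, n)) < p, k=1)
        ref.append(count_components(upper | upper.T))
    return count_components(adj) - np.mean(ref)


# Two disjoint 5-gene cliques: clearly non-random community structure.
adj = np.zeros((10, 10), dtype=bool)
adj[:5, :5] = True
adj[5:, 5:] = True
np.fill_diagonal(adj, False)
print(count_components(adj), round(gap_com_sketch(adj), 2))
```

A positive gap indicates more community structure than a random graph of the same density would produce. This is the property the criterion rewards when choosing among candidate networks.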
Affiliation(s)
- Markku Kuismin
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- School of Computing, University of Eastern Finland, Joensuu FI-80101, Finland
- Fatemeh Dodangeh
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Mikko J Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu FI-90014, Finland
- Biocenter Oulu, University of Oulu, Oulu FI-90014, Finland
- Infotech Oulu, University of Oulu, Oulu FI-90014, Finland
23
Wu G, Li X, Guo W, Wei Z, Hu T, Shan Y, Gu J. JEBIN: analyzing gene co-expressions across multiple datasets by joint network embedding. Brief Bioinform 2022; 23:6519533. [PMID: 35134135 DOI: 10.1093/bib/bbab603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022]
Abstract
The inference of gene co-expression associations is one of the fundamental tasks in large-scale transcriptomic data analysis. Due to the high dimensionality and high noise of transcriptomic data, it is difficult to infer stable gene co-expression associations from a single dataset. Meta-analysis of multisource data can effectively tackle this problem. We propose Joint Embedding of multiple BIpartite Networks (JEBIN) to learn a low-dimensional consensus representation of genes by integrating multiple expression datasets. JEBIN infers gene co-expression associations in a nonlinear, global-similarity manner and can integrate datasets with different distributions in time linear in the number of genes and the total sample size. The effectiveness and scalability of JEBIN were verified by simulation experiments, and its superiority over commonly used integration methods was demonstrated with three indices on real biological datasets. JEBIN was then applied to study the gene co-expression patterns of hepatocellular carcinoma (HCC) based on multiple expression datasets of HCC and adjacent normal tissues, and further on the latest HCC single-cell RNA-seq data. The results show that gene co-expressions differ greatly between bulk and single-cell datasets. Finally, many differentially co-expressed ligand-receptor pairs were discovered by comparing HCC with adjacent normal data, providing candidate HCC targets for abnormal cell-cell communications.
Affiliation(s)
- Guiying Wu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Xiangyu Li
- School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
- Wenbo Guo
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Zheng Wei
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Tao Hu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Yiran Shan
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
24
Hu H, Qiu Y. Inference for nonparanormal partial correlation via regularized rank‐based nodewise regression. Biometrics 2022. [DOI: 10.1111/biom.13624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2021] [Revised: 10/05/2021] [Accepted: 12/16/2021] [Indexed: 11/28/2022]
Affiliation(s)
- Haoyan Hu
- Department of Statistics, Iowa State University, Ames, Iowa 50010, U.S.A.
- Yumou Qiu
- Department of Statistics, Iowa State University, Ames, Iowa 50010, U.S.A.
25
The Genomic Physics of COVID-19 Pathogenesis and Spread. Cells 2021; 11:cells11010080. [PMID: 35011641 PMCID: PMC8750765 DOI: 10.3390/cells11010080] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/16/2021] [Revised: 12/19/2021] [Accepted: 12/23/2021] [Indexed: 12/11/2022]
Abstract
Coronavirus disease (COVID-19) spreads mainly through close contact with infected persons, but the molecular mechanisms underlying its pathogenesis and transmission remain unknown. Here, we propose a statistical physics model to coalesce all molecular entities into a cohesive network in which the roadmap of how each entity mediates the disease can be characterized. We argue that the process by which a transmitter transfers the virus to a recipient constitutes a triad unit that propagates COVID-19 along reticulate paths. Intrinsically, person-to-person transmissibility may be mediated by how genes interact transversely across transmitter, recipient, and viral genomes. We integrate quantitative genetic theory into hypergraph theory to code the main effects of the three genomes as nodes, pairwise cross-genome epistasis as edges, and high-order cross-genome epistasis as hyperedges in a series of mobile hypergraphs. Charting a genome-wide atlas of horizontally epistatic hypergraphs can facilitate the systematic characterization of the community genetic mechanisms underlying COVID-19 spread. This atlas can help design effective containment and mitigation strategies and screen and triage more susceptible persons and asymptomatic carriers who transmit the virus during incubation.
26
Gong H, Zhu S, Zhu X, Fang Q, Zhang XY, Wu R. A Multilayer Interactome Network Constructed in a Forest Poplar Population Mediates the Pleiotropic Control of Complex Traits. Front Genet 2021; 12:769688. [PMID: 34868256 PMCID: PMC8633413 DOI: 10.3389/fgene.2021.769688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 10/19/2021] [Indexed: 11/13/2022]
Abstract
The effects of genes on physiological and biochemical processes are interrelated and interdependent; it is common for genes to exert pleiotropic control over complex traits. However, studying gene expression and participating pathways in vivo at the whole-genome level is challenging. Here, we develop a coupled regulatory interaction differential equation to assess overall and independent genetic effects on trait growth. Based on evolutionary game theory and developmental modularity theory, we constructed multilayer, omnigenic networks of bidirectional, weighted, and positive or negative epistatic interactions using a forest poplar tree mapping population. These were organized into metagalactic, intergalactic, and local interstellar networks that describe layers of structure between modules, submodules, and individual single nucleotide polymorphisms (SNPs), respectively. These multilayer interactomes enable the exploration of complex interactions between genes. Based on the deconstruction of genetic effects into their component parts, they also allow analysis not only of differential expression of quantitative trait loci but also of previously uncharacterized determinant SNPs that are negatively regulated by other SNPs. Our research framework provides a tool to comprehend the pleiotropic control of complex traits and explores the inherent directional connections between genes in the structure of omnigenic networks.
Affiliation(s)
- Huiying Gong
- College of Science, Beijing Forestry University, Beijing, China
- Sheng Zhu
- College of Biology and the Environment, Nanjing Forestry University, Nanjing, China
- Xuli Zhu
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Qing Fang
- Faculty of Science, Yamagata University, Yamagata, Japan
- Xiao-Yu Zhang
- College of Science, Beijing Forestry University, Beijing, China
- Rongling Wu
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, China
- Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA, United States
27
Feng L, Jiang P, Li C, Zhao J, Dong A, Yang D, Wu R. Genetic dissection of growth trajectories in forest trees: From FunMap to FunGraph. FORESTRY RESEARCH 2021; 1:19. [PMID: 39524511 PMCID: PMC11524299 DOI: 10.48130/fr-2021-0019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/31/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2024]
Abstract
Growth is a developmental process involving important genetic components. Functional mapping (FunMap) has been used as an approach to map quantitative trait loci (QTLs) governing growth trajectories by incorporating growth equations. FunMap is based on reductionist thinking, with the power to identify a small set of significant QTLs from the whole pool of genome-wide markers. Yet increasing evidence shows that a complex trait is controlled by all the genes an organism may carry. Here, we describe and demonstrate a different mapping approach that encapsulates all markers into genetic interaction networks. This approach, named FunGraph, combines functional mapping, evolutionary game theory, and prey-predator theory into mathematical graphs. This allows the observed genetic effect of a locus to be decomposed into its independent component (resulting from the locus' intrinsic capacity) and its dependent component (due to extrinsic regulation by other loci). Using FunGraph, we can visualize and trace the roadmap of how each locus interacts with every other locus to impact growth. In a population-based association study of Euphrates poplar, we use FunGraph to identify previously neglected genetic interaction effects that contribute to the genetic architecture of juvenile stem growth. FunGraph could open up a novel gateway to comprehend the global genetic control mechanisms of complex traits.
Affiliation(s)
- Li Feng
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Peng Jiang
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Caifeng Li
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Jinshuai Zhao
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Ang Dong
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Dengcheng Yang
- Center for Computational Biology, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Rongling Wu
- Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, The Pennsylvania State University, Hershey, PA 17033, USA
28
Wei Z, Kim D. Measure of asymmetric association for ordinal contingency tables via the bilinear extension copula. Stat Probab Lett 2021. [DOI: 10.1016/j.spl.2021.109183] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
29
Belyaeva A, Squires C, Uhler C. DCI: learning causal differences between gene regulatory networks. Bioinformatics 2021; 37:3067-3069. [PMID: 33704425 PMCID: PMC9991896 DOI: 10.1093/bioinformatics/btab167] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/14/2020] [Revised: 01/12/2021] [Accepted: 03/08/2021] [Indexed: 02/02/2023]
Abstract
SUMMARY Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. AVAILABILITY AND IMPLEMENTATION Python package freely available at http://uhlerlab.github.io/causaldag/dci. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anastasiya Belyaeva
- Laboratory for Information and Decision Systems and Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chandler Squires
- Laboratory for Information and Decision Systems and Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Caroline Uhler
- Laboratory for Information and Decision Systems and Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
30
Inferring multilayer interactome networks shaping phenotypic plasticity and evolution. Nat Commun 2021; 12:5304. [PMID: 34489412 PMCID: PMC8421358 DOI: 10.1038/s41467-021-25086-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/30/2021] [Accepted: 07/12/2021] [Indexed: 02/07/2023]
Abstract
Phenotypic plasticity represents a capacity by which the organism changes its phenotypes in response to environmental stimuli. Despite its pivotal role in adaptive evolution, how phenotypic plasticity is genetically controlled remains elusive. Here, we develop a unified framework for coalescing all single nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS) into a quantitative graph. This framework integrates functional genetic mapping, evolutionary game theory, and predator-prey theory to decompose the net genetic effect of each SNP into its independent and dependent components. The independent effect arises from the intrinsic capacity of a SNP, only expressed when it is in isolation, whereas the dependent effect results from the extrinsic influence of other SNPs. The dependent effect is conceptually beyond the traditional definition of epistasis by not only characterizing the strength of epistasis but also capturing the bi-causality of epistasis and the sign of the causality. We implement functional clustering and variable selection to infer multilayer, sparse, and multiplex interactome networks from any dimension of genetic data. We design and conduct two GWAS experiments using Staphylococcus aureus, aimed to test the genetic mechanisms underlying the phenotypic plasticity of this species to vancomycin exposure and Escherichia coli coexistence. We reconstruct the two most comprehensive genetic networks for abiotic and biotic phenotypic plasticity. Pathway analysis shows that SNP-SNP epistasis for phenotypic plasticity can be annotated to protein-protein interactions through coding genes. Our model can unveil the regulatory mechanisms of significant loci and excavate missing heritability from some insignificant loci. Our multilayer genetic networks provide a systems tool for dissecting environment-induced evolution.
31
Leifer I, Sánchez-Pérez M, Ishida C, Makse HA. Predicting synchronized gene coexpression patterns from fibration symmetries in gene regulatory networks in bacteria. BMC Bioinformatics 2021; 22:363. [PMID: 34238210 PMCID: PMC8265036 DOI: 10.1186/s12859-021-04213-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/21/2020] [Accepted: 05/19/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Gene regulatory networks coordinate the expression of genes across physiological states and ensure a synchronized expression of genes in cellular subsystems, critical for the coherent functioning of cells. Here we address the question of whether it is possible to predict gene synchronization from network structure alone. We have recently shown that synchronized gene expression can be predicted from symmetries in the gene regulatory networks described by the concept of symmetry fibrations. We showed that symmetry fibrations partition the genes into groups called fibers based on the symmetries of their 'input trees', the set of paths in the network through which signals can reach a gene. In idealized dynamic gene expression models, all genes in a fiber are perfectly synchronized, while less idealized models, with gene input functions differing between genes, predict symmetry breaking and desynchronization. RESULTS To study the functional role of gene fibers and to test whether some of the fiber-induced coexpression remains in reality, we analyze gene fibrations for the gene regulatory networks of E. coli and B. subtilis and confront them with expression data. We find approximate gene coexpression patterns consistent with symmetry fibrations with idealized gene expression dynamics. This shows that network structure alone provides useful information about gene synchronization, and suggests that gene input functions within fibers may be further streamlined by evolutionary pressures to realize a coexpression of genes. CONCLUSIONS Thus, gene fibrations provide a sound conceptual tool to describe tunable coexpression induced by network topology and shaped by mechanistic details of gene expression.
Affiliation(s)
- Ian Leifer
- Levich Institute, Physics Department, City College of New York, New York, NY, 10031, USA
- Mishael Sánchez-Pérez
- Levich Institute, Physics Department, City College of New York, New York, NY, 10031, USA
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Cecilia Ishida
- Faculty of Medicine and Biomedical Sciences, Autonomous University of Chihuahua, 31125, Chihuahua, Chihuahua, Mexico
- Hernán A Makse
- Levich Institute, Physics Department, City College of New York, New York, NY, 10031, USA
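The abstract above defines fibers as groups of genes whose input trees are isomorphic. One standard way to compute such groups is iterative color refinement on in-neighbors, which converges to the coarsest balanced coloring of the network. The sketch below is a minimal illustration under stated assumptions: a directed network with a single, unsigned edge type (signed activator/repressor edges would require adding the edge type to each signature), with the function name `fiber_partition` and the toy network being hypothetical. It is not the authors' code.

```python
from typing import List, Tuple


def fiber_partition(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Hypothetical helper: label each node with its fiber (a class of the
    coarsest balanced coloring), assuming one unsigned edge type.

    `edges` are (regulator, target) pairs; returns one integer label per node.
    """
    in_neighbors: List[List[int]] = [[] for _ in range(n)]
    for src, dst in edges:
        in_neighbors[dst].append(src)

    colors = [0] * n  # start with every node in a single class
    while True:
        # Signature = own color + sorted multiset of the colors of its inputs.
        signatures = [
            (colors[v], tuple(sorted(colors[u] for u in in_neighbors[v])))
            for v in range(n)
        ]
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        refined = [palette[sig] for sig in signatures]
        # Refinement only ever splits classes, so an unchanged class count
        # means the partition is stable (balanced).
        if len(set(refined)) == len(set(colors)):
            return refined
        colors = refined


# Toy network: gene 0 regulates genes 1 and 2; gene 3 regulates gene 4.
print(fiber_partition(5, [(0, 1), (0, 2), (3, 4)]))  # → [0, 1, 1, 0, 1]
```

In the toy network, genes 1, 2 and 4 fall into one fiber even though they do not share a regulator: each receives a single input from an unregulated gene, so their input trees are isomorphic.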
32
Kontio JAJ, Pyhäjärvi T, Sillanpää MJ. Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways. PLoS Comput Biol 2021; 17:e1008960. [PMID: 33939702 PMCID: PMC8118548 DOI: 10.1371/journal.pcbi.1008960] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 05/13/2021] [Accepted: 04/13/2021] [Indexed: 11/19/2022]
Abstract
A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities, often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As proof-of-concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.
Affiliation(s)
- Juho A. J. Kontio
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
- Tanja Pyhäjärvi
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Department of Forest Sciences, University of Helsinki, Helsinki, Finland
- Mikko J. Sillanpää
- Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
33
Wang H, Ye M, Fu Y, Dong A, Zhang M, Feng L, Zhu X, Bo W, Jiang L, Griffin CH, Liang D, Wu R. Modeling genome-wide by environment interactions through omnigenic interactome networks. Cell Rep 2021; 35:109114. [PMID: 33979624 DOI: 10.1016/j.celrep.2021.109114] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/30/2020] [Revised: 03/11/2021] [Accepted: 04/21/2021] [Indexed: 10/21/2022]
Abstract
How genes interact with the environment to shape phenotypic variation and evolution is a fundamental question intriguing to biologists from various fields. Existing linear models built on single genes are inadequate to reveal the complexity of genotype-environment (G-E) interactions. Here, we develop a conceptual model for mechanistically dissecting G-E interplay by integrating previously disconnected theories and methods. Under this integration, evolutionary game theory, developmental modularity theory, and a variable selection method allow us to reconstruct environment-induced, maximally informative, sparse, and causal multilayer genetic networks. We design and conduct two mapping experiments by using a desert-adapted tree species to validate the biological application of the model proposed. The model identifies previously uncharacterized molecular mechanisms that mediate trees' response to saline stress. Our model provides a tool to comprehend the genetic architecture of trait variation and evolution and trace the information flow of each gene toward phenotypes within omnigenic networks.
Affiliation(s)
- Haojie Wang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Meixia Ye
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Yaru Fu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Ang Dong
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Miaomiao Zhang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Li Feng
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Xuli Zhu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Wenhao Bo
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Libo Jiang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Christopher H Griffin
- Applied Research Laboratory, The Pennsylvania State University, University Park, PA 16802, USA
- Dan Liang
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Rongling Wu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China; Center for Statistical Genetics, Departments of Public Health Sciences and Statistics, The Pennsylvania State University, Hershey, PA 17033, USA.
34
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
- Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
- Haiyan Huang
- Department of Statistics, University of California, Berkeley
35
Cahan P, Cacchiarelli D, Dunn SJ, Hemberg M, de Sousa Lopes SMC, Morris SA, Rackham OJL, Del Sol A, Wells CA. Computational Stem Cell Biology: Open Questions and Guiding Principles. Cell Stem Cell 2021; 28:20-32. [PMID: 33417869 PMCID: PMC7799393 DOI: 10.1016/j.stem.2020.12.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
Abstract
Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area.
Affiliation(s)
- Patrick Cahan
- Institute for Cell Engineering, Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA.
- Davide Cacchiarelli
- Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli, Italy; Department of Translational Medicine, University of Naples "Federico II," Naples, Italy
- Sara-Jane Dunn
- DeepMind, 14-18 Handyside Street, London N1C 4DN, UK; Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Samantha A Morris
- Department of Developmental Biology, Department of Genetics, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Owen J L Rackham
- Centre for Computational Biology and The Program for Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, Singapore
- Antonio Del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Belvaux 4366, Luxembourg; CIC bioGUNE, Bizkaia Technology Park, 801 Building, 48160 Derio, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain
- Christine A Wells
- Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia
36
Vohradsky J, Schwarz M, Ramaniuk O, Ruiz-Larrabeiti O, Vaňková Hausnerová V, Šanderová H, Krásný L. Kinetic Modeling and Meta-Analysis of the Bacillus subtilis SigB Regulon during Spore Germination and Outgrowth. Microorganisms 2021; 9:microorganisms9010112. [PMID: 33466511 PMCID: PMC7824861 DOI: 10.3390/microorganisms9010112] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/10/2020] [Revised: 12/21/2020] [Accepted: 12/29/2020] [Indexed: 11/16/2022]
Abstract
The exponential increase in the number of conducted studies combined with the development of sequencing methods have led to an enormous accumulation of partially processed experimental data in the past two decades. Here, we present an approach using literature-mined data complemented with gene expression kinetic modeling and promoter sequence analysis. This approach allowed us to identify the regulon of Bacillus subtilis sigma factor SigB of RNA polymerase (RNAP) specifically expressed during germination and outgrowth. SigB is critical for the cell's response to general stress but is also expressed during spore germination and outgrowth, and this specific regulon is not known. This approach allowed us to (i) define a subset of the known SigB regulon controlled by SigB specifically during spore germination and outgrowth, (ii) identify the influence of the promoter sequence binding motif organization on the expression of the SigB-regulated genes, and (iii) suggest additional sigma factors co-controlling other SigB-dependent genes. Experiments then validated promoter sequence characteristics necessary for direct RNAP-SigB binding. In summary, this work documents the potential of computational approaches to unravel new information even for a well-studied system; moreover, the study specifically identifies the subset of the SigB regulon, which is activated during germination and outgrowth.
Affiliation(s)
- Jiri Vohradsky
- Laboratory of Bioinformatics, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Marek Schwarz
- Laboratory of Bioinformatics, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Olga Ramaniuk
- Laboratory of Microbial Genetics and Gene Expression, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Olatz Ruiz-Larrabeiti
- Laboratory of Microbial Genetics and Gene Expression, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Bacterial Stress Response Research Group, Department of Immunology, Microbiology and Parasitology, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
- Viola Vaňková Hausnerová
- Laboratory of Microbial Genetics and Gene Expression, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Hana Šanderová
- Laboratory of Microbial Genetics and Gene Expression, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
- Libor Krásný
- Laboratory of Microbial Genetics and Gene Expression, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Prague, Czech Republic
37
38
Zaborowski AB, Walther D. Determinants of correlated expression of transcription factors and their target genes. Nucleic Acids Res 2020; 48:11347-11369. [PMID: 33104784 PMCID: PMC7672440 DOI: 10.1093/nar/gkaa927] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/07/2020] [Revised: 10/01/2020] [Accepted: 10/06/2020] [Indexed: 11/14/2022]
Abstract
While transcription factors (TFs) are known to regulate the expression of their target genes (TGs), only a weak correlation of expression between TFs and their TGs has generally been observed. As lack of correlation could be caused by additional layers of regulation, the overall correlation distribution may hide the presence of a subset of regulatory TF-TG pairs with tight expression coupling. Using reported regulatory pairs in the plant Arabidopsis thaliana along with comprehensive gene expression information and testing a wide array of molecular features, we aimed to discern the molecular determinants of high expression correlation of TFs and their TGs. TF-family assignment, stress-response process involvement, short genomic distances of the TF-binding sites to the transcription start site of their TGs, few required protein-protein-interaction connections to establish physical interactions between the TF and polymerase-II, unambiguous TF-binding motifs, increased numbers of miRNA target-sites in TF-mRNAs, and a young evolutionary age of TGs were found particularly indicative of high TF-TG correlation. The modulating roles of post-transcriptional, post-translational processes, and epigenetic factors have been characterized as well. Our study reveals that regulatory pairs with high expression coupling are associated with specific molecular determinants.
Affiliation(s)
- Adam B Zaborowski
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
39
Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. Appl Intell 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
40
Liu W, Sun X, Peng L, Zhou L, Lin H, Jiang Y. RWRNET: A Gene Regulatory Network Inference Algorithm Using Random Walk With Restart. Front Genet 2020; 11:591461. [PMID: 33101398 PMCID: PMC7545090 DOI: 10.3389/fgene.2020.591461] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/04/2020] [Accepted: 09/02/2020] [Indexed: 11/30/2022]
Abstract
Inferring gene regulatory networks from expression data is essential in identifying complex regulatory relationships among genes and revealing the mechanism of certain diseases. Various computational methods have been developed for inferring gene regulatory networks. However, these methods focus on the local topology of the network rather than on the global topology. From a network optimisation standpoint, emphasising the global topology of the network also reduces redundant regulatory relationships. In this study, we propose a novel network inference algorithm using Random Walk with Restart (RWRNET) that combines local and global topology relationships. The method first captures the local topology through three elements of random walk and then combines the local topology with the global topology by Random Walk with Restart. The Markov Blanket discovery algorithm is then used to deal with isolated genes. The proposed method is compared with several state-of-the-art methods on the basis of six benchmark datasets. Experimental results demonstrated the effectiveness of the proposed method.
Affiliation(s)
- Wei Liu
- School of Computer Science, Xiangtan University, Xiangtan, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Xingen Sun
- School of Computer Science, Xiangtan University, Xiangtan, China
- Li Peng
- School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Lili Zhou
- School of Computer Science, Xiangtan University, Xiangtan, China
- Hui Lin
- School of Computer Science, Xiangtan University, Xiangtan, China
- Yi Jiang
- School of Computer Science, Xiangtan University, Xiangtan, China
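Random walk with restart (RWR), the global-topology step named in the RWRNET abstract, can be sketched as a short power iteration. This is the textbook form of RWR on an undirected adjacency matrix, assuming a restart probability of 0.3 chosen only for illustration; the function name is hypothetical, and the sketch does not reproduce RWRNET's local-topology weighting or its Markov blanket step.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Hypothetical helper: steady-state visiting probabilities of a walker
    that, at every step, jumps back to node `seed` with probability `restart`."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / col_sums                        # column-normalized transition matrix
    e = np.zeros(A.shape[0])
    e[seed] = 1.0                           # restart vector concentrated on the seed
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


# Path graph 0 - 1 - 2, walker restarting at node 0
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
p = random_walk_with_restart(adj, seed=0)
print(np.round(p, 3))  # p ≈ [0.444, 0.412, 0.144]
```

The resulting scores decay with network distance from the seed while still reflecting every path in the graph, which is the sense in which RWR summarizes global rather than purely local topology.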
41
Rengasamy M, Zhong Y, Marsland A, Chen K, Douaihy A, Brent D, Melhem NM. Signaling networks in inflammatory pathways and risk for suicidal behavior. Brain Behav Immun Health 2020; 7:100122. [PMID: 33791683 PMCID: PMC8009526 DOI: 10.1016/j.bbih.2020.100122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/17/2020] [Revised: 07/23/2020] [Accepted: 07/28/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND Suicide is a leading cause of death in the young adult population, with few biological markers identified thus far to be associated with suicidality. Cytokines (including IL-6 and TNFα) may contribute to increased risk for depression and suicidality. Few studies have examined the associations of cytokine mRNA expression with depression and suicidal ideation and behavior. This study examines these associations and whether cytokine signaling networks differentiate suicide attempters (SA), suicide ideators (SI), and healthy controls (HC). METHODS Cytokine pathway marker (CPM; e.g. cytokines and proteins in cytokine signaling pathways) mRNA gene expression in whole blood was examined in suicide attempters (n = 38), suicide ideators (n = 38), and healthy controls (n = 36). Between-group differences in CPM gene expression were examined. We also examined association of the mRNA of these genes with the severity of depression and suicidal ideation. Novel Gaussian Graphical Model (GGM) techniques were utilized to examine between-network partial correlation differences in cytokine signaling networks relevant to IL-6 and TNFα signaling pathways. RESULTS The severity of depression symptoms was positively associated with TNFα mRNA levels and negatively associated with IL-10 mRNA levels, but CPM expression was not associated with suicidal ideation severity. There were no between-group differences in CPM markers among healthy controls, SI and SA groups after correcting for multiple comparisons. In network analyses, we found suggestive results of between-group network differences between SI and control groups in gene pairs with IL-6R and STAT3 as common nodes. DISCUSSION In a cohort of suicide attempters and ideators, TNFα and IL-10 mRNA levels appear to be associated with depressive symptomology, consistent with elevation of pro-inflammatory cytokine production and reduction of anti-inflammatory cytokine production. 
Additionally, cytokine signaling networks may differentiate suicide ideators from healthy controls based on between-network differences, with differences possibly related to relationships of IL6R or STAT3 with other components of cytokine signaling networks.
Affiliation(s)
- Manivel Rengasamy
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Yongqi Zhong
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- University of Pittsburgh Graduate School of Public Health, Department of Epidemiology, Pittsburgh, PA, USA
- Anna Marsland
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Kehui Chen
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
- Antoine Douaihy
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- David Brent
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Nadine M. Melhem
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| |
Collapse
|
42
|
Zandi E, Ayatollahi Mehrgardi A, Esmailizadeh A. Mammary tissue transcriptomic analysis for construction of integrated regulatory networks involved in lactogenesis of Ovis aries. Genomics 2020; 112:4277-4287. [PMID: 32693106 DOI: 10.1016/j.ygeno.2020.07.025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/15/2020] [Revised: 06/19/2020] [Accepted: 07/13/2020] [Indexed: 10/23/2022]
Abstract
The mammary gland undergoes vast changes between pregnancy and the onset of lactation. This remodeling involves functions such as lactation, which are controlled by numerous regulators and gene networks that are still not completely understood. MicroRNAs (miRNAs) are important non-coding gene regulators that control an extensive range of biological processes; exploring their functions is therefore important for resolving the complexity of gene regulation. The main purpose of the present study is to identify the integrated gene regulatory networks involved in the progress of lactation in the mammary gland. We analyzed ovine mammary tissue datasets comprising mRNA (gene) and miRNA expression profiles from six ewes at different days of lactation and under different nutritional treatments. We combined two types of information: module networks inferred from the mRNA (RNA-seq), miRNA and transcription factor (TF) expression matrices, and computational prediction of targets. To discover the regulatory functions of miRNAs, 134 modules were predicted from the gene expression data, and 14 TFs and 20 miRNAs were assigned to these predicted modules. Using this integrated computation-based method, 38 miRNA-module and 35 TF-module interactions were identified in ovine mammary tissue data during lactogenesis. Many of these modules were involved in lipid and protein metabolism, as well as steroid and vitamin biosynthesis, which are likely to play key roles in mammary tissue and lactation development. These results provide new information about regulatory processes at the miRNA and TF levels throughout lactation.
Affiliation(s)
- Elmira Zandi
  - Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
  - Young Researchers Society, Shahid Bahonar University of Kerman, PB 76169-133, Kerman, Iran
- Ahmad Ayatollahi Mehrgardi
  - Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
- Ali Esmailizadeh
  - Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, PB 76169-133, Iran
43
Chowdhury HA, Bhattacharyya DK, Kalita JK. (Differential) Co-Expression Analysis of Gene Expression: A Survey of Best Practices. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1154-1173. [PMID: 30668502 DOI: 10.1109/tcbb.2019.2893170] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/09/2023]
Abstract
Analysis of gene expression data is widely used in transcriptomic studies to understand the functions of molecules inside a cell and the interactions among them. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review best practices in gene expression data analysis in terms of (differential) co-expression, co-expression networks, differential networking, and differential connectivity, considering and comparing both microarray and RNA-seq data. We highlight the hurdles of analyzing RNA-seq data with methods developed for microarrays. We discuss the necessary tools for gene expression analysis throughout the paper. In addition, we shed light on scRNA-seq data analysis by covering preprocessing and the use of scRNA-seq data in co-expression analysis, along with useful scRNA-seq-specific tools. To provide insights, biological interpretation and functional profiling are included. Finally, we provide guidelines for the analyst, along with research issues and challenges that should be addressed.
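One common building block of differential co-expression is checking whether a single gene pair's correlation changes between two conditions. A minimal sketch, assuming Pearson correlation and the standard Fisher z test for two independent correlations (one of several approaches such a survey covers, not a method prescribed by it); the helper names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test that two independent correlations are equal.
    Returns (z statistic, p-value)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical expression of genes A and B in two conditions (8 samples each);
# gene A's values are reused in both conditions for brevity.
gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
gene_b_cond1 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9]
gene_b_cond2 = [3, 1, 4, 1, 5, 9, 2, 6]
r1 = pearson(gene_a, gene_b_cond1)
r2 = pearson(gene_a, gene_b_cond2)
z, p = fisher_z_difference(r1, len(gene_a), r2, len(gene_a))
print(f"r1={r1:.3f}, r2={r2:.3f}, z={z:.2f}, p={p:.2g}")
```

Genome-wide, this test runs over every gene pair, so the resulting p-values would need multiple-testing control before pairs are grouped into differentially co-expressed modules.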
44
Williams DR, Rast P. Back to the basics: Rethinking partial correlation network methodology. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2020; 73:187-212. [PMID: 31206621 PMCID: PMC8572131 DOI: 10.1111/bmsp.12173] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 03/23/2018] [Revised: 03/02/2019] [Indexed: 05/08/2023]
Abstract
The Gaussian graphical model (GGM) is an increasingly popular technique used in psychology to characterize relationships among observed variables. These relationships are represented as elements of the precision matrix. Standardizing the precision matrix and reversing the sign yields the corresponding partial correlations, which imply pairwise dependencies in which the effects of all other variables have been controlled for. The graphical lasso (glasso) has emerged as the default estimation method; it uses ℓ1-based regularization. The glasso was developed and optimized for high-dimensional settings where the number of variables (p) exceeds the number of observations (n), which is uncommon in psychological applications. Here we propose to go 'back to the basics', wherein the precision matrix is first estimated with non-regularized maximum likelihood and Fisher z-transformed confidence intervals are then used to determine non-zero relationships. We first show the exact correspondence between the confidence level and specificity, which arises because 1 minus specificity is the false positive rate (i.e., α). With simulations in low-dimensional settings (p ≪ n), we then demonstrate superior performance compared to the glasso for detecting non-zero effects. Further, our results indicate that the glasso is inconsistent for the purpose of model selection and does not control the false discovery rate, whereas the proposed method converges on the true model and directly controls error rates. We end by discussing implications for estimating GGMs in psychology.
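The two steps described above (standardize and sign-flip the ML precision matrix, then keep edges whose Fisher z interval excludes zero) can be sketched as follows. This is a minimal illustration, not the authors' code: the standard error 1/sqrt(n - s - 3), with s the number of variables controlled for, is our assumption of the usual partial-correlation form, and the function names and simulated chain are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def partial_correlations(data):
    """Non-regularized ML estimate: invert the sample covariance of an (n, p)
    array, then standardize the precision matrix and reverse its sign."""
    cov = np.cov(data, rowvar=False, bias=True)  # ML covariance (divides by n)
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def select_edges(pcor, n, alpha=0.05):
    """Mark edge (i, j) as non-zero when its Fisher z confidence interval,
    at level 1 - alpha, excludes zero."""
    p = pcor.shape[0]
    s = p - 2                                  # variables controlled for
    se = 1.0 / np.sqrt(n - s - 3)              # assumed standard error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = np.arctanh(np.clip(pcor, -0.999999, 0.999999))
    lower, upper = z - z_crit * se, z + z_crit * se
    edges = (lower > 0) | (upper < 0)
    np.fill_diagonal(edges, False)
    return edges

# Hypothetical chain X1 -> X2 -> X3: X1 and X3 are independent given X2.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)
data = np.column_stack([x1, x2, x3])
pcor = partial_correlations(data)
edges = select_edges(pcor, n)
print(np.round(pcor, 2))
print(edges)
```

Because each edge is kept only when its interval excludes zero, choosing alpha fixes the per-edge false positive rate, which mirrors the confidence-level/specificity correspondence stated in the abstract.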
45
Serra A, Fratello M, Cattelani L, Liampa I, Melagraki G, Kohonen P, Nymark P, Federico A, Kinaret PAS, Jagiello K, Ha MK, Choi JS, Sanabria N, Gulumian M, Puzyn T, Yoon TH, Sarimveis H, Grafström R, Afantitis A, Greco D. Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment. NANOMATERIALS (BASEL, SWITZERLAND) 2020; 10:E708. [PMID: 32276469 PMCID: PMC7221955 DOI: 10.3390/nano10040708] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/10/2020] [Revised: 03/25/2020] [Accepted: 03/26/2020] [Indexed: 12/30/2022]
Abstract
Transcriptomics data are relevant to a number of challenges in toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the number of publicly available omics datasets is constantly increasing, together with a plethora of methods made available to facilitate their analysis and interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art in data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or to identify specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
Affiliation(s)
- Angela Serra
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Michele Fratello
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Luca Cattelani
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Irene Liampa
  - School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Georgia Melagraki
  - Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Pekka Kohonen
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Penny Nymark
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antonio Federico
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Pia Anneli Sofia Kinaret
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Karolina Jagiello
  - QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
  - University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- My Kieu Ha
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Jang-Sik Choi
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Natasha Sanabria
  - National Institute for Occupational Health, Johannesburg 30333, South Africa
- Mary Gulumian
  - National Institute for Occupational Health, Johannesburg 30333, South Africa
  - Haematology and Molecular Medicine Department, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Tomasz Puzyn
  - QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
  - University of Gdansk, Faculty of Chemistry, Wita Stwosza 63, 80-308 Gdansk, Poland
- Tae-Hyun Yoon
  - Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
  - Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
  - Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Haralambos Sarimveis
  - School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Roland Grafström
  - Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
  - Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Antreas Afantitis
  - Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Dario Greco
  - Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
  - BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
  - Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
46
Singh U, Hur M, Dorman K, Wurtele ES. MetaOmGraph: a workbench for interactive exploratory data analysis of large expression datasets. Nucleic Acids Res 2020; 48:e23. [PMID: 31956905 PMCID: PMC7039010 DOI: 10.1093/nar/gkz1209] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/28/2019] [Revised: 12/05/2019] [Accepted: 12/17/2019] [Indexed: 12/17/2022]
Abstract
The diverse and growing omics data in the public domain provide researchers with a tremendous opportunity to extract hidden, as yet undiscovered knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), free, open-source, standalone software for exploratory analysis of massive datasets. Without coding, researchers can interactively visualize and evaluate data in the context of its metadata, honing in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interaction with data is easy via visualizations such as line charts, box plots, scatter plots, histograms and volcano plots. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create new MOG projects from any numerical data or explore an existing MOG project. MOG projects, with their history of explorations, can be saved and shared. We illustrate MOG with case studies of large curated datasets: human cancer RNA-Seq data, in which we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.
Affiliation(s)
- Urminder Singh
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Manhoi Hur
  - Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Karin Dorman
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Statistics, Iowa State University, Ames, IA 50011, USA
- Eve Syrkin Wurtele
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
47
Law SR, Kellgren TG, Björk R, Ryden P, Keech O. Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study. FRONTIERS IN PLANT SCIENCE 2020; 11:524. [PMID: 32582224 PMCID: PMC7287149 DOI: 10.3389/fpls.2020.00524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2019] [Accepted: 04/07/2020] [Indexed: 05/07/2023]
Abstract
Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially with large and heterogeneous datasets. To optimize the relevance of edges in GCNs and enhance global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges than GCNs not using CSE within the majority of the functional sub-networks considered, such as the mitochondrial electron transport chain and its complexes. This demonstrates that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune the functional prediction of uncharacterized genes, while combining them with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative to conventional batch correction approaches, particularly when dealing with large and heterogeneous datasets. The method is easy to implement in a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network.
AUTHOR SUMMARY Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to false positives and false negatives, especially with large and heterogeneous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species and under developmental or clinical conditions, the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a "core" GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects; it is simple to employ and can easily be implemented in pre-existing GCN analysis pipelines. The gain in biological relevance resulting from this approach was assessed using a plant mitochondrial case study.
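The centering step can be illustrated as follows. This is a minimal sketch, assuming CSE means subtracting each gene's mean within each sub-experiment; the authors' exact procedure (for example, any additional scaling) may differ, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def centralize_within_subexperiments(expr, subexperiment):
    """expr: (genes, samples) array; subexperiment: one label per sample.
    Returns a copy in which each gene is centered to mean zero within every
    sub-experiment, removing sub-experiment-level shifts."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(subexperiment)
    centered = np.empty_like(expr)
    for label in np.unique(labels):
        cols = labels == label
        block = expr[:, cols]
        centered[:, cols] = block - block.mean(axis=1, keepdims=True)
    return centered

# Hypothetical data: within each sub-experiment the two genes are anti-correlated,
# but both are shifted upward in sub-experiment "B".
expr = np.array([[1, 2, 3, 11, 12, 13],
                 [3, 2, 1, 13, 12, 11]])
subexp = ["A", "A", "A", "B", "B", "B"]
raw_r = np.corrcoef(expr)[0, 1]
cse_r = np.corrcoef(centralize_within_subexperiments(expr, subexp))[0, 1]
print(f"raw r = {raw_r:.3f}, CSE r = {cse_r:.3f}")
```

In this toy case the shared shift between sub-experiments makes the raw correlation strongly positive, while the within-sub-experiment relationship is negative; centering removes the treatment-level effect, which is the kind of spurious edge the abstract describes.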
Affiliation(s)
- Simon R. Law
  - Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
- Therese G. Kellgren
  - Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
- Rafael Björk
  - Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
- Patrik Ryden
  - Department of Mathematics and Mathematical Statistics, Umeå Universitet, Umeå, Sweden
  - Corresponding author
- Olivier Keech
  - Department of Plant Physiology, Umeå Plant Science Centre, Umeå Universitet, Umeå, Sweden
  - Corresponding author
48
Liu J, Tian Z, Xiao Y, Liu H, Hao S, Zhang X, Wang C, Sun J, Yu H, Yan J. Gene Regulatory Relationship Mining Using Improved Three-Phase Dependency Analysis Approach. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:339-346. [PMID: 30281476 DOI: 10.1109/tcbb.2018.2872993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
How to mine gene regulatory relationships and construct gene regulatory networks (GRNs) is of utmost interest to the whole biological community, but it has consistently been a challenging problem because of the tremendous complexity of cellular systems. In the present work, we construct gene regulatory networks using an improved three-phase dependency analysis (TPDA) Bayesian network learning method, which includes the steps of Drafting, Thickening, and Thinning. To address the unreliability of learning results caused by high-order conditional independence tests, we use an entropy estimation approach based on a Gaussian kernel probability density estimator to calculate the (conditional) mutual information between genes. Experiments on public benchmark datasets show that the improved method outperforms nine other Bayesian network learning methods when processing data with a large sample size, a small number of discrete values, and roughly equal frequencies of the different discrete values. In addition, the improved TPDA method was applied to a large real RNA-seq gene expression dataset from a global collection of 368 elite maize inbred lines. The results show that it performs significantly better than the original TPDA method and the nine other Bayesian network learning algorithms.
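The conditional mutual information mentioned above can be estimated from Gaussian kernel density estimates of entropy. The abstract does not give the estimator's details, so this is a minimal sketch under our own assumptions: Scott's-rule bandwidths, a resubstitution (plug-in) entropy estimate, and the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z). It covers only this quantity, not the Drafting/Thickening/Thinning phases of TPDA; the function names and simulated "genes" are hypothetical.

```python
import numpy as np

def kde_entropy(samples):
    """Resubstitution estimate H = -mean_i log f_hat(x_i), where f_hat is a
    Gaussian product-kernel density with Scott's-rule bandwidths."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    h = x.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))   # bandwidth per dimension
    u = (x[:, None, :] - x[None, :, :]) / h             # (n, n, d) scaled differences
    log_k = (-0.5 * np.sum(u ** 2, axis=2)
             - np.sum(np.log(h)) - 0.5 * d * np.log(2 * np.pi))
    m = log_k.max(axis=1, keepdims=True)                # log-sum-exp for stability
    log_f = m[:, 0] + np.log(np.mean(np.exp(log_k - m), axis=1))
    return -np.mean(log_f)

def conditional_mi(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z), in nats."""
    return (kde_entropy(np.column_stack([x, z]))
            + kde_entropy(np.column_stack([y, z]))
            - kde_entropy(z)
            - kde_entropy(np.column_stack([x, y, z])))

# Hypothetical "genes": y_dep depends on x directly; y_ind is linked to x only through z.
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y_dep = x + 0.2 * rng.normal(size=n)
y_ind = 0.5 * z + rng.normal(size=n)
cmi_dep = conditional_mi(x, y_dep, z)
cmi_ind = conditional_mi(x, y_ind, z)
print(f"I(x; y_dep | z) = {cmi_dep:.3f}, I(x; y_ind | z) = {cmi_ind:.3f}")
```

The pairwise-distance tensor grows as n² × d, so this direct form suits small samples; kernel-based estimates are also biased by smoothing, which is why a threshold on the estimate, rather than a test against exactly zero, is typically used to decide conditional independence.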
49
Kuismin M, Saatoglu D, Niskanen AK, Jensen H, Sillanpää MJ. Genetic assignment of individuals to source populations using network estimation tools. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Markku Kuismin
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
- Dilan Saatoglu
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Alina K. Niskanen
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
  - Ecology and Genetics Research Unit, University of Oulu, Oulu, Finland
- Henrik Jensen
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Mikko J. Sillanpää
  - Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland
  - Biocenter Oulu, University of Oulu, Oulu, Finland
  - Infotech Oulu, University of Oulu, Oulu, Finland
50
Staunton PM, Miranda-CasoLuengo AA, Loftus BJ, Gormley IC. BINDER: computationally inferring a gene regulatory network for Mycobacterium abscessus. BMC Bioinformatics 2019; 20:466. [PMID: 31500560 PMCID: PMC6734328 DOI: 10.1186/s12859-019-3042-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/03/2019] [Accepted: 08/21/2019] [Indexed: 11/12/2022]
Abstract
BACKGROUND Although many of the genic features in Mycobacterium abscessus have been fully validated, a comprehensive understanding of the regulatory elements remains lacking. Moreover, there is little understanding of how the organism regulates its transcriptomic profile, enabling cells to survive in hostile environments. Here, to computationally infer the gene regulatory network for Mycobacterium abscessus, we propose a novel statistical computational modelling approach: BayesIan gene regulatory Networks inferreD via gene coExpression and compaRative genomics (BINDER). In tandem with derived experimental coexpression data, the property of genomic conservation is exploited to probabilistically infer a gene regulatory network in Mycobacterium abscessus. Inference on regulatory interactions is conducted by combining 'primary' and 'auxiliary' data strata. The data forming the primary and auxiliary strata are derived from RNA-seq experiments and sequence information in the primary organism Mycobacterium abscessus, as well as ChIP-seq data extracted from a related proxy organism, Mycobacterium tuberculosis. The primary and auxiliary data are combined in a hierarchical Bayesian framework, informing the apposite bivariate likelihood function and prior distributions, respectively. The inferred relationships provide insight into regulon groupings in Mycobacterium abscessus. RESULTS We implement BINDER on data relating to a collection of 167,280 regulator-target pairs, resulting in the identification of 54 regulator-target pairs, across 5 transcription factors, for which there is a strong probability of regulatory interaction. CONCLUSIONS The inferred regulatory interactions provide insight into, and a valuable resource for further studies of, transcriptional control in Mycobacterium abscessus, and in the family of Mycobacteriaceae more generally.
Further, the developed BINDER framework has broad applicability and is usable in settings where computational inference of a gene regulatory network requires integration of data sources derived from both the primary organism of interest and related proxy organisms.
Affiliation(s)
- Patrick M. Staunton
  - School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
- Brendan J. Loftus
  - School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
- Isobel Claire Gormley
  - School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland