1. Hammond J, Smith VA. Bayesian networks for network inference in biology. J R Soc Interface 2025; 22:20240893. [PMID: 40328299 PMCID: PMC12055290 DOI: 10.1098/rsif.2024.0893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 02/14/2025] [Accepted: 02/20/2025] [Indexed: 05/08/2025] Open
Abstract
Bayesian networks (BNs) have been used for reconstructing interactions from biological data, in disciplines ranging from molecular biology to ecology and neuroscience. BNs learn conditional dependencies between variables, which best 'explain' the data, represented as a directed graph which approximates the relationships between variables. In the 2000s, BNs were a popular method that promised an approach capable of inferring biological networks from data. Here, we review the use of BNs applied to biological data over the past two decades and evaluate their efficacy. We find that BNs are successful in inferring biological networks, frequently identifying novel interactions or network components missed by previous analyses. We suggest that as false positive results are underreported, it is difficult to assess the accuracy of BNs in inferring biological networks. BN learning appears most successful for small numbers of variables with high-quality datasets that either discretize the data into few states or include perturbative data. We suggest that BNs have failed to live up to the promise of the 2000s but that this is most likely due to experimental constraints on datasets, and the success of BNs at inferring networks in a variety of biological contexts suggests they are a powerful tool for biologists.
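As a rough illustration of what "conditional dependencies represented as a directed graph" means, the sketch below builds a three-variable toy network A → B, A → C and factorizes the joint distribution along its edges. The variables and all probability values are hypothetical and are not taken from the review; structure learning itself is not shown.

```python
from itertools import product

# Hypothetical toy network: A -> B and A -> C, all variables binary.
# Every probability below is made up for illustration.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_c_given_a = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}


def joint(a, b, c):
    """Joint probability factorized along the graph: P(A) P(B|A) P(C|A)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def prob_bc_given_a(a, b, c):
    """P(B=b, C=c | A=a), computed from the joint distribution."""
    marginal_a = sum(joint(a, bb, cc) for bb, cc in product((0, 1), repeat=2))
    return joint(a, b, c) / marginal_a


if __name__ == "__main__":
    total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
    print(f"joint distribution sums to {total:.6f}")
```

Because B and C share only the parent A, the graph encodes that B and C are conditionally independent given A, so `prob_bc_given_a(a, b, c)` matches `p_b_given_a[a][b] * p_c_given_a[a][c]` for every assignment.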
Affiliation(s)
- James Hammond
- Department of Biology, University of Oxford, Oxford, UK
- School of Biology, University of St Andrews, St Andrews, UK
- V. Anne Smith
- School of Biology, University of St Andrews, St Andrews, UK
2. Chen L, Acharyya S, Luo C, Ni Y, Baladandayuthapani V. A probabilistic modeling framework for genomic networks incorporating sample heterogeneity. Cell Reports Methods 2025; 5:100984. [PMID: 39954675 PMCID: PMC11955270 DOI: 10.1016/j.crmeth.2025.100984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 10/28/2024] [Accepted: 01/24/2025] [Indexed: 02/17/2025]
Abstract
Probabilistic graphical models are powerful tools to quantify, visualize, and interpret network dependencies in complex biological systems such as high-throughput -omics. However, many graphical models assume sample homogeneity, limiting their effectiveness. We propose a flexible Bayesian approach called graphical regression (GraphR), which (1) incorporates sample heterogeneity at different scales through a regression-based formulation, (2) enables sparse sample-specific network estimation, (3) identifies and quantifies potential effects of heterogeneity on network structures, and (4) achieves computational efficiency via variational Bayes algorithms. We illustrate the comparative efficiency of GraphR against existing state-of-the-art methods in terms of network structure recovery and computational cost across multiple settings. We use GraphR to analyze three multi-omic and spatial transcriptomic datasets to investigate inter- and intra-sample molecular networks and delineate biological discoveries that otherwise cannot be revealed by existing approaches. We have developed a GraphR R package along with an accompanying Shiny App that provides comprehensive analysis and dynamic visualization functions.
Affiliation(s)
- Liying Chen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Satwik Acharyya
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Chunyu Luo
- Division of Biostatistics, University of Pennsylvania, Philadelphia, PA, USA
- Yang Ni
- Department of Statistics, Texas A&M University, College Station, TX, USA
3. Lechon T, Kent NA, Murray JAH, Scofield S. Regulation of meristem and hormone function revealed through analysis of directly-regulated SHOOT MERISTEMLESS target genes. Sci Rep 2025; 15:240. [PMID: 39747964 PMCID: PMC11696002 DOI: 10.1038/s41598-024-83985-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Accepted: 12/18/2024] [Indexed: 01/04/2025] Open
Abstract
The Arabidopsis Knotted1-like homeobox (KNOX) gene SHOOT MERISTEMLESS (STM) encodes a homeodomain transcription factor that operates as a central component of the gene regulatory network (GRN) controlling shoot apical meristem formation and maintenance. It regulates the expression of target genes that include transcriptional regulators associated with meristem function, particularly those involved in pluripotency and cellular differentiation, as well as genes involved in hormone metabolism and signaling. Previous studies have identified KNOX-regulated genes and their associated cis-regulatory elements in several plant species. However, little is known about STM-DNA interactions in the regulatory regions of target genes in Arabidopsis. Here, we identify and map STM binding sites in the Arabidopsis genome using global ChIP-seq analysis to reveal potential directly-regulated STM target genes. We show that in the majority of target loci, STM binds within 1 kb upstream of the TSS, with other loci showing STM binding at more distal enhancer sites, and we reveal enrichment of DNA motifs containing a TGAC and/or TGAT core in STM-bound target gene cis-regulatory elements. We further demonstrate that many STM-bound genes are transcriptionally responsive to altered levels of STM activity, and show that among these, transcriptional regulators with key roles in meristem and hormone function are highly represented. Finally, we use a subset of these target genes to perform Bayesian network analysis to infer gene regulatory associations and to construct a refined GRN for STM-mediated control of meristem function.
Affiliation(s)
- Tamara Lechon
- School of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK
- Nicholas A Kent
- School of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK
- James A H Murray
- School of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK
- Simon Scofield
- School of Biosciences, Cardiff University, Cardiff, CF10 3AX, UK.
4. Liu Z, Lin H, Li X, Xue H, Lu Y, Xu F, Shuai J. The network structural entropy for single-cell RNA sequencing data during skin aging. Brief Bioinform 2024; 26:bbae698. [PMID: 39757115 DOI: 10.1093/bib/bbae698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2024] [Revised: 11/29/2024] [Accepted: 12/18/2024] [Indexed: 01/07/2025] Open
Abstract
Aging is a complex and heterogeneous biological process at cellular, tissue, and individual levels. Despite extensive effort in scientific research, a comprehensive understanding of aging mechanisms remains lacking. This study analyzed aging-related gene networks, using single-cell RNA sequencing data from >15 000 cells. We constructed a gene correlation network, integrating gene expressions into the weights of network edges, and ranked gene importance using a random walk model to generate a gene importance matrix. This unsupervised method improved the clustering performance of cell types. To further quantify the complexity of gene networks during aging, we introduced network structural entropy. The findings of our study reveal that the overall network structural entropy increases in the aged cells compared to the young cells. However, network entropy changes varied greatly within different cell subtypes. Specifically, the network structural entropy among various cell types may increase, remain unchanged, or decrease. This wide range of changes may be closely related to their individual functions, highlighting the cellular heterogeneity and potential key network reconfigurations. Analyzing gene network entropy provides insights into the molecular mechanisms behind aging. This study offers new scientific evidence and theoretical support for understanding the changes in cell functions during aging.
Affiliation(s)
- Zhilong Liu
- Department of Physics, Xiamen University, No. 422, Siming South Road, Xiamen, Fujian, 361005, China
- Hai Lin
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), No. 999, Jinshi Road, Yongzhong Street, Longwan District, Wenzhou, Zhejiang, 325000, China; Wenzhou Institute, University of Chinese Academy of Sciences, No. 1, Jinlian Road, Longwan District, Wenzhou, Zhejiang, 325000, China
- Xiang Li
- Department of Physics, Xiamen University, No. 422, Siming South Road, Xiamen, Fujian, 361005, China
- Hao Xue
- Department of Computational Biology, Cornell University, 110 Biotechnology Building, Ithaca, 14853 NY, United States
- Yuer Lu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), No. 999, Jinshi Road, Yongzhong Street, Longwan District, Wenzhou, Zhejiang, 325000, China; Wenzhou Institute, University of Chinese Academy of Sciences, No. 1, Jinlian Road, Longwan District, Wenzhou, Zhejiang, 325000, China
- Fei Xu
- Department of Physics, Anhui Normal University, No. 189 Jiuhua South Road, Wuhu, Anhui, 241002, China
- Jianwei Shuai
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), No. 999, Jinshi Road, Yongzhong Street, Longwan District, Wenzhou, Zhejiang, 325000, China; Wenzhou Institute, University of Chinese Academy of Sciences, No. 1, Jinlian Road, Longwan District, Wenzhou, Zhejiang, 325000, China
5. Li Y, Scheel-Sailer A, Riener R, Paez-Granados D. Mixed-variable graphical modeling framework towards risk prediction of hospital-acquired pressure injury in spinal cord injury individuals. Sci Rep 2024; 14:25067. [PMID: 39443567 PMCID: PMC11499609 DOI: 10.1038/s41598-024-75691-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Developing machine learning (ML) methods for healthcare predictive modeling requires absolute explainability and transparency to build trust and accountability. Graphical models (GM) are key tools for this but face challenges like small sample sizes, mixed variables, and latent confounders. This paper presents a novel learning framework addressing these challenges by integrating latent variables using fast causal inference (FCI), accommodating mixed variables with predictive permutation conditional independence tests (PPCIT), and employing a systematic graphical embedding approach leveraging expert knowledge. This method ensures a transparent model structure and an explainable feature selection and modeling approach, achieving competitive prediction performance. For real-world validation, data of hospital-acquired pressure injuries (HAPI) among individuals with spinal cord injury (SCI) were used, where the approach achieved a balanced accuracy of 0.941 and an AUC of 0.983, outperforming most benchmarks. The PPCIT method also demonstrated superior accuracy and scalability over other benchmarks in causal discovery validation on synthetic datasets that closely resemble our real dataset. This holistic framework effectively addresses the challenges of mixed variables and explainable predictive modeling for disease onset, which is crucial for enabling transparency and interpretability in ML-based healthcare.
Affiliation(s)
- Yanke Li
- Department of Health Science and Technology, ETH Zürich, 8006, Zürich, Switzerland.
- Swiss Paraplegic Research (SPF), 6207, Nottwil, Switzerland.
- Anke Scheel-Sailer
- Swiss Paraplegic Research (SPF), 6207, Nottwil, Switzerland
- Universitätsspital Bern, 3010, Bern, Switzerland
- Robert Riener
- Department of Health Science and Technology, ETH Zürich, 8006, Zürich, Switzerland
- Medical Faculty, University of Zürich, 8006, Zürich, Switzerland
- Diego Paez-Granados
- Department of Health Science and Technology, ETH Zürich, 8006, Zürich, Switzerland.
- Swiss Paraplegic Research (SPF), 6207, Nottwil, Switzerland.
6. Hsiao YC, Dutta A. Network Modeling and Control of Dynamic Disease Pathways, Review and Perspectives. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1211-1230. [PMID: 38498762 DOI: 10.1109/tcbb.2024.3378155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Dynamic disease pathways are a combination of complex dynamical processes among bio-molecules in a cell that lead to disease. Network modeling of disease pathways considers disease-related bio-molecules (e.g. DNA, RNA, transcription factors, enzymes, proteins, and metabolites) and their interactions (e.g. DNA methylation, histone modification, alternative splicing, and protein modification) to study disease progression and predict therapeutic responses. These bio-molecules and their interactions are the basic elements in the study of the misregulation of disease-related gene expression that leads to abnormal cellular responses. Gene regulatory networks, cell signaling networks, and metabolic networks are the three major types of intracellular networks for the study of the cellular responses elicited by extracellular signals. The disease-related cellular responses can be prevented or regulated by designing control strategies to manipulate these extracellular or other intracellular signals. The paper reviews the regulatory mechanisms, the dynamic models, and the control strategies for each intracellular network. The applications, limitations, and prospects for modeling and control are also discussed.
7. Jang YH, Lee SH, Han J, Cheong S, Shim SK, Han JK, Ryoo SK, Hwang CS. Memristive Crossbar Array-Based Probabilistic Graph Modeling. Advanced Materials (Deerfield Beach, Fla.) 2024; 36:e2403904. [PMID: 39030848 DOI: 10.1002/adma.202403904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 07/05/2024] [Indexed: 07/22/2024]
Abstract
Modern graph datasets with structural complexity and uncertainties due to incomplete information or data variability require advanced modeling techniques beyond conventional graph models. This study introduces a memristive crossbar array (CBA)-based probabilistic graph model (C-PGM) utilizing Cu0.3Te0.7/HfO2/Pt memristors, which exhibit probabilistic switching, self-rectifying, and memory characteristics. C-PGM addresses the complexities and uncertainties inherent in structural graph data across various domains, leveraging the probabilistic nature of memristors. C-PGM relies on the device-to-device variation across multiple memristive CBAs, overcoming the limitations of previous approaches that rely on sequential operations, which are slower and have a reliability concern due to repeated switching. This new approach enables the fast processing and massive implementation of probabilistic units at the expense of chip area. In this study, the hardware-based C-PGM feasibly expresses small-scale probabilistic graphs and shows minimal error in aggregate probability calculations. The probability calculation capabilities of C-PGM are applied to steady-state estimation and the PageRank algorithm, which is implemented on a simulated large-scale C-PGM. The C-PGM-based steady-state estimation and PageRank algorithm demonstrate comparable accuracy to conventional methods while significantly reducing computational costs.
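For readers unfamiliar with the connection between steady-state estimation and PageRank mentioned in this abstract, the following is a minimal software sketch of conventional PageRank by power iteration. It is not the memristive hardware method of the paper; the function name, the damping factor of 0.85, and the example graph are assumptions chosen for illustration.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dense adjacency matrix (list of lists).

    adjacency[i][j] > 0 means there is a link from node i to node j.
    The damping value 0.85 is a common convention, assumed here.
    """
    n = len(adjacency)
    rank = [1.0 / n] * n
    out_weight = [sum(row) for row in adjacency]
    for _ in range(max_iter):
        # Teleportation term shared by every node.
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            if out_weight[i] == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j in range(n):
                    if adjacency[i][j]:
                        new_rank[j] += (
                            damping * rank[i] * adjacency[i][j] / out_weight[i]
                        )
        # Stop once the L1 change between iterations is below tolerance.
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical 3-node directed cycle: 0 -> 1 -> 2 -> 0.
    cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(pagerank(cycle))
```

The returned vector is the stationary (steady-state) distribution of a random walk with teleportation, which is why the two tasks are treated together; on a symmetric cycle every node receives the same rank.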
Affiliation(s)
- Yoon Ho Jang
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Soo Hyung Lee
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Janguk Han
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sunwoo Cheong
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sung Keun Shim
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Joon-Kyu Han
- System Semiconductor Engineering and Department of Electronic Engineering, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul, 04107, Republic of Korea
- Seung Kyu Ryoo
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Cheol Seong Hwang
- Department of Materials Science and Engineering and Inter-university Semiconductor Research Center, College of Engineering, Seoul National University, Seoul, 08826, Republic of Korea
8. Liu J, Wang Y, Men J, Wang H. Identifying vital nodes for yeast network by dynamic network entropy. BMC Bioinformatics 2024; 25:242. [PMID: 39026169 PMCID: PMC11555816 DOI: 10.1186/s12859-024-05863-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024] Open
Abstract
BACKGROUND: The progress of the cell cycle of yeast involves the regulatory relationships between genes and the interactions between proteins. However, it is still obscure which type of protein plays a decisive role in regulation and how to identify the vital nodes in the regulatory network. To elucidate the sensitive node or gene in the progression of yeast, here, we select 8 crucial regulatory factors from the yeast cell cycle to decipher a specific network and propose a simple mixed K2 algorithm to effectively identify the sensitive nodes and genes in the evolution of yeast.
RESULTS: Considering the multivariate nature of cell cycle data, we first utilize the K2 algorithm limited to the stationary interval for the time series segmentation to measure the scores for refining the specific network. After that, we employ the network entropy to effectively screen the obtained specific network, and simulate the gene expression data by a normal distribution approximation and the screened specific network by the partial least squares method. By comparing the obtained specific network with the determined relationship, we can conclude that the robustness of the specific network screened by network entropy is better than that of the specific network with the determined relationship. Finally, we can determine that the node CDH1 has the highest score in the specific network through a sensitivity score calculated by network entropy, implying that the gene CDH1 is the most sensitive regulatory factor.
CONCLUSIONS: It is clearly of great potential value to reconstruct and visualize gene regulatory networks according to gene databases for life activities. Here, we present an available algorithm to achieve the network reconstruction by measuring the network entropy and identifying the vital nodes in the specific nodes. The results indicate that inhibiting or enhancing the expression of CDH1 can maximize the inhibition or enhancement of the yeast cell cycle. Although our algorithm is simple, it is also the first step in deciphering the profound mystery of gene regulation.
Affiliation(s)
- Jingchen Liu
- School of Mathematics and Statistics, Hainan University, Haikou, 570228, Hainan, People's Republic of China
- Key Laboratory of Engineering Modeling and Statistical Computation of Hainan Province, Hainan University, Haikou, 570228, Hainan, People's Republic of China
- School of Mathematics, Shandong University, Jinan, 250100, Shandong, People's Republic of China
- Yan Wang
- Department of Neurology, The First Affiliated Hospital, University of South China, Hengyang, 421001, Hunan, People's Republic of China
- Jiali Men
- School of Life Sciences, Hainan University, Haikou, 570228, Hainan, People's Republic of China
- Haohua Wang
- School of Mathematics and Statistics, Hainan University, Haikou, 570228, Hainan, People's Republic of China.
- Key Laboratory of Engineering Modeling and Statistical Computation of Hainan Province, Hainan University, Haikou, 570228, Hainan, People's Republic of China.
9. Niu Y, Luo J, Zong C. Single-cell total-RNA profiling unveils regulatory hubs of transcription factors. Nat Commun 2024; 15:5941. [PMID: 39009595 PMCID: PMC11251146 DOI: 10.1038/s41467-024-50291-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 07/03/2024] [Indexed: 07/17/2024] Open
Abstract
Recent development of RNA velocity uses master equations to establish the kinetics of the life cycle of RNAs from unspliced RNA to spliced RNA (i.e., mature RNA) to degradation. To feed this kinetic analysis, simultaneous measurement of unspliced RNA and spliced RNA in single cells is greatly desired. However, the majority of single-cell RNA-seq chemistry primarily captures mature RNA species to measure gene expressions. Here, we develop a one-step total-RNA chemistry-based single-cell RNA-seq method: snapTotal-seq. We benchmark this method with multiple single-cell RNA-seq assays in their performance in kinetic analysis of cell cycle by RNA velocity. Next, with LASSO regression between transcription factors, we identify the critical regulatory hubs mediating the cell cycle dynamics. We also apply snapTotal-seq to profile the oncogene-induced senescence and identify the key regulatory hubs governing the entry of senescence. Furthermore, from the comparative analysis of unspliced RNA and spliced RNA, we identify a significant portion of genes whose expression changes occur in spliced RNA but not to the same degree in unspliced RNA, indicating these gene expression changes are mainly controlled by post-transcriptional regulation. Overall, we demonstrate that snapTotal-seq can provide enriched information about gene regulation, especially during the transition between cell states.
Affiliation(s)
- Yichi Niu
- Department of Molecular and Human Genetics, Houston, TX, USA
- Genetics & Genomics Program, Houston, TX, USA
- Jiayi Luo
- Department of Molecular and Human Genetics, Houston, TX, USA
- Cancer and Cell Biology Program, Houston, TX, USA
- Chenghang Zong
- Department of Molecular and Human Genetics, Houston, TX, USA.
- Genetics & Genomics Program, Houston, TX, USA.
- Cancer and Cell Biology Program, Houston, TX, USA.
- Integrative Molecular and Biomedical Sciences Program, Houston, TX, USA.
- Dan L Duncan Comprehensive Cancer Center, Houston, TX, USA.
- McNair Medical Institute, Baylor College of Medicine, Houston, TX, USA.
10. Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. Frontiers in Plant Science 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has been proven to be crucial for the construction of high-confidence regulatory networks. With the refinement of these methodologies, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
- Zeyang Ma
- State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
11. Zhao Y, Ansarullah, Kumar P, Mahoney JM, He H, Baker C, George J, Li S. Causal network perturbation analysis identifies known and novel type-2 diabetes driver genes. bioRxiv 2024:2024.05.22.595431. [PMID: 38826370 PMCID: PMC11142180 DOI: 10.1101/2024.05.22.595431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The molecular pathogenesis of diabetes is multifactorial, involving genetic predisposition and environmental factors that are not yet fully understood. However, pancreatic β-cell failure remains among the primary reasons underlying the progression of type-2 diabetes (T2D) making targeting β-cell dysfunction an attractive pathway for diabetes treatment. To identify genetic contributors to β-cell dysfunction, we investigated single-cell gene expression changes in β-cells from healthy (C57BL/6J) and diabetic (NZO/HlLtJ) mice fed with normal or high-fat, high-sugar diet (HFHS). Our study presents an innovative integration of the causal network perturbation assessment (ssNPA) framework with meta-cell transcriptome analysis to explore the genetic underpinnings of type-2 diabetes (T2D). By generating a reference causal network and in silico perturbation, we identified novel genes implicated in T2D and validated our candidates using the Knockout Mouse Phenotyping (KOMP) Project database.
Affiliation(s)
- Yue Zhao
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Ansarullah
- Center for Biometric Analysis, The Jackson Laboratory, Bar Harbor, ME, USA
- Parveen Kumar
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Hao He
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Candice Baker
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joshy George
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Sheng Li
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington, CT, USA
12. Michoel T, Zhang JD. Causal inference in drug discovery and development. Drug Discov Today 2023; 28:103737. [PMID: 37591410 DOI: 10.1016/j.drudis.2023.103737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 07/31/2023] [Accepted: 08/10/2023] [Indexed: 08/19/2023]
Abstract
To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery. Although it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a nontechnical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.
Affiliation(s)
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Postboks 7803, 5020 Bergen, Norway
- Jitao David Zhang
- Pharma Early Research and Development, Roche Innovation Centre Basel, F. Hoffmann-La Roche, Grenzacherstrasse 124, 4070 Basel, Switzerland; Department of Mathematics and Computer Science, University of Basel, Spiegelgasse 1, 4051 Basel, Switzerland.
13. Niu Y, Ni Y, Pati D, Mallick BK. Covariate-Assisted Bayesian Graph Learning for Heterogeneous Data. J Am Stat Assoc 2023; 119:1985-1999. [PMID: 39507103 PMCID: PMC11536292 DOI: 10.1080/01621459.2023.2233744] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2021] [Revised: 06/01/2023] [Accepted: 06/25/2023] [Indexed: 11/08/2024]
Abstract
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates so that observations whose covariates are similar share a similar undirected graph. To efficiently embed Gaussian graphical models into our proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior, and for the pseudo-likelihood, a product of Gaussian conditionals is used. Moreover, the proposed model has large prior support and is flexible to approximate any v-Hölder conditional variance-covariance matrices with v ∈ (0, 1]. We further show that based on the theory of fractional likelihood, the rate of posterior contraction is minimax optimal assuming the true density to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated via simulation studies and an analysis of a protein network for a breast cancer dataset assisted by mRNA gene expression as covariates.
Affiliation(s)
- Yabo Niu
- Department of Mathematics, University of Houston
| | - Yang Ni
- Department of Statistics, Texas A&M University
|
14
|
Wang L, Trasanidis N, Wu T, Dong G, Hu M, Bauer DE, Pinello L. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nat Methods 2023; 20:1368-1378. [PMID: 37537351 DOI: 10.1038/s41592-023-01971-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 09/14/2022] [Accepted: 07/05/2023] [Indexed: 08/05/2023]
Abstract
Gene regulatory networks (GRNs) are key determinants of cell function and identity and are dynamically rewired during development and disease. Despite decades of advancement, challenges remain in GRN inference, including dynamic rewiring, causal inference, feedback loop modeling and context specificity. To address these challenges, we develop Dictys, a dynamic GRN inference and analysis method that leverages multiomic single-cell assays of chromatin accessibility and gene expression, context-specific transcription factor footprinting, stochastic process network and efficient probabilistic modeling of single-cell RNA-sequencing read counts. Dictys improves GRN reconstruction accuracy and reproducibility and enables the inference and comparative analysis of context-specific and dynamic GRNs across developmental contexts. Dictys' network analyses recover unique insights in human blood and mouse skin development with cell-type-specific and dynamic GRNs. Its dynamic network visualizations enable time-resolved discovery and investigation of developmental driver transcription factors and their regulated targets. Dictys is available as a free, open-source and user-friendly Python package.
Affiliation(s)
- Lingfei Wang
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Nikolaos Trasanidis
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Hugh and Josseline Langmuir Centre for Myeloma Research, Centre for Haematology, Department of Immunology and Inflammation, Imperial College London, London, UK
| | - Ting Wu
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Guanlan Dong
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA
| | - Michael Hu
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Daniel E Bauer
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Department of Pathology, Harvard Medical School, Boston, MA, USA.
- Gene Regulation Observatory, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
15
|
Xue L, Wu Y, Lin Y. Dissecting and improving gene regulatory network inference using single-cell transcriptome data. Genome Res 2023; 33:1609-1621. [PMID: 37580132 PMCID: PMC10620053 DOI: 10.1101/gr.277488.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 08/07/2023] [Indexed: 08/16/2023]
Abstract
Single-cell transcriptome data has been widely used to reconstruct gene regulatory networks (GRNs) controlling critical biological processes such as development and differentiation. Although a growing list of algorithms has been developed to infer GRNs using such data, achieving an inference accuracy consistently higher than random guessing has remained challenging. To address this, it is essential to delineate how the accuracy of regulatory inference is limited. Here, we systematically characterized factors limiting the accuracy of inferred GRNs and demonstrated that using pre-mRNA information can help improve regulatory inference compared to the typically used information (i.e., mature mRNA). Using kinetic modeling and simulated single-cell data sets, we showed that target genes' mature mRNA levels often fail to accurately report upstream regulatory activities because of gene-level and network-level factors, which can be improved by using pre-mRNA levels. We tested this finding on public single-cell RNA-seq data sets using intronic reads as proxies of pre-mRNA levels and can indeed achieve a higher inference accuracy compared to using exonic reads (corresponding to mature mRNAs). Using experimental data sets, we further validated findings from the simulated data sets and identified factors such as transcription factor activity dynamics influencing the accuracy of pre-mRNA-based inference. This work delineates the fundamental limitations of gene regulatory inference and helps improve GRN inference using single-cell RNA-seq data.
Affiliation(s)
- Lingfeng Xue
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
| | - Yan Wu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
| | - Yihan Lin
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871;
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
|
16
|
Liu H, Zhang X. Frequentist model averaging for undirected Gaussian graphical models. Biometrics 2023; 79:2050-2062. [PMID: 36106680 DOI: 10.1111/biom.13758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Revised: 07/12/2022] [Accepted: 08/30/2022] [Indexed: 11/30/2022]
Abstract
Advances in information technologies have made network data increasingly frequent in a spectrum of big data applications, which is often explored by probabilistic graphical models. To precisely estimate the precision matrix, we propose an optimal model averaging estimator for Gaussian graphs. We prove that the proposed estimator is asymptotically optimal when candidate models are misspecified. The consistency and the asymptotic distribution of model averaging estimator, and the weight convergence are also studied when at least one correct model is included in the candidate set. Furthermore, numerical simulations and a real data analysis on yeast genetic data are conducted to illustrate that the proposed method is promising.
Affiliation(s)
- Huihang Liu
- School of Management, University of Science and Technology of China, Hefei, China
| | - Xinyu Zhang
- School of Management, University of Science and Technology of China, Hefei, China
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
|
17
|
Ma Y, Shen J, Zhao Z, Liang H, Tan Y, Liu Z, Qian K, Yang M, Hu B. What Can Facial Movements Reveal? Depression Recognition and Analysis Based on Optical Flow Using Bayesian Networks. IEEE Trans Neural Syst Rehabil Eng 2023; 31:3459-3468. [PMID: 37581961 DOI: 10.1109/tnsre.2023.3305351] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 08/17/2023]
Abstract
Recent evidence has demonstrated that facial expressions could be a valid and important aspect for depression recognition. Although much work has been done on automatic depression recognition, it remains a challenge to explore the inherent nuances of facial expressions that might reveal the underlying differences between depressed patients and healthy subjects under different stimuli. There is a lack of an unobtrusive system that monitors depressive patients' mental states in various free-living scenarios, so this paper steps towards building a classification model where data collection, feature extraction, depression recognition and facial action analysis are conducted to infer the differences in facial movements between depressive patients and healthy subjects. In this study, we first present a plan for dividing facial regions of interest to extract optical flow features of facial expressions for depression recognition. We then propose facial movement coefficients utilising discrete wavelet transformation. Specifically, Bayesian Networks constructed from Pearson Correlation Coefficients based on discrete wavelet transformation are learnt, which allows for analysing movements of different facial regions. We evaluate our method on a clinically validated dataset of 30 depressed patients and 30 healthy control subjects, and experimental results achieved an accuracy and recall of 81.7% and 96.7%, respectively, outperforming other features for comparison. Most importantly, the Bayesian Networks we built on the coefficients under different stimuli may reveal some facial action patterns of depressed subjects, which have the potential to assist the automatic diagnosis of depression.
|
18
|
Zhang J, Hu C, Zhang Q. Gene regulatory network inference based on a nonhomogeneous dynamic Bayesian network model with an improved Markov Monte Carlo sampling. BMC Bioinformatics 2023; 24:264. [PMID: 37355560 DOI: 10.1186/s12859-023-05381-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Accepted: 06/07/2023] [Indexed: 06/26/2023] Open
Abstract
A nonhomogeneous dynamic Bayesian network model, which combines the dynamic Bayesian network and the multi-change point process, solves the limitations of the dynamic Bayesian network in modeling non-stationary gene expression data to a certain extent. However, certain problems persist, such as the low network reconstruction accuracy and poor model convergence. Therefore, we propose an MD-birth move based on the Manhattan distance of the data points to increase the rationality of the multi-change point process. The underlying concept of the MD-birth move is that the direction of movement of the change point is assumed to have a larger Manhattan distance between the variance and the mean of its left and right data points. Considering the data instability characteristics, we propose a Markov chain Monte Carlo sampling method based on node-dependent particle filtering in addition to the multi-change point process. The candidate parent nodes to be sampled, which are close to the real state, are pushed to the high probability area through the particle filter, and the candidate parent node set to be sampled that is far from the real state is pushed to the low probability area and then sampled. In terms of reconstructing the gene regulatory network, the model proposed in this paper (FC-DBN) has better network reconstruction accuracy and model convergence speed than other corresponding models on the Saccharomyces cerevisiae data and RAF data.
Affiliation(s)
- Jiayao Zhang
- College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
| | - Chunling Hu
- College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China.
| | - Qianqian Zhang
- College of Artificial Intelligence and Big Data, Hefei University, Hefei, 230031, China
|
19
|
Federico A, Kern J, Varelas X, Monti S. Structure Learning for Gene Regulatory Networks. PLoS Comput Biol 2023; 19:e1011118. [PMID: 37200395 DOI: 10.1371/journal.pcbi.1011118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 05/31/2023] [Accepted: 04/20/2023] [Indexed: 05/20/2023] Open
Abstract
Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high throughput "omics" data typically available. To overcome this challenge, often referred to as the "small n, large p problem," we exploit known organizing principles of biological networks that are sparse, modular, and likely share a large portion of their underlying architecture. We present SHINE (Structure Learning for Hierarchical Networks), a framework for defining data-driven structural constraints and incorporating a shared learning paradigm for efficiently learning multiple Markov networks from high-dimensional data at large p/n ratios not previously feasible. We evaluated SHINE on Pan-Cancer data comprising 23 tumor types, and found that learned tumor-specific networks exhibit expected graph properties of real biological networks, recapture previously validated interactions, and recapitulate findings in literature. Application of SHINE to the analysis of subtype-specific breast cancer networks identified key genes and biological processes for tumor maintenance and survival as well as potential therapeutic targets for modulating known breast cancer disease genes.
Affiliation(s)
- Anthony Federico
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
| | - Joseph Kern
- Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
| | - Xaralabos Varelas
- Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, United States of America
| | - Stefano Monti
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America
|
20
|
Cho C, Lee D, Jeong D, Kim S, Kim MK, Srinivasan S. Characterization of radiation-resistance mechanism in Spirosoma montaniterrae DY10 T in terms of transcriptional regulatory system. Sci Rep 2023; 13:4739. [PMID: 36959250 PMCID: PMC10036542 DOI: 10.1038/s41598-023-31509-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 03/13/2023] [Indexed: 03/25/2023] Open
Abstract
To respond to external environmental changes for survival, bacteria regulate the expression of a number of genes, including transcription factors (TFs). To characterize complex biological phenomena, a biological system-level approach is necessary. Here we utilized six computational biology methods to infer a regulatory network and to characterize the underlying biological mechanisms relevant to radiation resistance. In particular, we inferred the gene regulatory network (GRN) and operons of the radiation-resistant bacterium Spirosoma montaniterrae DY10 T and identified the major regulators of radiation resistance. Our results showed that DNA repair and reactive oxygen species (ROS) scavenging mechanisms are key processes and that a Crp/Fnr family transcriptional regulator works as a master regulatory TF in the early response to radiation.
Affiliation(s)
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dohoon Lee
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
| | - Myung Kyum Kim
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea.
| | - Sathiyaraj Srinivasan
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea.
|
21
|
Decoding transcriptional regulation via a human gene expression predictor. J Genet Genomics 2023; 50:305-317. [PMID: 36693565 DOI: 10.1016/j.jgg.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 01/04/2023] [Accepted: 01/10/2023] [Indexed: 01/22/2023]
Abstract
Transcription factors (TFs) regulate cellular activities by controlling gene expression, but a predictive model describing how TFs quantitatively modulate human transcriptomes is lacking. We construct a universal human gene expression predictor and utilize it to decode transcriptional regulation. Using the expression of 1613 TFs, the predictor reconstitutes highly accurate transcriptomes for samples derived from a wide range of tissues and conditions. The broad applicability of the predictor indicates that it recapitulates the quantitative relationships between TFs and target genes ubiquitous across tissues. Significant interacting TF-target gene pairs are extracted from the predictor and enable downstream inference of TF regulators for diverse pathways involved in development, immunity, metabolism, and stress response. A detailed analysis of the hematopoiesis process reveals an atlas of key TFs regulating the development of different hematopoietic cell lineages, and a portion of these TFs are conserved between humans and mice. The results demonstrate that our method is capable of delineating the TFs responsible for fate determination. Compared to other existing tools, our approach shows better performance in recovering the correct TF regulators. Thus, we present a novel approach that can be used to study human transcriptional regulation in general.
|
22
|
Shi T, Yu H, Blair RH. Integrated regulatory and metabolic networks of the tumor microenvironment for therapeutic target prioritization. Stat Appl Genet Mol Biol 2023; 22:sagmb-2022-0054. [PMID: 37988745 DOI: 10.1515/sagmb-2022-0054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 09/28/2023] [Indexed: 11/23/2023]
Abstract
Translation of genomic discovery, such as single-cell sequencing data, to clinical decisions remains a longstanding bottleneck in the field. Meanwhile, computational systems biological models, such as cellular metabolism models and cell signaling pathways, have emerged as powerful approaches to provide efficient predictions in metabolites and gene expression levels, respectively. However, there has been limited research on the integration between these two models. This work develops a methodology for integrating computational models of probabilistic gene regulatory networks with a constraint-based metabolism model. By using probabilistic reasoning with Bayesian Networks, we aim to predict cell-specific changes under different interventions, which are embedded into the constraint-based models of metabolism. Applications to single-cell sequencing data of glioblastoma brain tumors generate predictions about the effects of pharmaceutical interventions on the regulatory network and downstream metabolisms in different cell types from the tumor microenvironment. The model presents possible insights into treatments that could potentially suppress anaerobic metabolism in malignant cells with minimal impact on other cell types' metabolism. The proposed integrated model can guide therapeutic target prioritization, the formulation of combination therapies, and future drug discovery. This model integration framework is also generalizable to other applications, such as different cell types, organisms, and diseases.
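As a rough illustration of what "predicting changes under interventions" with a Bayesian network can mean, the sketch below contrasts conditioning on a variable with intervening on it in a three-node discrete network. The network (C -> X, C -> Y, X -> Y), the variable names, and all probability values are hypothetical and are not taken from the article; this is a minimal sketch of the general idea, not the authors' method.

```python
from itertools import product

# Hypothetical conditional probability tables for a binary network
# C -> X, C -> Y, X -> Y. All numbers are illustrative assumptions.
P_C = {0: 0.5, 1: 0.5}                          # P(C=c)
P_X_GIVEN_C = {0: 0.2, 1: 0.8}                  # P(X=1 | C=c)
P_Y_GIVEN_XC = {(0, 0): 0.1, (1, 0): 0.3,
                (0, 1): 0.5, (1, 1): 0.7}       # P(Y=1 | X=x, C=c)


def bernoulli(p_one, value):
    """Probability of `value` (0 or 1) when P(value=1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, x, y):
    """Joint probability from the network factorization."""
    return (P_C[c]
            * bernoulli(P_X_GIVEN_C[c], x)
            * bernoulli(P_Y_GIVEN_XC[(x, c)], y))


def p_y_given_x(x):
    """Observational query P(Y=1 | X=x): C still influences X."""
    num = sum(joint(c, x, 1) for c in (0, 1))
    den = sum(joint(c, x, y) for c, y in product((0, 1), repeat=2))
    return num / den


def p_y_do_x(x):
    """Interventional query P(Y=1 | do(X=x)): the edge C -> X is cut,
    so C keeps its prior distribution while X is fixed."""
    return sum(P_C[c] * P_Y_GIVEN_XC[(x, c)] for c in (0, 1))


print(p_y_given_x(1), p_y_do_x(1))
```

With these illustrative numbers the two queries differ, because observing X = 1 also carries information about the common parent C, whereas setting X = 1 does not.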
Affiliation(s)
- Tiange Shi
- University at Buffalo, Biostatistics, Buffalo, USA
| | - Han Yu
- Roswell Park Comprehensive Cancer Center, Biostatistics and Bioinformatics, Buffalo, USA
| | - Rachael Hageman Blair
- University at Buffalo, Biostatistics, Institute for Artificial Intelligence and Data Science, Buffalo, USA
|
23
|
Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci 2022; 9:967205. [PMID: 36452456 PMCID: PMC9703081 DOI: 10.3389/fmolb.2022.967205] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 06/12/2022] [Accepted: 10/20/2022] [Indexed: 08/27/2023] Open
Abstract
Advances in omics technologies allow for holistic studies into biological systems. These studies rely on integrative data analysis techniques to obtain a comprehensive view of the dynamics of cellular processes, and molecular mechanisms. Network-based integrative approaches have revolutionized multi-omics analysis by providing the framework to represent interactions between multiple different omics-layers in a graph, which may faithfully reflect the molecular wiring in a cell. Here we review network-based multi-omics/multi-modal integrative analytical approaches. We classify these approaches according to the type of omics data supported, the methods and/or algorithms implemented, their node and/or edge weighting components, and their ability to identify key nodes and subnetworks. We show how these approaches can be used to identify biomarkers, disease subtypes, crosstalk, causality, and molecular drivers of physiological and pathological mechanisms. We provide insight into the most appropriate methods and tools for research questions as showcased around the aetiology and treatment of COVID-19 that can be informed by multi-omics data integration. We conclude with an overview of challenges associated with multi-omics network-based analysis, such as reproducibility, heterogeneity, (biological) interpretability of the results, and we highlight some future directions for network-based integration.
Affiliation(s)
- Francis E. Agamah
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Jumamurat R. Bayjanov
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Anna Niehues
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Kelechi F. Njoku
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Michelle Skelton
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Gaston K. Mazandu
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- African Institute for Mathematical Sciences, Cape Town, South Africa
| | - Thomas H. A. Ederveen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
| | - Nicola Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Emile R. Chimusa
- Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle, United Kingdom
| | - Peter A. C. 't Hoen
- Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
|
24
|
Biswas R, Shlizerman E. Statistical perspective on functional and causal neural connectomics: The Time-Aware PC algorithm. PLoS Comput Biol 2022; 18:e1010653. [PMID: 36374908 PMCID: PMC9704761 DOI: 10.1371/journal.pcbi.1010653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 11/28/2022] [Accepted: 10/12/2022] [Indexed: 11/16/2022] Open
Abstract
The representation of the flow of information between neurons in the brain based on their activity is termed the causal functional connectome. Such a representation incorporates the dynamic nature of neuronal activity and the causal interactions between neurons. In contrast to the connectome, the causal functional connectome is not directly observed and needs to be inferred from neural time series. A popular statistical framework for inferring causal connectivity from observations is directed probabilistic graphical modeling. Its common formulation is not suitable for neural time series since it was developed for variables with independent and identically distributed static samples. In this work, we propose to model and estimate the causal functional connectivity from neural time series using a novel approach that adapts directed probabilistic graphical modeling to the time series scenario. In particular, we develop the Time-Aware PC (TPC) algorithm for estimating the causal functional connectivity, which adapts the PC algorithm, a state-of-the-art method for statistical causal inference. We show that the model outcome of TPC has the properties of reflecting causality of neural interactions, such as being non-parametric, exhibiting the directed Markov property in a time-series setting, and being predictive of the consequence of counterfactual interventions on the time series. We demonstrate the utility of the methodology to obtain the causal functional connectome for several datasets including simulations, benchmark datasets, and recent multi-array electro-physiological recordings from the mouse visual cortex.
Affiliation(s)
- Rahul Biswas
- Department of Statistics, University of Washington, Seattle, Washington, United States of America
| | - Eli Shlizerman
- Department of Applied Mathematics and Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington, United States of America
|
25
|
Zenere A, Larsson EG, Altafini C. Relating balance and conditional independence in graphical models. Phys Rev E 2022; 106:044309. [PMID: 36397601 DOI: 10.1103/physreve.106.044309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 09/13/2022] [Indexed: 06/16/2023]
Abstract
When data are available for all nodes of a Gaussian graphical model, it is possible to use sample correlations and partial correlations to test to what extent the conditional independencies that encode the structure of the model are indeed verified by the data. In this paper, we give a heuristic rule useful in such a validation process: when the correlation subgraph involved in a conditional independence is balanced (i.e., all its cycles have an even number of negative edges), a partial correlation is usually a contraction of the corresponding correlation, which often leads to conditional independence. In particular, the contraction rule can be made rigorous if we look at concentration subgraphs rather than correlation subgraphs. The rule is applied to real data for elementary gene regulatory motifs.
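The two notions in this abstract, balance of a signed graph and contraction of a correlation into a partial correlation, can be made concrete with a small sketch. The helper names `partial_corr` and `is_balanced` and the three-variable chain example are assumptions introduced here for illustration; the code uses the textbook first-order partial correlation formula and the standard two-colouring test for balance, and does not reproduce the article's own analysis.

```python
import math
from collections import deque


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b given c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def is_balanced(n_nodes, signed_edges):
    """Check balance of an undirected signed graph.

    signed_edges maps (u, v) -> +1 or -1. A signed graph is balanced
    (every cycle has an even number of negative edges) exactly when its
    nodes can be split into two sides such that positive edges stay
    within a side and negative edges cross sides.
    """
    neighbors = {u: [] for u in range(n_nodes)}
    for (u, v), sign in signed_edges.items():
        neighbors[u].append((v, sign))
        neighbors[v].append((u, sign))
    side = {}
    for start in range(n_nodes):
        if start in side:
            continue
        side[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in neighbors[u]:
                expected = side[u] * sign
                if v not in side:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False
    return True


# Illustrative chain X -> Y -> Z with unit coefficients and unit-variance
# noise: Var = (1, 2, 3), Cov(X,Y) = 1, Cov(Y,Z) = 2, Cov(X,Z) = 1.
r_xy = 1 / math.sqrt(2)
r_yz = 2 / math.sqrt(6)
r_xz = 1 / math.sqrt(3)

all_positive = {(0, 1): +1, (1, 2): +1, (0, 2): +1}
print(is_balanced(3, all_positive))       # correlation triangle is balanced
print(partial_corr(r_xz, r_xy, r_yz))     # X and Z given Y: numerically zero
print(partial_corr(r_xy, r_xz, r_yz))     # X and Y given Z: smaller than r_xy
```

In this example the correlation triangle has only positive edges, so it is balanced, and both partial correlations are smaller in magnitude than the corresponding correlations, with the X-Z one vanishing as the chain structure implies.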
Affiliation(s)
- Alberto Zenere
- Department of Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden
| | - Erik G Larsson
- Department of Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden
| | - Claudio Altafini
- Department of Electrical Engineering, Linköping University, SE-58183 Linköping, Sweden
|
26
|
Jia M, Yuan DY, Lovelace TC, Hu M, Benos PV. Causal Discovery in High-dimensional, Multicollinear Datasets. Front Epidemiol 2022; 2:899655. [PMID: 36778756 PMCID: PMC9910507 DOI: 10.3389/fepid.2022.899655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022]
Abstract
As the cost of high-throughput genomic sequencing technology declines, its application in clinical research becomes increasingly popular. The collected datasets often contain tens or hundreds of thousands of biological features that need to be mined to extract meaningful information. One area of particular interest is discovering underlying causal mechanisms of disease outcomes. Over the past few decades, causal discovery algorithms have been developed and expanded to infer such relationships. However, these algorithms suffer from the curse of dimensionality and multicollinearity. A recently introduced, non-orthogonal, general empirical Bayes approach to matrix factorization has been demonstrated to successfully infer latent factors with interpretable structures from observed variables. We hypothesize that applying this strategy to causal discovery algorithms can solve both the high dimensionality and collinearity problems, inherent to most biomedical datasets. We evaluate this strategy on simulated data and apply it to two real-world datasets. In a breast cancer dataset, we identified important survival-associated latent factors and biologically meaningful enriched pathways within factors related to important clinical features. In a SARS-CoV-2 dataset, we were able to predict whether a patient (1) had Covid-19 and (2) would enter the ICU. Furthermore, we were able to associate factors with known Covid-19 related biological pathways.
Affiliation(s)
- Minxue Jia
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Daniel Y. Yuan
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Tyler C. Lovelace
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Mengying Hu
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
| | - Panayiotis V. Benos
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Joint Carnegie Mellon - University of Pittsburgh Computational Biology PhD Program, Pittsburgh, PA, United States
- Department of Epidemiology, University of Florida, Gainesville, FL, United States
| |
|
27
|
Abstract
Modeling and inference are central to most areas of science and especially to evolving and complex systems. Critically, the information we have is often uncertain and insufficient, resulting in an underdetermined inference problem; multiple inferences, models, and theories are consistent with available information. Information theory (in particular, the maximum information entropy formalism) provides a way to deal with such complexity. It has been applied to numerous problems, within and across many disciplines, over the last few decades. In this perspective, we review the historical development of this procedure, provide an overview of the many applications of maximum entropy and its extensions to complex systems, and discuss in more detail some recent advances in constructing comprehensive theory based on this inference procedure. We also discuss efforts at the frontier of information-theoretic inference: application to complex dynamic systems with time-varying constraints, such as highly disturbed ecosystems or rapidly changing economies.
|
28
|
Suter P, Kuipers J, Beerenwinkel N. Discovering gene regulatory networks of multiple phenotypic groups using dynamic Bayesian networks. Brief Bioinform 2022; 23:bbac219. [PMID: 35679575 PMCID: PMC9294428 DOI: 10.1093/bib/bbac219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 04/29/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022] Open
Abstract
Dynamic Bayesian networks (DBNs) can be used for the discovery of gene regulatory networks (GRNs) from time series gene expression data. Here, we suggest a strategy for learning DBNs from gene expression data by employing a Bayesian approach that is scalable to large networks and is targeted at learning models with high predictive accuracy. Our framework can be used to learn DBNs for multiple groups of samples and highlight differences and similarities in their GRNs. We learn these DBN models based on different structural and parametric assumptions and select the optimal model based on the cross-validated predictive accuracy. We show in simulation studies that our approach is better equipped to prevent overfitting than techniques used in previous studies. We applied the proposed DBN-based approach to two time series transcriptomic datasets from the Gene Expression Omnibus database, each comprising data from distinct phenotypic groups of the same tissue type. In the first case, we used DBNs to characterize responders and non-responders to anti-cancer therapy. In the second case, we compared normal to tumor cells of colorectal tissue. The classification accuracy reached by the DBN-based classifier for both datasets was higher than reported previously. For the colorectal cancer dataset, our analysis suggested that GRNs for cancer and normal tissues have a lot of differences, which are most pronounced in the neighborhoods of oncogenes and known cancer tissue markers. The identified differences in gene networks of cancer and normal cells may be used for the discovery of targeted therapies.
Affiliation(s)
- Polina Suter
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
| | - Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Matternstrasse 26, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Switzerland
| |
|
29
|
Sharma M, Jha IP, Chawla S, Pandey N, Chandra O, Mishra S, Kumar V. Associating pathways with diseases using single-cell expression profiles and making inferences about potential drugs. Brief Bioinform 2022; 23:6623725. [PMID: 35772850 DOI: 10.1093/bib/bbac241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2022] [Revised: 05/22/2022] [Accepted: 05/23/2022] [Indexed: 11/14/2022] Open
Abstract
Finding direct dependencies between genetic pathways and diseases has been the target of multiple studies as it has many applications. However, due to cellular heterogeneity and limitations of the number of samples for bulk expression profiles, such studies have faced hurdles in the past. Here, we propose a method to perform single-cell expression-based inference of association between pathway, disease and cell-type (sci-PDC), which can help to understand their cause and effect and guide precision therapy. Our approach highlighted reliable relationships between a few diseases and pathways. Using the example of diabetes, we have demonstrated how sci-PDC helps in tracking variation of association between pathways and diseases with changes in age and species. The variation in pathways-disease associations in mice and humans revealed critical facts about the suitability of the mouse model for a few pathways in the context of diabetes. The coherence between results from our method and previous reports, including information about the drug target pathways, highlights its reliability for multidimensional utility.
Affiliation(s)
- Madhu Sharma
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Indra Prakash Jha
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Smriti Chawla
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Neetesh Pandey
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Omkar Chandra
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Shreya Mishra
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| | - Vibhor Kumar
- Department of computational biology, Indraprastha Institute of Information Technology, Okhla Ph-III, New Delhi
| |
|
30
|
Videla Rodriguez EA, Pértille F, Guerrero-Bosagna C, Mitchell JBO, Jensen P, Smith VA. Practical application of a Bayesian network approach to poultry epigenetics and stress. BMC Bioinformatics 2022; 23:261. [PMID: 35778683 PMCID: PMC9250184 DOI: 10.1186/s12859-022-04800-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2022] [Accepted: 06/14/2022] [Indexed: 11/23/2022] Open
Abstract
Background: Relationships among genetic or epigenetic features can be explored by learning probabilistic networks and unravelling the dependencies among a set of given genetic/epigenetic features. Bayesian networks (BNs) consist of nodes that represent the variables and arcs that represent the probabilistic relationships between the variables. However, practical guidance on how to make choices among the wide array of possibilities in Bayesian network analysis is limited. Our study aimed to apply a BN approach, while clearly laying out our analysis choices as an example for future researchers, in order to provide further insights into the relationships among epigenetic features and a stressful condition in chickens (Gallus gallus). Results: Chickens raised under control conditions (n = 22) and chickens exposed to a social isolation protocol (n = 24) were used to identify differentially methylated regions (DMRs). A total of 60 DMRs were selected by a threshold, after bioinformatic pre-processing and analysis. The treatment was included as a binary variable (control = 0; stress = 1). Thereafter, a BN approach was applied: initially, a pre-filtering test was used for identifying pairs of features that must not be included in the process of learning the structure of the network; then, the average probability values for each arc of being part of the network were calculated; and finally, the arcs that were part of the consensus network were selected. The structure of the BN consisted of 47 out of 61 features (60 DMRs and the stressful condition), displaying 43 functional relationships. The stress condition was connected to two DMRs, one of them playing a role in tight and adhesive intracellular junctions in organs such as ovary, intestine, and brain. Conclusions: We clearly explain our steps in making each analysis choice, from discrete BN models to final generation of a consensus network from multiple model averaging searches. The epigenetic BN unravelled functional relationships among the DMRs, as well as epigenetic features in close association with the stressful condition the chickens were exposed to. The DMRs interacting with the stress condition could be further explored in future studies as possible biomarkers of stress in poultry species. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04800-0.
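The consensus step described in this abstract (average each arc's probability over several structure searches, then keep the arcs that pass a cutoff) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `consensus_arcs`, the 0.5 cutoff, and the toy arc sets are all assumptions made for the example, and the abstract does not state which threshold the study used.

```python
from collections import Counter


def consensus_arcs(networks, threshold=0.5):
    """Keep arcs whose frequency across the learned networks exceeds `threshold`.

    `networks` is a list of arc sets, each arc a (parent, child) tuple.
    The frequency of an arc is used here as its estimated probability.
    """
    counts = Counter(arc for net in networks for arc in set(net))
    n = len(networks)
    probs = {arc: c / n for arc, c in counts.items()}
    return {arc: p for arc, p in probs.items() if p > threshold}


# Hypothetical arc sets from four independent structure searches.
searches = [
    {("stress", "DMR1"), ("DMR1", "DMR2")},
    {("stress", "DMR1"), ("DMR2", "DMR3")},
    {("stress", "DMR1"), ("DMR1", "DMR2")},
    {("DMR1", "DMR2")},
]

consensus = consensus_arcs(searches, threshold=0.5)
for arc, p in sorted(consensus.items()):
    print(arc, p)
```

In this toy input, two arcs appear in three of four searches and are retained, while the arc seen only once is dropped. A stricter cutoff trades sensitivity for fewer spurious arcs.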
Affiliation(s)
| | - Fábio Pértille
- Environmental Toxicology Program, Institute of Organismal Biology, Uppsala University, Uppsala, Sweden.,Department of Biomedical & Clinical Sciences (BKV), Linköping University, 58183, Linköping, Sweden.,AVIAN Behavioural Genomics and Physiology Group, Department of Physics, Chemistry and Biology, Linköping University, 58183, Linköping, Sweden
| | - Carlos Guerrero-Bosagna
- Environmental Toxicology Program, Institute of Organismal Biology, Uppsala University, Uppsala, Sweden.,AVIAN Behavioural Genomics and Physiology Group, Department of Physics, Chemistry and Biology, Linköping University, 58183, Linköping, Sweden
| | - John B O Mitchell
- EaStCHEM School of Chemistry, University of St Andrews, St Andrews, Fife, KY16 9ST, UK
| | - Per Jensen
- AVIAN Behavioural Genomics and Physiology Group, Department of Physics, Chemistry and Biology, Linköping University, 58183, Linköping, Sweden
| | - V Anne Smith
- School of Biology, University of St Andrews, St Andrews, Fife, KY16 9TH, UK.
| |
|
31
|
Thermodynamic Modelling of Transcriptional Control: A Sensitivity Analysis. MATHEMATICS 2022. [DOI: 10.3390/math10132169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Modelling is a tool used to decipher the biochemical mechanisms involved in transcriptional control. Experimental evidence in genetics is usually supported by theoretical models in order to evaluate the effects of all the possible interactions that can occur in these complicated processes. Models derived from the thermodynamic method are critical in this labour because they are able to take into account multiple mechanisms operating simultaneously at the molecular micro-scale and relate them to transcriptional initiation at the tissular macro-scale. This work is devoted to adapting computational techniques to this context in order to theoretically evaluate the role played by several biochemical mechanisms. The interest of this theoretical analysis relies on the fact that it can be contrasted against those biological experiments where the response to perturbations in the transcriptional machinery environment is evaluated in terms of genetically activated/repressed regions. The theoretical reproduction of these experiments leads to a sensitivity analysis whose results are expressed in terms of the elasticity of a threshold function determining those activated/repressed regions. The study of this elasticity function in thermodynamic models already proposed in the literature reveals that certain modelling approaches can alter the balance between the biochemical mechanisms considered, and this can cause false/misleading outcomes. The reevaluation of classical thermodynamic models gives us a more accurate and complete picture of the interactions involved in gene regulation and transcriptional control, which enables more specific predictions. This sensitivity approach provides a definite advantage in the interpretation of a wide range of genetic experimental results.
|
32
|
Ngampruetikorn V, Sachdeva V, Torrence J, Humplik J, Schwab DJ, Palmer SE. Inferring couplings in networks across order-disorder phase transitions. PHYSICAL REVIEW RESEARCH 2022; 4:023240. [PMID: 37576946 PMCID: PMC10421637 DOI: 10.1103/physrevresearch.4.023240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Statistical inference is central to many scientific endeavors, yet how it works remains unresolved. Answering this requires a quantitative understanding of the intrinsic interplay between statistical models, inference methods, and the structure in the data. To this end, we characterize the efficacy of direct coupling analysis (DCA), a highly successful method for analyzing amino acid sequence data, in inferring pairwise interactions from samples of ferromagnetic Ising models on random graphs. Our approach allows for physically motivated exploration of qualitatively distinct data regimes separated by phase transitions. We show that inference quality depends strongly on the nature of data-generating distributions: optimal accuracy occurs at an intermediate temperature where the detrimental effects from macroscopic order and thermal noise are minimal. Importantly, our results indicate that DCA does not always outperform its local-statistics-based predecessors; while DCA excels at low temperatures, it becomes inferior to simple correlation thresholding at virtually all temperatures when data are limited. Our findings offer insights into the regime in which DCA operates so successfully and, more broadly, into how inference interacts with the structure in the data.
Affiliation(s)
- Vudtiwat Ngampruetikorn
- Initiative for the Theoretical Sciences, The Graduate Center, CUNY, New York, New York 10016, USA
| | - Vedant Sachdeva
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
| | - Johanna Torrence
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
| | - Jan Humplik
- Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
| | - David J Schwab
- Initiative for the Theoretical Sciences, The Graduate Center, CUNY, New York, New York 10016, USA
| | - Stephanie E Palmer
- Department of Organismal Biology and Anatomy and Department of Physics, University of Chicago, Chicago, Illinois 60637, USA
| |
|
33
|
Data-driven learning how oncogenic gene expression locally alters heterocellular networks. Nat Commun 2022; 13:1986. [PMID: 35418177 PMCID: PMC9007999 DOI: 10.1038/s41467-022-29636-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Accepted: 03/22/2022] [Indexed: 11/21/2022] Open
Abstract
Developing drugs increasingly relies on mechanistic modeling and simulation. Models that capture causal relations among genetic drivers of oncogenesis, functional plasticity, and host immunity complement wet experiments. Unfortunately, formulating such mechanistic cell-level models currently relies on hand curation, which can bias how data is interpreted or the priority of drug targets. In modeling molecular-level networks, rules and algorithms are employed to limit a priori biases in formulating mechanistic models. Here we combine digital cytometry with Bayesian network inference to generate causal models of cell-level networks linking an increase in gene expression associated with oncogenesis with alterations in stromal and immune cell subsets from bulk transcriptomic datasets. We predict how increased Cell Communication Network factor 4, a secreted matricellular protein, alters the tumor microenvironment using data from patients diagnosed with breast cancer and melanoma. Predictions are then tested using two immunocompetent mouse models for melanoma, which provide consistent experimental results. While mechanistic models play increasing roles in immuno-oncology, hand network curation is current practice. Here the authors use a Bayesian data-driven approach to infer how expression of a secreted oncogene alters the cellular landscape within the tumor.
|
34
|
Chen L, Wan H, He Q, He S, Deng M. Statistical Methods for Microbiome Compositional Data Network Inference: A Survey. J Comput Biol 2022; 29:704-723. [PMID: 35404093 DOI: 10.1089/cmb.2021.0406] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Microbes can be found almost everywhere in the world. They are not isolated, but rather interact with each other and establish connections with their living environments. Studying these interactions is essential to an understanding of the organization and complex interplay of microbial communities, as well as the structure and dynamics of various ecosystems. A widely used approach toward this objective involves the inference of microbiome interaction networks. However, owing to the compositional, high-dimensional, sparse, and heterogeneous nature of observed microbial data, applying network inference methods to estimate their associations is challenging. In addition, external environmental interference and biological concerns also make it more difficult to deal with the network inference. In this article, we provide a comprehensive review of emerging microbiome interaction network inference methods. According to various research targets, estimated networks are divided into four main categories: correlation networks, conditional correlation networks, mixture networks, and differential networks. Their assumptions, high-level ideas, advantages, as well as limitations, are presented in this review. Since real microbial interactions can be complex and dynamic, no unifying method has, to date, captured all the aspects of interest. In addition, we discuss the challenges now confronting current microbial interaction study and future prospects. Finally, we point out several feasible directions of microbial network inference analysis and highlight that future research requires the joint promotion of statistical computation methods and experimental techniques.
Affiliation(s)
- Liang Chen
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Hui Wan
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Qiuyan He
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Shun He
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China.,Center for Statistical Science, Peking University, Beijing, China.,Center for Quantitative Biology, Peking University, Beijing, China
| |
|
35
|
Topology Adaptive Graph Estimation in High Dimensions. MATHEMATICS 2022. [DOI: 10.3390/math10081244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
We introduce Graphical TREX (GTREX), a novel method for graph estimation in high-dimensional Gaussian graphical models. By conducting neighborhood selection with TREX, GTREX avoids tuning parameters and is adaptive to the graph topology. We compared GTREX with standard methods on a new simulation setup that was designed to assess accurately the strengths and shortcomings of different methods. These simulations showed that a neighborhood selection scheme based on Lasso and an optimal (in practice unknown) tuning parameter outperformed other standard methods over a large spectrum of scenarios. Moreover, we show that GTREX can rival this scheme and, therefore, can provide competitive graph estimation without the need for tuning parameter calibration.
|
36
|
Nardos R, Leung ET, Dahl EM, Davin S, Asquith M, Gregory WT, Karstens L. Network-Based Differences in the Vaginal and Bladder Microbial Communities Between Women With and Without Urgency Urinary Incontinence. Front Cell Infect Microbiol 2022; 12:759156. [PMID: 35402312 PMCID: PMC8988226 DOI: 10.3389/fcimb.2022.759156] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Accepted: 02/17/2022] [Indexed: 12/12/2022] Open
Abstract
Background: Little is known about the relationship of proximal urogenital microbiomes in the bladder and the vagina and how this contributes to bladder health. In this study, we use a microbial ecology and network framework to understand the dynamics of interactions/co-occurrences of bacteria in the bladder and vagina in women with and without urgency urinary incontinence (UUI). Methods: We collected vaginal swabs and catheterized urine specimens from 20 women with UUI (cases) and 30 women without UUI (controls). We sequenced the V4 region of the bacterial 16S rRNA gene and evaluated using alpha and beta diversity metrics. We used microbial network analysis to detect interactions in the microbiome and the betweenness centrality measure to identify central bacteria in the microbial network. Bacteria exhibiting maximum betweenness centrality are considered central to the microbe-wide networks and likely maintain the overall microbial network structure. Results: There were no significant differences in the vaginal or bladder microbiomes between cases and controls using alpha and beta diversity. Silhouette metric analysis identified two distinct microbiome clusters in both the bladder and vagina. One cluster was dominated by Lactobacillus genus while the other was more diverse. Network-based analyses demonstrated that vaginal and bladder microbial networks were different between cases and controls. In the vagina, there were similar numbers of genera and subgroup clusters in each network for cases and controls. However, cases tend to have more unique bacterial co-occurrences. While Bacteroides and Lactobacillus were the central bacteria with the highest betweenness centrality in controls, Aerococcus had the highest centrality in cases and correlated with bacteria commonly associated with bacterial vaginosis. In the bladder, cases have less than half as many network clusters compared to controls. Lactobacillus was the central bacteria in both groups but associated with several known uropathogens in cases. The number of shared bacterial genera between the bladder and the vagina differed between cases and controls, with cases having larger overlap (43%) compared to controls (29%). Conclusion: Our study shows overlaps in microbial communities of bladder and vagina, with higher overlap in cases. We also identified differences in the bacteria that are central to the overall community structure.
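The betweenness centrality measure used in this abstract counts, for each node, how many shortest paths between other node pairs pass through it. The sketch below implements Brandes' algorithm for an undirected, unweighted graph in plain Python. It is an illustration under stated assumptions: the study's own software is not named in the abstract, and the small co-occurrence graph here is invented for the example, not taken from the paper's data.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `adj` maps each node to a list of neighbours and must be symmetric.
    Uses Brandes' algorithm: one breadth-first search per source node.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # shortest-path predecessors
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical co-occurrence network (edges are illustrative only).
graph = {
    "Lactobacillus": ["Gardnerella", "Prevotella", "Aerococcus"],
    "Gardnerella": ["Lactobacillus"],
    "Prevotella": ["Lactobacillus"],
    "Aerococcus": ["Lactobacillus", "Atopobium"],
    "Atopobium": ["Aerococcus"],
}

centrality = betweenness_centrality(graph)
central_genus = max(centrality, key=centrality.get)
```

In this toy graph the hub node has the highest score because most shortest paths between the other genera pass through it. Scores are left unnormalized so that they read directly as path counts.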
Affiliation(s)
- Rahel Nardos
- Division of Urogynecology, Oregon Health and Science University, Portland, OR, United States
- Division of Female Pelvic Medicine and Reconstructive Surgery, University of Minnesota, Minneapolis, MN, United States
- *Correspondence: Rahel Nardos,
| | - Eric T. Leung
- Division of Bioinformatics and Computational Biomedicine, Oregon Health and Science University, Portland, OR, United States
| | - Erin M. Dahl
- Division of Bioinformatics and Computational Biomedicine, Oregon Health and Science University, Portland, OR, United States
| | - Sean Davin
- Division of Arthritis and Rheumatology, Oregon Health and Science University, Portland, OR, United States
| | - Mark Asquith
- Division of Arthritis and Rheumatology, Oregon Health and Science University, Portland, OR, United States
| | - W. Thomas Gregory
- Division of Urogynecology, Oregon Health and Science University, Portland, OR, United States
| | - Lisa Karstens
- Division of Urogynecology, Oregon Health and Science University, Portland, OR, United States
- Division of Bioinformatics and Computational Biomedicine, Oregon Health and Science University, Portland, OR, United States
| |
|
37
|
Biswas R, Shlizerman E. Statistical Perspective on Functional and Causal Neural Connectomics: A Comparative Study. Front Syst Neurosci 2022; 16:817962. [PMID: 35308566 PMCID: PMC8924489 DOI: 10.3389/fnsys.2022.817962] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 01/19/2022] [Indexed: 11/13/2022] Open
Abstract
Representation of brain network interactions is fundamental to the translation of neural structure to brain function. As such, methodologies for mapping neural interactions into structural models, i.e., inference of functional connectome from neural recordings, are key for the study of brain networks. While multiple approaches have been proposed for functional connectomics based on statistical associations between neural activity, association does not necessarily incorporate causation. Additional approaches have been proposed to incorporate aspects of causality to turn functional connectomes into causal functional connectomes; however, these methodologies typically focus on specific aspects of causality. This warrants a systematic statistical framework for causal functional connectomics that defines the foundations of common aspects of causality. Such a framework can assist in contrasting existing approaches and guide the development of further causal methodologies. In this work, we develop such a statistical guide. In particular, we consolidate the notions of associations and representations of neural interaction, i.e., types of neural connectomics, and then describe causal modeling in the statistics literature. We particularly focus on the introduction of directed Markov graphical models as a framework through which we define the Directed Markov Property, an essential criterion for examining the causality of proposed functional connectomes. We demonstrate how, based on these notions, a comparative study of several existing approaches for finding causal functional connectivity from neural activity can be conducted. We proceed by providing an outlook regarding the additional properties that future approaches could include to thoroughly address causality.
Affiliation(s)
- Rahul Biswas
- Department of Statistics, University of Washington, Seattle, WA, United States
| | - Eli Shlizerman
- Department of Applied Mathematics, Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, United States
- *Correspondence: Eli Shlizerman
| |
|
38
|
Zenere A, Rundquist O, Gustafsson M, Altafini C. Multi-omics protein-coding units as massively parallel Bayesian networks: empirical validation of causality structure. iScience 2022; 25:104048. [PMID: 35355520 PMCID: PMC8958332 DOI: 10.1016/j.isci.2022.104048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2021] [Revised: 01/17/2022] [Accepted: 03/08/2022] [Indexed: 11/29/2022] Open
Abstract
In this article we use high-throughput epigenomics, transcriptomics, and proteomics data to construct fine-graded models of the “protein-coding units” gathering all transcript isoforms and chromatin accessibility peaks associated with more than 4000 genes in humans. Each protein-coding unit has the structure of a directed acyclic graph (DAG) and can be represented as a Bayesian network. The factorization of the joint probability distribution induced by the DAGs imposes a number of conditional independence relationships among the variables forming a protein-coding unit, corresponding to the missing edges in the DAGs. We show that a large fraction of these conditional independencies are indeed verified by the data. Factors driving this verification appear to be the structural and functional annotation of the transcript isoforms, as well as a notion of structural balance (or frustration-free) of the corresponding sample correlation graph, which naturally leads to reduction of correlation (and hence to independence) upon conditioning.
Highlights
- Protein coding unit: DAG associated with epigenetic and gene information of a protein
- DAGs correspond to Bayesian networks
- Edge absence on a DAG corresponds to conditional independence
- Multi-omics data (ATAC-seq, RNA-seq and mass-spec) are used for DAG validation
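The link between a missing edge and a conditional independence can be checked exactly on a small discrete example. The sketch below builds the joint distribution of a three-node chain A -> B -> C from its DAG factorization and confirms that A and C are independent given B, although they are dependent marginally. The chain, the variable names, and the probability tables are all hypothetical and chosen only for illustration; the paper's protein-coding units are larger and are validated on multi-omics data rather than on known tables.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

# Joint distribution from the DAG factorization P(a) P(b|a) P(c|b).
# There is no A -> C edge, so C depends on A only through B.
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(keep):
    """Sum the joint over variables not in `keep` (indices 0=A, 1=B, 2=C)."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


p_b = marginal((1,))
p_ab = marginal((0, 1))
p_bc = marginal((1, 2))


def max_ci_violation():
    """Largest |P(a,c|b) - P(a|b) P(c|b)| over all assignments."""
    worst = 0.0
    for (a, b, c), p in joint.items():
        lhs = p / p_b[(b,)]
        rhs = (p_ab[(a, b)] / p_b[(b,)]) * (p_bc[(b, c)] / p_b[(b,)])
        worst = max(worst, abs(lhs - rhs))
    return worst


violation = max_ci_violation()
```

With real data the tables are unknown, so the same equality would be assessed with a statistical conditional independence test rather than verified exactly.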
|
39
|
Affiliation(s)
| | - Stefano Peluso
- Department of Statistics and Quantitative Methods, Università degli Studi di Milano-Bicocca, Milan
| |
|
40
|
Zheng L, Liu Z, Yang Y, Shen HB. Accurate inference of gene regulatory interactions from spatial gene expression with deep contrastive learning. Bioinformatics 2022; 38:746-753. [PMID: 34664632 DOI: 10.1093/bioinformatics/btab718] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Reverse engineering of gene regulatory networks (GRNs) has long been an attractive research topic in systems biology. Computational prediction of gene regulatory interactions has remained a challenging problem due to the complexity of gene expression and scarce information resources. High-throughput spatial gene expression data, such as in situ hybridization images that exhibit temporal and spatial expression patterns, provide abundant and reliable information for the inference of GRNs. However, computational tools for analyzing spatial gene expression data are highly underdeveloped. RESULTS: In this study, we develop a new method for identifying gene regulatory interactions from gene expression images, called ConGRI. The method features a contrastive learning scheme and a deep Siamese convolutional neural network architecture, which automatically learns high-level feature embeddings for the expression images and then feeds the embeddings to an artificial neural network to determine whether or not the interaction exists. We apply the method to a Drosophila embryogenesis dataset and identify GRNs of eye development and mesoderm development. Experimental results show that ConGRI outperforms previous traditional and deep learning methods by a large margin, achieving accuracies of 76.7% and 68.7% for the GRNs of early eye development and mesoderm development, respectively. It also reveals some master regulators for Drosophila eye development. AVAILABILITY AND IMPLEMENTATION: https://github.com/lugimzheng/ConGRI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lujing Zheng
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  - SJTU Paris Elite Institute of Technology (SPEIT), Shanghai Jiao Tong University, Shanghai 200240, China
- Zhenhuan Liu
  - Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  - Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai 200240, China
- Hong-Bin Shen
  - Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
  - Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai Jiao Tong University, Shanghai 200240, China
|
41
|
Zhang S, Knaack S, Roy S. Enabling Studies of Genome-Scale Regulatory Network Evolution in Large Phylogenies with MRTLE. Methods Mol Biol 2022; 2477:439-455. [PMID: 35524131 PMCID: PMC9794031 DOI: 10.1007/978-1-0716-2257-5_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Transcriptional regulatory networks specify context-specific patterns of genes and play a central role in how species evolve and adapt. Inferring genome-scale regulatory networks in non-model species is the first step for examining patterns of conservation and divergence of regulatory networks. Transcriptomic data obtained under varying environmental stimuli in multiple species are becoming increasingly available, which can be used to infer regulatory networks. However, inference and analysis of multiple gene regulatory networks in a phylogenetic setting remains challenging. We developed an algorithm, Multi-species Regulatory neTwork LEarning (MRTLE), to facilitate such studies of regulatory network evolution. MRTLE is a probabilistic graphical model-based algorithm that uses phylogenetic structure, transcriptomic data for multiple species, and sequence-specific motifs in each species to simultaneously infer genome-scale regulatory networks across multiple species. We applied MRTLE to study regulatory network evolution across six ascomycete yeasts using transcriptomic measurements collected across different stress conditions. MRTLE networks recapitulated experimentally derived interactions in the model organism S. cerevisiae as well as non-model species, and it was more beneficial for network inference than methods that do not use phylogenetic information. We examined the regulatory networks across species and found that regulators associated with significant expression and network changes are involved in stress-related processes. MRTLE and its associated downstream analysis provide a scalable and principled framework to examine evolutionary dynamics of transcriptional regulatory networks across multiple species in a large phylogeny.
Affiliation(s)
- Shilu Zhang
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sara Knaack
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
|
42
|
Kuipers J, Suter P, Moffa G. Efficient Sampling and Structure Learning of Bayesian Networks. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.2020127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Affiliation(s)
- Jack Kuipers
  - D-BSSE, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Polina Suter
  - D-BSSE, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Giusi Moffa
  - Division of Psychiatry, University College London, London, UK
  - Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
|
43
|
Rocca A, Kholodenko BN. Can Systems Biology Advance Clinical Precision Oncology? Cancers (Basel) 2021; 13:6312. [PMID: 34944932 PMCID: PMC8699328 DOI: 10.3390/cancers13246312] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/04/2021] [Accepted: 12/10/2021] [Indexed: 12/13/2022]
Abstract
Precision oncology is perceived as a way forward to treat individual cancer patients. However, knowing particular cancer mutations is not enough for optimal therapeutic treatment, because cancer genotype-phenotype relationships are nonlinear and dynamic. Systems biology studies biological processes at the systems level, using an array of techniques ranging from statistical methods to network reconstruction and analysis to mathematical modeling. Its goal is to reconstruct the complex and often counterintuitive dynamic behavior of biological systems and quantitatively predict their responses to environmental perturbations. In this paper, we review the impact of systems biology on precision oncology. We show examples of how the analysis of signal transduction networks makes it possible to dissect resistance to targeted therapies and to inform the choice of combinations of targeted drugs based on tumor molecular alterations. Patient-specific biomarkers based on dynamical models of signaling networks can have a greater prognostic value than conventional biomarkers. These examples support systems biology models as valuable tools to advance clinical and translational oncological research.
Affiliation(s)
- Andrea Rocca
  - Hygiene and Public Health, Local Health Unit of Romagna, 47121 Forlì, Italy
- Boris N. Kholodenko
  - Systems Biology Ireland, School of Medicine, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, D04 V1W8 Dublin, Ireland
  - Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
|
44
|
Castelletti F, Mascaro A. Structural learning and estimation of joint causal effects among network-dependent variables. STAT METHOD APPL-GER 2021. [DOI: 10.1007/s10260-021-00579-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Bayesian networks in the form of Directed Acyclic Graphs (DAGs) represent an effective tool for modeling and inferring dependence relations among variables, a process known as structural learning. In addition, when equipped with the notion of intervention, a causal DAG model can be adopted to quantify the causal effect on a response due to a hypothetical intervention on some variable. Observational data cannot distinguish between DAGs encoding the same set of conditional independencies (Markov equivalent DAGs), which however can be different from a causal perspective. In addition, because causal effects depend on the underlying network structure, uncertainty around the DAG generating model crucially affects the causal estimation results. We propose a Bayesian methodology which combines structural learning of Gaussian DAG models and inference of causal effects as arising from simultaneous interventions on any given set of variables in the system. Our approach fully accounts for the uncertainty around both the network structure and causal relationships through a joint posterior distribution over DAGs, DAG parameters and then causal effects.
|
45
|
Castelletti F, Peluso S. Equivalence class selection of categorical graphical models. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
46
|
Moingeon P, Kuenemann M, Guedj M. Artificial intelligence-enhanced drug design and development: Toward a computational precision medicine. Drug Discov Today 2021; 27:215-222. [PMID: 34555509 DOI: 10.1016/j.drudis.2021.09.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/28/2021] [Revised: 07/13/2021] [Accepted: 09/14/2021] [Indexed: 12/29/2022]
Abstract
Artificial Intelligence (AI) relies upon a convergence of technologies with further synergies with life science technologies to capture the value of massive multi-modal data in the form of predictive models supporting decision-making. AI and machine learning (ML) enhance drug design and development by improving our understanding of disease heterogeneity, identifying dysregulated molecular pathways and therapeutic targets, designing and optimizing drug candidates, as well as evaluating in silico clinical efficacy. By providing an unprecedented level of knowledge on both patient specificities and drug candidate properties, AI is fostering the emergence of a computational precision medicine allowing the design of therapies or preventive measures tailored to the singularities of individual patients in terms of their physiology, disease features, and exposure to environmental risks.
Affiliation(s)
- Philippe Moingeon
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
- Mélaine Kuenemann
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
- Mickaël Guedj
  - Servier, Research and Development, 50 rue Carnot, 92284 Suresnes Cedex, France
|
47
|
Han SW, Park S, Zhong H, Ryu ES, Wang P, Jung S, Lim J, Yoon J, Kim S. Estimation of joint directed acyclic graphs with lasso family for gene networks. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2019.1618869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Sung Won Han
  - School of Industrial Management Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea
- Sunghoon Park
  - School of Industrial Management Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea
- Hua Zhong
  - Division of Biostatistics, Department of Population Health, New York University, New York, New York, USA
- Eun-Seok Ryu
  - Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea
- Pei Wang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Sehee Jung
  - AI Analytics Team, Deep Visions, Seodaemun-gu, Seoul, Republic of Korea
- Jayeon Lim
  - Department of Applied Statistics, Konkuk University, Gwangjin-gu, Seoul, Republic of Korea
- Jeewhan Yoon
  - Department of Management of Technology, Graduate School of Management of Technology, Korea University, Seongbuk-gu, Seoul, South Korea
- SungHwan Kim
  - Department of Applied Statistics, Konkuk University, Gwangjin-gu, Seoul, Republic of Korea
|
48
|
Leahy BD, Racowsky C, Needleman D. Inferring simple but precise quantitative models of human oocyte and early embryo development. J R Soc Interface 2021; 18:20210475. [PMID: 34493094 PMCID: PMC8424348 DOI: 10.1098/rsif.2021.0475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/08/2021] [Accepted: 08/16/2021] [Indexed: 11/12/2022]
Abstract
Macroscopic, phenomenological models are useful as concise framings of our understanding in fields from statistical physics to finance to biology. Constructing a phenomenological model for development would provide a framework for understanding the complicated, regulatory nature of oogenesis and embryogenesis. Here, we use a data-driven approach to infer quantitative, precise models of human oocyte maturation and pre-implantation embryo development, by analysing clinical in-vitro fertilization (IVF) data on 7399 IVF cycles resulting in 57 827 embryos. Surprisingly, we find that both oocyte maturation and early embryo development are quantitatively described by simple models with minimal interactions. This simplicity suggests that oogenesis and embryogenesis are composed of modular processes that are relatively siloed from one another. In particular, our analysis provides strong evidence that (i) pre-antral follicles produce anti-Müllerian hormone independently of effects from other follicles, (ii) oocytes mature to metaphase-II independently of the woman's age, her BMI and other factors, and (iii) early embryo development is memoryless for the variables assessed here, in that the probability of an embryo transitioning from its current developmental stage to the next is independent of its previous stage. Our results both provide insight into the fundamentals of oogenesis and embryogenesis and have implications for clinical IVF.
Affiliation(s)
- Brian D. Leahy
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
  - SEAS, Harvard University, Cambridge, MA, USA
- Catherine Racowsky
  - Brigham Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Daniel Needleman
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
  - SEAS, Harvard University, Cambridge, MA, USA
  - Center for Computational Biology, Flatiron Institute, New York, NY, USA
|
49
|
Li CZ, Kawaguchi ES, Li G. A New ℓ0-Regularized Log-Linear Poisson Graphical Model with Applications to RNA Sequencing Data. J Comput Biol 2021; 28:880-891. [PMID: 34375132 PMCID: PMC8558075 DOI: 10.1089/cmb.2020.0558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
In this article, we develop a new ℓ0-based sparse Poisson graphical model with applications to gene network inference from RNA-seq gene expression count data. Assuming a pair-wise Markov property, we propose to fit a separate broken adaptive ridge-regularized log-linear Poisson regression on each node to evaluate the conditional, instead of marginal, association between two genes in the presence of all other genes. The resulting sparse gene networks are generally more accurate than those generated by the ℓ1-regularized Poisson graphical model, as demonstrated by our empirical studies. A real data illustration is given on kidney renal clear cell carcinoma micro-RNA-seq data from the Cancer Genome Atlas.
Affiliation(s)
- Caesar Z. Li
  - Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California, USA
- Eric S. Kawaguchi
  - Graduate Programs in Biostatistics and Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Gang Li
  - Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California, USA
|
50
|
Jin G, Song Q. Flexible brain: a domain-model based bayesian network for classification. J EXP THEOR ARTIF IN 2021. [DOI: 10.1080/0952813x.2021.1949753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Guanghao Jin
  - School of Telecommunication Engineering, Beijing Polytechnic, Beijing, China
- Qingzeng Song
  - School of Computer Science and Technology, Tiangong University, Tianjin, China
|