1. Huang J, Cheng CY, Brooks MD, Jeffers TL, Doner NM, Shih HJ, Frangos S, Katari MS, Coruzzi GM. Model-to-crop conserved NUE Regulons enhance machine learning predictions of nitrogen use efficiency. The Plant Cell 2025; 37:koaf093. [PMID: 40365911 PMCID: PMC12124406 DOI: 10.1093/plcell/koaf093] [Received: 03/19/2025] [Accepted: 04/07/2025] [Indexed: 05/15/2025]
Abstract
Systems biology aims to uncover gene regulatory networks (GRNs) for agricultural traits, but validating them in crops is challenging. We addressed this challenge by learning and validating model-to-crop transcription factor (TF) regulons governing nitrogen use efficiency (NUE). First, a fine-scale time-course nitrogen (N) response transcriptome analysis revealed a conserved temporal N response cascade in maize (Zea mays) and Arabidopsis (Arabidopsis thaliana). These data were used to infer time-based causal TF target edges in N-regulated GRNs. By validating 23 maize TFs in a cell-based TF-perturbation assay (Transient Assay Reporting Genome-wide Effects of Transcription factors), precision/recall analysis enabled us to prune high-confidence edges between ∼200 TFs/700 maize target genes. We next learned gene-to-NUE trait scores using XGBoost machine learning models trained on conserved N-responsive genes across maize and Arabidopsis accessions. By integrating NUE gene scores within our N-GRN, we ranked maize TFs based on a cumulative NUE Regulon score. NUE Regulons for top-ranked TFs were validated using the cell-based TARGET assay in maize (e.g. ZmMYB34/R3→24 targets) and the Arabidopsis ZmMYB34/R3 ortholog (e.g. AtDIV1→23 targets). The genes in this NUE Regulon significantly enhanced the ability of XGBoost models to predict NUE traits in both maize and Arabidopsis. Thus, our pipeline for identifying TF regulons that combines GRN inference, machine learning, and orthologous network regulons offers a strategic framework for crop trait improvement.
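The regulon-ranking step described in this abstract can be illustrated with a minimal sketch. It assumes that the cumulative NUE Regulon score of a TF is simply the sum of the gene-to-trait scores of its target genes; the abstract does not give the formula, and the function name, TF labels, and scores below are hypothetical.

```python
from collections import defaultdict

def rank_regulons(edges, gene_scores):
    """Rank TFs by the summed trait score of their targets (assumed definition)."""
    totals = defaultdict(float)
    for tf, target in edges:
        # Targets without a learned score contribute nothing.
        totals[tf] += gene_scores.get(target, 0.0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical pruned TF -> target edges and gene-to-trait scores.
edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g2"), ("TF_B", "g3")]
gene_scores = {"g1": 0.5, "g2": 0.25, "g3": 0.125}

ranking = rank_regulons(edges, gene_scores)
for tf, score in ranking:
    print(tf, score)
```

In a real pipeline the gene scores would come from a trained model (for example, feature importances), and the edge list from the validated network.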
Affiliation(s)
- Ji Huang
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Chia-Yi Cheng
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
  - Department of Life Science, College of Life Science, National Taiwan University, Taipei 10663, Taiwan
- Matthew D Brooks
  - Global Change and Photosynthesis Research Unit, USDA-ARS, Urbana, IL 61801, USA
- Tim L Jeffers
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Nathan M Doner
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Hung-Jui Shih
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Samantha Frangos
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Manpreet Singh Katari
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Gloria M Coruzzi
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
2. Aviña-Padilla K, Zambada-Moreno O, Jimenez-Limas MA, Hammond RW, Hernández-Rosales M. Dissecting the role of bHLH transcription factors in the potato spindle tuber viroid (PSTVd)-tomato pathosystem using network approaches. PLoS One 2025; 20:e0318573. [PMID: 40334007 PMCID: PMC12058033 DOI: 10.1371/journal.pone.0318573] [Received: 07/27/2024] [Accepted: 01/19/2025] [Indexed: 05/09/2025]
Abstract
Viroids, minimalist plant pathogens, pose significant threats to crops by causing severe diseases. Transcriptome profiling technologies have significantly advanced the analysis of viroid-infected host plants, providing critical insights into gene regulation by these pathogens. Despite these advancements, the presence of numerous genes of unknown function continues to limit a complete understanding of the transcriptome data. Co-expression analysis addresses this issue by clustering genes into modules based on global gene expression levels, with genes in the same cluster likely participating in the same biological pathways. In a previous study, we emphasized the importance of basic helix-loop-helix (bHLH) proteins in transcriptional reprogramming in tomato host in response to different potato spindle tuber viroid (PSTVd) strains. In the current research, we delve into tissue-specific gene modules, particularly in root and leaf tissues, governed by bHLH transcription factors (TFs) during PSTVd infections. Utilizing public datasets that span Control (C), mock-inoculated, PSTVd-mild (M), and PSTVd-severe (S23) strains in time-course infections, we uncovered differentially expressed gene modules. These modules were functionally characterized to identify essential hub genes, notably highlighting the regulatory coordination of bHLH TFs, depicted through the significant bifan motif found in these interactions. Expanding on these findings, we explored bipartite networks, discerning both common and unique bHLH TF regulatory roles. Our findings reveal that bHLH TFs play pivotal roles in regulating processes such as energy metabolism and facilitating rapid membrane repair in infected roots. In leaves, changes in the external layers affected photosynthesis, linking bHLH TFs to distinct metabolic functions. Through this holistic approach, we deepen our understanding of viroid-host interactions and the intricate regulatory mechanisms underpinning them.
Affiliation(s)
- Katia Aviña-Padilla
  - Department of Genetic Engineering, Center for Research and Advanced Studies (Cinvestav), Irapuato, Guanajuato, Mexico
  - Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois, United States of America
- Octavio Zambada-Moreno
  - Department of Genetic Engineering, Center for Research and Advanced Studies (Cinvestav), Irapuato, Guanajuato, Mexico
- Rosemarie W. Hammond
  - United States Department of Agriculture, Beltsville Agricultural Research Center, Beltsville, Maryland, United States of America
- Maribel Hernández-Rosales
  - Department of Genetic Engineering, Center for Research and Advanced Studies (Cinvestav), Irapuato, Guanajuato, Mexico
3. Lundqvist N, Garbulowski M, Hillerton T, Sonnhammer ELL. Topology-based metrics for finding the optimal sparsity in gene regulatory network inference. Bioinformatics 2025; 41:btaf120. [PMID: 40127172 PMCID: PMC12057811 DOI: 10.1093/bioinformatics/btaf120] [Received: 06/17/2024] [Revised: 11/22/2024] [Accepted: 03/19/2025] [Indexed: 03/26/2025]
Abstract
MOTIVATION: Gene regulatory network (GRN) inference is a complex task aiming to unravel regulatory interactions between genes in a cell. A major shortcoming of most GRN inference methods is that they do not attempt to find the optimal sparsity, i.e. the single best GRN, which is important when applying GRN inference in a real situation. Instead, the sparsity tends to be controlled by an arbitrarily set hyperparameter.
RESULTS: In this paper, two new methods for predicting the optimal sparsity of GRNs are formulated and benchmarked on simulated perturbation-based gene expression data using four GRN inference methods: LASSO, Zscore, LSCON, and GENIE3. Both sparsity prediction methods are defined using the hypothesis that the topology of real GRNs is scale-free, and are evaluated based on their ability to predict the sparsity of the true GRN. The results show that the new topology-based approaches reliably predict a sparsity close to the true one. This ability is valuable for real-world applications where a single GRN is inferred from real data. In such situations, it is vital to be able to infer a GRN with the correct sparsity.
AVAILABILITY AND IMPLEMENTATION: https://bitbucket.org/sonnhammergrni/powerlaw_sparsity/ and https://codeocean.com/capsule/4393635/.
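The idea of choosing sparsity from topology can be sketched as follows. This is not the authors' method (their implementation is at the repositories linked above); it is a simplified stand-in that assumes one scores each candidate edge-weight threshold by how linear the degree distribution looks on log-log axes, a rough proxy for scale-freeness. All function names and the toy edge list are hypothetical.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Map each degree value to the number of nodes having that degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

def loglog_r2(counts):
    """R^2 of a straight-line fit to log(count) versus log(degree)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(n) for n in counts.values()]
    if len(xs) < 3:
        return 0.0  # too few distinct degrees to judge a trend
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)

def pick_sparsity(weighted_edges, thresholds):
    """Return (threshold, fit score, edges kept) for the best-fitting cutoff."""
    best = None
    for t in thresholds:
        kept = [(u, v) for u, v, w in weighted_edges if abs(w) >= t]
        score = loglog_r2(degree_counts(kept))
        if best is None or score > best[1]:
            best = (t, score, len(kept))
    return best

# Hypothetical inferred edge weights (regulator, target, confidence).
weighted_edges = [
    ("tf1", "g1", 0.9), ("tf1", "g2", 0.8), ("tf1", "g3", 0.7),
    ("tf2", "g1", 0.6), ("tf2", "g4", 0.5),
    ("tf3", "g5", 0.2), ("tf3", "g2", 0.1),
]
thresholds = [0.1, 0.5, 0.7]
best = pick_sparsity(weighted_edges, thresholds)
print(best)
```

A log-log line fit is a crude test of scale-freeness, so this sketch only conveys the selection loop, not a statistically sound power-law fit.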
Affiliation(s)
- Nils Lundqvist
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna 171 21, Sweden
- Mateusz Garbulowski
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna 171 21, Sweden
- Thomas Hillerton
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna 171 21, Sweden
- Erik L L Sonnhammer
  - Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna 171 21, Sweden
4. Hammond J, Smith VA. Bayesian networks for network inference in biology. J R Soc Interface 2025; 22:20240893. [PMID: 40328299 PMCID: PMC12055290 DOI: 10.1098/rsif.2024.0893] [Received: 12/13/2024] [Revised: 02/14/2025] [Accepted: 02/20/2025] [Indexed: 05/08/2025]
Abstract
Bayesian networks (BNs) have been used for reconstructing interactions from biological data, in disciplines ranging from molecular biology to ecology and neuroscience. BNs learn conditional dependencies between variables, which best 'explain' the data, represented as a directed graph which approximates the relationships between variables. In the 2000s, BNs were a popular method that promised an approach capable of inferring biological networks from data. Here, we review the use of BNs applied to biological data over the past two decades and evaluate their efficacy. We find that BNs are successful in inferring biological networks, frequently identifying novel interactions or network components missed by previous analyses. We suggest that as false positive results are underreported, it is difficult to assess the accuracy of BNs in inferring biological networks. BN learning appears most successful for small numbers of variables with high-quality datasets that either discretize the data into few states or include perturbative data. We suggest that BNs have failed to live up to the promise of the 2000s but that this is most likely due to experimental constraints on datasets, and the success of BNs at inferring networks in a variety of biological contexts suggests they are a powerful tool for biologists.
Affiliation(s)
- James Hammond
  - Department of Biology, University of Oxford, Oxford, UK
  - School of Biology, University of St Andrews, St Andrews, UK
- V. Anne Smith
  - School of Biology, University of St Andrews, St Andrews, UK
5. Wei PJ, Jin HW, Gao Z, Su Y, Zheng CH. GAEDGRN: reconstruction of gene regulatory networks based on gravity-inspired graph autoencoders. Brief Bioinform 2025; 26:bbaf232. [PMID: 40415678 DOI: 10.1093/bib/bbaf232] [Received: 03/14/2025] [Revised: 04/25/2025] [Accepted: 05/04/2025] [Indexed: 05/27/2025]
Abstract
Reconstructing high-resolution gene regulatory networks (GRNs) based on single-cell RNA sequencing data provides an opportunity to gain insight into disease pathogenesis. At present, there are a large number of GRN reconstruction methods based on graph neural networks, and they can obtain excellent performance in GRN inference by extracting network structure features. However, most of these methods fail to fully exploit the directional characteristics or even ignore them when extracting network structural features. To this end, a novel framework called GAEDGRN is proposed based on gravity-inspired graph autoencoder (GIGAE) to infer potential causal relationships between genes. Among them, GIGAE can help us capture the complex directed network topology in GRN. Additionally, due to the uneven distribution of the latent vectors generated by the graph autoencoder, a random walk-based method is used to regularize the latent vectors learnt by the encoder. Furthermore, considering that some genes in GRN usually have a significant impact on biological functions, GAEDGRN designs a gene importance score calculation method and pays attention to genes with high importance in the process of GRN reconstruction. Experimental results on seven cell types of three GRN types show that GAEDGRN achieves high accuracy and strong robustness. Moreover, a case study on human embryonic stem cells demonstrates that GAEDGRN can help identify important genes.
Affiliation(s)
- Pi-Jing Wei
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institute of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Huai-Wan Jin
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Zhen Gao
  - The Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Yansen Su
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei 230601, Anhui, China
6. Su G, Wang H, Zhang Y, Wilkins MR, Canete PF, Yu D, Yang Y, Zhang W. Inferring gene regulatory networks by hypergraph generative model. Cell Reports Methods 2025; 5:101026. [PMID: 40220759 DOI: 10.1016/j.crmeth.2025.101026] [Received: 06/23/2024] [Revised: 01/16/2025] [Accepted: 03/20/2025] [Indexed: 04/14/2025]
Abstract
We present hypergraph variational autoencoder (HyperG-VAE), a Bayesian deep generative model that leverages hypergraph representation to model single-cell RNA sequencing (scRNA-seq) data. The model features a cell encoder with a structural equation model to account for cellular heterogeneity and construct gene regulatory networks (GRNs) alongside a gene encoder using hypergraph self-attention to identify gene modules. The synergistic optimization of encoders via a decoder improves GRN inference, single-cell clustering, and data visualization, as validated by benchmarks. HyperG-VAE effectively uncovers gene regulation patterns and demonstrates robustness in downstream analyses, as shown in B cell development data from bone marrow. Gene set enrichment analysis of overlapping genes in predicted GRNs confirms the gene encoder's role in refining GRN inference. Offering an efficient solution for scRNA-seq analysis and GRN construction, HyperG-VAE also holds the potential for extending GRN modeling to temporal and multimodal single-cell omics.
Affiliation(s)
- Guangxin Su
  - School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia
- Hanchen Wang
  - ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia; Australian Artificial Intelligence Institute, The University of Technology Sydney, Sydney, NSW, Australia
- Ying Zhang
  - School of Computer Science and Technology, Zhejiang Gongshang University, Zhejiang, China
- Marc R Wilkins
  - ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia; Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW, Australia
- Pablo F Canete
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Di Yu
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia; Ian Frazer Centre for Children's Immunotherapy Research, Child Health Research Centre, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Yang Yang
  - Frazer Institute, Faculty of Health, Medicine and Behaviour Sciences, The University of Queensland, Brisbane, QLD, Australia
- Wenjie Zhang
  - School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia; ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS), Melbourne, VIC, Australia
7. Morin A, Chu CP, Pavlidis P. Identifying reproducible transcription regulator coexpression patterns with single cell transcriptomics. PLoS Comput Biol 2025; 21:e1012962. [PMID: 40257984 PMCID: PMC12011263 DOI: 10.1371/journal.pcbi.1012962] [Received: 11/06/2024] [Accepted: 03/13/2025] [Indexed: 04/23/2025]
Abstract
The proliferation of single cell transcriptomics has potentiated our ability to unveil patterns that reflect dynamic cellular processes such as the regulation of gene transcription. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells), constructing a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR, demonstrating that this aggregated information recovers literature curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.
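Aggregating coexpression across datasets, as this abstract describes, can be sketched in a few lines. The sketch assumes that each dataset provides one correlation value per gene with a given TR, and that aggregation means averaging each gene's rank over the datasets in which it was measured; the paper's actual procedure may differ, and the gene labels and values are hypothetical.

```python
def rank_genes(corr):
    """Rank genes within one dataset, 1 = most coexpressed with the TR."""
    ordered = sorted(corr, key=corr.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}

def aggregate_ranks(datasets):
    """Order genes by their mean rank across datasets (ties broken by name)."""
    sums, seen = {}, {}
    for corr in datasets:
        for gene, rank in rank_genes(corr).items():
            sums[gene] = sums.get(gene, 0) + rank
            seen[gene] = seen.get(gene, 0) + 1
    mean_rank = {gene: sums[gene] / seen[gene] for gene in sums}
    return sorted(mean_rank, key=lambda gene: (mean_rank[gene], gene))

# Hypothetical TR-gene correlations from three datasets.
datasets = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.8, "B": 0.2, "C": 0.4},
    {"A": 0.3, "B": 0.7, "C": 0.1},
]
consensus = aggregate_ranks(datasets)
print(consensus)
```

Working with ranks rather than raw correlations keeps datasets with different value ranges comparable, which is one reason rank aggregation is a common choice for this kind of meta-analysis.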
Affiliation(s)
- Alexander Morin
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada
- Ching Pan Chu
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
  - Graduate Program in Bioinformatics, University of British Columbia, Vancouver, British Columbia, Canada
- Paul Pavlidis
  - Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
8. Johnson Z, Anderson D, Cheung MS, Bohutskyi P. Gene network centrality analysis identifies key regulators coordinating day-night metabolic transitions in Synechococcus elongatus PCC 7942 despite limited accuracy in predicting direct regulator-gene interactions. Front Microbiol 2025; 16:1569559. [PMID: 40207147 PMCID: PMC11979508 DOI: 10.3389/fmicb.2025.1569559] [Received: 02/01/2025] [Accepted: 03/07/2025] [Indexed: 04/11/2025]
Abstract
Synechococcus elongatus PCC 7942 is a model organism for studying circadian regulation and bioproduction, where precise temporal control of metabolism significantly impacts photosynthetic efficiency and CO2-to-bioproduct conversion. Despite extensive research on core clock components, our understanding of the broader regulatory network orchestrating genome-wide metabolic transitions remains incomplete. We address this gap by applying machine learning tools and network analysis to investigate the transcriptional architecture governing circadian-controlled gene expression. While our approach showed moderate accuracy in predicting individual transcription factor-gene interactions - a common challenge with real expression data - network-level topological analysis successfully revealed the organizational principles of circadian regulation. Our analysis identified distinct regulatory modules coordinating day-night metabolic transitions, with photosynthesis and carbon/nitrogen metabolism controlled by day-phase regulators, while nighttime modules orchestrate glycogen mobilization and redox metabolism. Through network centrality analysis, we identified potentially significant but previously understudied transcriptional regulators: HimA as a putative DNA architecture regulator, and TetR and SrrB as potential coordinators of nighttime metabolism, working alongside established global regulators RpaA and RpaB. This work demonstrates how network-level analysis can extract biologically meaningful insights despite limitations in predicting direct regulatory interactions. The regulatory principles uncovered here advance our understanding of how cyanobacteria coordinate complex metabolic transitions and may inform metabolic engineering strategies for enhanced photosynthetic bioproduction from CO2.
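Network centrality analysis of the kind mentioned in this abstract can be illustrated with betweenness centrality, computed here with Brandes' algorithm on a small directed graph. The abstract does not say which centrality measures were used, so the choice of betweenness is an assumption, and the node names are hypothetical placeholders rather than real regulators.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for a directed, unweighted graph.

    adj maps every node to a list of its successors (Brandes' algorithm).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # -1 marks unvisited nodes
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical regulator -> target edges: a -> b -> c and a -> d.
adj = {"a": ["b", "d"], "b": ["c"], "c": [], "d": []}
scores = betweenness(adj)
print(scores)
```

Here only node b sits on a shortest path between two other nodes (a to c), so it is the only node with a nonzero score.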
Affiliation(s)
- Zachary Johnson
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, United States
  - Department of Biological Systems Engineering, Washington State University, Pullman, WA, United States
- David Anderson
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, United States
- Margaret S. Cheung
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, United States
  - Department of Physics, University of Washington, Seattle, WA, United States
- Pavlo Bohutskyi
  - Biological Sciences Division, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, United States
  - Department of Biological Systems Engineering, Washington State University, Pullman, WA, United States
9. Chevalley M, Roohani YH, Mehrjou A, Leskovec J, Schwab P. A large-scale benchmark for network inference from single-cell perturbation data. Commun Biol 2025; 8:412. [PMID: 40069299 PMCID: PMC11897147 DOI: 10.1038/s42003-025-07764-y] [Received: 07/05/2024] [Accepted: 02/18/2025] [Indexed: 03/15/2025]
Abstract
Mapping biological mechanisms in cellular systems is a fundamental step in early-stage drug discovery that serves to generate hypotheses on what disease-relevant molecular targets may effectively be modulated by pharmacological interventions. With the advent of high-throughput methods for measuring single-cell gene expression under genetic perturbations, we now have effective means for generating evidence for causal gene-gene interactions at scale. However, evaluating the performance of network inference methods in real-world environments is challenging due to the lack of ground-truth knowledge. Moreover, traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. We thus introduce CausalBench, a benchmark suite revolutionizing network inference evaluation with real-world, large-scale single-cell perturbation data. CausalBench, distinct from existing benchmarks, offers biologically-motivated metrics and distribution-based interventional measures, providing a more realistic evaluation of network inference methods. An initial systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of existing methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. CausalBench subsequently enables the development of numerous promising methods through a community challenge, thus demonstrating its potential as a transformative tool in the field of computational biology, bridging the gap between theoretical innovation and practical application in drug discovery and disease understanding. Thus, CausalBench opens new avenues for method developers in causal network inference research, and provides to practitioners a principled and reliable way to track progress in network methods for real-world interventional data.
Affiliation(s)
- Yusuf H Roohani
  - GSK.ai, Zug, Switzerland
  - Stanford University, Stanford, CA, USA
10. Grover CE, Jareczek JJ, Swaminathan S, Lee Y, Howell AH, Rani H, Arick MA, Leach AG, Miller ER, Yang P, Hu G, Xiong X, Mallery EL, Peterson DG, Xie J, Haigler CH, Zabotina OA, Szymanski DB, Wendel JF. A high-resolution model of gene expression during Gossypium hirsutum (cotton) fiber development. BMC Genomics 2025; 26:221. [PMID: 40050725 PMCID: PMC11884195 DOI: 10.1186/s12864-025-11360-z] [Received: 10/18/2024] [Accepted: 02/11/2025] [Indexed: 03/10/2025]
Abstract
BACKGROUND: Cotton fiber development relies on complex and intricate biological processes to transform newly differentiated fiber initials into the mature, extravagantly elongated cellulosic cells that are the foundation of this economically important cash crop. Here we extend previous research into cotton fiber development by employing controlled conditions to minimize variability and utilizing time-series sampling and analyses to capture daily transcriptomic changes from early elongation through the early stages of secondary wall synthesis (6 to 24 days post anthesis; DPA).
RESULTS: A majority of genes are expressed in fiber, largely partitioned into two major coexpression modules that represent genes whose expression generally increases or decreases during development. Differential gene expression reveals a massive transcriptomic shift between 16 and 17 DPA, corresponding to the onset of the transition phase that leads to secondary wall synthesis. Subtle gene expression changes are captured by the daily sampling, which are discussed in the context of fiber development. Coexpression and gene regulatory networks are constructed and associated with phenotypic aspects of fiber development, including turgor and cellulose production. Key genes are considered in the broader context of plant secondary wall synthesis, noting their known and putative roles in cotton fiber development.
CONCLUSIONS: The analyses presented here highlight the importance of fine-scale temporal sampling on understanding developmental processes and offer insight into genes and regulatory networks that may be important in conferring the unique fiber phenotype.
Collapse
Affiliation(s)
- Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA.
| | - Josef J Jareczek
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Present address: Bellarmine University, Louisville, KY, USA
| | - Sivakumar Swaminathan
- Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
| | - Youngwoo Lee
- Department of Botany and Plant Pathology, Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
| | - Alexander H Howell
- Department of Botany and Plant Pathology, Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
| | - Heena Rani
- Department of Botany and Plant Pathology, Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
- Present address: USDA-ARS, Cereal Crops Research Unit, Madison, WI, 53726, USA
| | - Mark A Arick
- Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, MS, 39762, USA
| | - Alexis G Leach
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Present address: Cell and Molecular Biology Graduate Group, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Emma R Miller
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
| | - Pengcheng Yang
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
| | - Guanjing Hu
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Xianpeng Xiong
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
| | - Eileen L Mallery
- Department of Botany and Plant Pathology, Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
| | - Daniel G Peterson
- Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, MS, 39762, USA
| | - Jun Xie
- Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
| | - Candace H Haigler
- Department of Crop & Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, NC, 27695, USA
| | - Olga A Zabotina
- Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
| | - Daniel B Szymanski
- Department of Botany and Plant Pathology, Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
| | - Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
|
11
|
Yu W, Lin Z, Lan M, Ou-Yang L. GCLink: a graph contrastive link prediction framework for gene regulatory network inference. Bioinformatics 2025; 41:btaf074. [PMID: 39960893 PMCID: PMC11881698 DOI: 10.1093/bioinformatics/btaf074] [Received: 07/23/2024] [Revised: 01/10/2025] [Accepted: 02/13/2025] [Indexed: 03/06/2025]
Abstract
MOTIVATION: Gene regulatory networks (GRNs) unveil the intricate interactions among genes, pivotal in elucidating the complex biological processes within cells. The advent of single-cell RNA-sequencing (scRNA-seq) enables the inference of GRNs at single-cell resolution. However, the majority of current supervised network inference methods typically concentrate on predicting pairwise gene regulatory interactions, thus failing to fully exploit correlations among all genes and exhibiting limited generalization performance. RESULTS: To address these issues, we propose a graph contrastive link prediction (GCLink) model to infer potential gene regulatory interactions from scRNA-seq data. Based on known gene regulatory interactions and scRNA-seq data, GCLink introduces a graph contrastive learning strategy to aggregate the feature and neighborhood information of genes to learn their representations. This approach reduces the dependence of our model on sample size and enhances its ability to predict potential gene regulatory interactions. Extensive experiments on real scRNA-seq datasets demonstrate that GCLink outperforms other state-of-the-art methods in most cases. Furthermore, by pretraining GCLink on a source cell line with abundant known regulatory interactions and fine-tuning it on a target cell line with a limited number of known interactions, our GCLink model exhibits good performance in GRN inference, demonstrating its effectiveness in inferring GRNs from datasets with limited known interactions. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/Yoyiming/GCLink.
Affiliation(s)
- Weiming Yu
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Zerun Lin
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Miaofang Lan
- Guangdong Provincial Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
| | - Le Ou-Yang
- Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
|
12
|
Stock M, Losert C, Zambon M, Popp N, Lubatti G, Hörmanseder E, Heinig M, Scialdone A. Leveraging prior knowledge to infer gene regulatory networks from single-cell RNA-sequencing data. Mol Syst Biol 2025; 21:214-230. [PMID: 39939367 PMCID: PMC11876610 DOI: 10.1038/s44320-025-00088-3] [Received: 10/10/2024] [Revised: 01/29/2025] [Accepted: 01/30/2025] [Indexed: 02/14/2025]
Abstract
Many studies have used single-cell RNA sequencing (scRNA-seq) to infer gene regulatory networks (GRNs), which are crucial for understanding complex cellular regulation. However, the inherent noise and sparsity of scRNA-seq data present significant challenges to accurate GRN inference. This review explores one promising approach that has been proposed to address these challenges: integrating prior knowledge into the inference process to enhance the reliability of the inferred networks. We categorize common types of prior knowledge, such as experimental data and curated databases, and discuss methods for representing priors, particularly through graph structures. In addition, we classify recent GRN inference algorithms based on their ability to incorporate these priors and assess their performance in different contexts. Finally, we propose a standardized benchmarking framework to evaluate algorithms more fairly, ensuring biologically meaningful comparisons. This review provides guidance for researchers selecting GRN inference methods and offers insights for developers looking to improve current approaches and foster innovation in the field.
Affiliation(s)
- Marco Stock
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Corinna Losert
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Matteo Zambon
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
| | - Niclas Popp
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
| | - Gabriele Lubatti
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
| | - Eva Hörmanseder
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
| | - Matthias Heinig
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany
| | - Antonio Scialdone
- Helmholtz Center Munich Institute of Epigenetics und Stem Cells, Munich, Germany
- Helmholtz Center Munich Institute of Computational Biology, Munich, Germany
- Helmholtz Center Munich Institute of Functional Epigenetics, Munich, Germany
|
13
|
Dibaeinia P, Ojha A, Sinha S. Interpretable AI for inference of causal molecular relationships from omics data. SCIENCE ADVANCES 2025; 11:eadk0837. [PMID: 39951525 PMCID: PMC11827637 DOI: 10.1126/sciadv.adk0837] [Received: 08/08/2023] [Accepted: 01/14/2025] [Indexed: 02/16/2025]
Abstract
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
Affiliation(s)
- Payam Dibaeinia
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Abhishek Ojha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Saurabh Sinha
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- H. Milton School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
14
|
Morin A, Chu CP, Pavlidis P. Identifying Reproducible Transcription Regulator Coexpression Patterns with Single Cell Transcriptomics. BIORXIV: THE PREPRINT SERVER FOR BIOLOGY 2025:2024.02.15.580581. [PMID: 38559016 PMCID: PMC10979919 DOI: 10.1101/2024.02.15.580581] [Indexed: 04/04/2024]
Abstract
The proliferation of single cell transcriptomics has potentiated our ability to unveil patterns that reflect dynamic cellular processes such as the regulation of gene transcription. In this study, we leverage a broad collection of single cell RNA-seq data to identify the gene partners whose expression is most coordinated with each human and mouse transcription regulator (TR). We assembled 120 human and 103 mouse scRNA-seq datasets from the literature (>28 million cells), constructing a single cell coexpression network for each. We aimed to understand the consistency of TR coexpression profiles across a broad sampling of biological contexts, rather than examine the preservation of context-specific signals. Our workflow therefore explicitly prioritizes the patterns that are most reproducible across cell types. Towards this goal, we characterize the similarity of each TR's coexpression within and across species. We create single cell coexpression rankings for each TR, demonstrating that this aggregated information recovers literature curated targets on par with ChIP-seq data. We then combine the coexpression and ChIP-seq information to identify candidate regulatory interactions supported across methods and species. Finally, we highlight interactions for the important neural TR ASCL1 to demonstrate how our compiled information can be adopted for community use.
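The idea of ranking each regulator's coexpression partners per dataset and then aggregating across datasets can be illustrated with a minimal sketch. The abstract does not state the correlation measure or the aggregation rule, so Pearson correlation and mean rank are assumptions here, and the function names (`pearson`, `rank_partners`, `aggregate_ranks`) and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_partners(dataset, tr):
    """Rank every other gene by its correlation with the regulator `tr`.

    `dataset` maps gene name -> expression values; rank 1 is the most
    positively coexpressed partner.
    """
    scores = {g: pearson(dataset[tr], v) for g, v in dataset.items() if g != tr}
    ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
    return {g: i + 1 for i, g in enumerate(ordered)}


def aggregate_ranks(datasets, tr):
    """Order genes by mean rank across datasets (assumed aggregation rule)."""
    per_dataset = [rank_partners(d, tr) for d in datasets]
    genes = set.intersection(*(set(r) for r in per_dataset))
    mean_rank = {g: sum(r[g] for r in per_dataset) / len(per_dataset) for g in genes}
    return sorted(mean_rank, key=lambda g: (mean_rank[g], g))


# Toy data: two small "datasets" with one regulator and three candidate partners.
d1 = {"TR": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
d2 = {"TR": [5, 1, 3, 2], "A": [10, 2, 6, 5], "B": [1, 5, 2, 4], "C": [2, 2, 3, 1]}
print(aggregate_ranks([d1, d2], "TR"))
```

Averaging ranks rather than raw correlations keeps datasets with different scales or sizes from dominating the aggregate, which matches the stated goal of prioritizing reproducible patterns.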
Affiliation(s)
- Alexander Morin
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
| | - C. Pan Chu
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC, Canada
| | - Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
|
15
|
Liu F, Li N, Yan ZY, Chen X. Time-series transcriptome analysis reveals the cascade mechanism of biological processes following the perturbation of the MVA pathway in Salvia miltiorrhiza. PLANT MOLECULAR BIOLOGY 2025; 115:20. [PMID: 39821838 PMCID: PMC11742292 DOI: 10.1007/s11103-024-01547-5] [Received: 05/25/2024] [Accepted: 12/13/2024] [Indexed: 01/19/2025]
Abstract
Various biological processes are interconnected in plants. Transcription factors (TFs) often act as regulatory hubs to regulate plant growth and responses to stress by integrating various biological pathways. Despite extensive studies on TF functions in various plant species, our understanding of the details of TF regulation remains limited. In this study, clonal seedlings of Salvia miltiorrhiza were exposed to specific inhibitors for 12 h. Time-series transcriptome data, sampled hourly, were used to construct co-expression networks and gene regulatory networks (GRNs). Transcriptome dynamic analysis was utilized to capture the gene expression dynamics of various biological processes and decipher the potential molecular mechanisms that regulate these processes. The perturbation results showed that the growth and development processes of S. miltiorrhiza were primarily affected at the early stage, whereas stress response-related biological processes were mainly influenced at the later stage. There was also a correlation between the series of key differentially expressed genes in terpenoid biosynthesis pathways and the topological distribution of these pathways. Furthermore, the GRNs based on TFs indicate that TFs play a crucial role in connecting various biological processes. In the cytoplasmic lysate gene regulatory module, SmWRKY48-SmTCP4-SmWRKY28 constituted a regulation hub regulating S. miltiorrhiza responses to perturbation of the MVA pathway. The regulation hub mediated various pathways, including pyruvate metabolism, glycolysis/gluconeogenesis, amino acid metabolism, and ubiquinone and other terpenoid-quinone biosynthesis. Our findings suggest that perturbation of a key biological pathway in S. miltiorrhiza has time-dependent effects on other biological processes. In addition, SmWRKY48-SmTCP4-SmWRKY28 constitutes the regulatory hub in S. miltiorrhiza responses to perturbation of the MVA pathway.
Affiliation(s)
- Fang Liu
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
| | - Nan Li
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
| | - Zhu-Yun Yan
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
| | - Xin Chen
- School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan, China
|
16
|
Zhang Y, Zhao J, Sun X, Zheng Y, Chen T, Wang Z. Leveraging independent component analysis to unravel transcriptional regulatory networks: A critical review and future directions. Biotechnol Adv 2025; 78:108479. [PMID: 39577573 DOI: 10.1016/j.biotechadv.2024.108479] [Received: 08/23/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 11/24/2024]
Abstract
Transcriptional regulatory networks (TRNs) play a crucial role in exploring microbial life activities and complex regulatory mechanisms. The comprehensive reconstruction of TRNs requires the integration of large-scale experimental data, which poses significant challenges due to the complexity of regulatory relationships. The application of machine learning tools, such as clustering analysis, has been employed to investigate TRNs, but these methods have limitations in capturing both global and local co-expression effects. In contrast, Independent Component Analysis (ICA) has emerged as a powerful analysis algorithm for modularizing independently regulated gene sets in TRNs, allowing it to account for both global and local co-expression effects. In this review, we comprehensively summarize the application of ICA in unraveling TRNs and highlight the research progress in three key aspects: (1) extending TRNs with iModulon analysis; (2) elucidating the regulatory mechanisms triggered by environmental perturbation; and (3) exploring the mechanisms of transcriptional regulation triggered by changes in microbial physiological state. At the end of this review, we also address the challenges facing ICA in TRN analysis and outline future research directions to promote the advancement of ICA-based transcriptomics analysis in biotechnology and related fields.
Affiliation(s)
- Yuhan Zhang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Jianxiao Zhao
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Xi Sun
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China
| | - Yangyang Zheng
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Tao Chen
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
| | - Zhiwen Wang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China
|
17
|
Zhang L, Fang Y, Shi M, Ren K, Guan X, Younas W, Cheng Y, Zhang W, Wang Y, Xia XQ. Gonadal expression profiles reveal the underlying mechanisms of temperature effects on sex determination in the large-scale loach (Paramisgurnus dabryanus). Anim Reprod Sci 2025; 272:107661. [PMID: 39644765 DOI: 10.1016/j.anireprosci.2024.107661] [Received: 10/11/2024] [Revised: 11/24/2024] [Accepted: 11/30/2024] [Indexed: 12/09/2024]
Abstract
The sex determination mechanism in large-scale loach (Paramisgurnus dabryanus) follows a ZZ/ZW system, with sexual differentiation regulated by both genotypic factors and temperature effects (GSD+TSD), where elevated temperatures result in a higher proportion of males. Currently, research on the sex determination mechanisms in large-scale loach is limited, and the specific gene expression profiles and the role of temperature in influencing sex remain largely unknown. This study investigated the impact of temperature on the sex ratio in cultured populations of the large-scale loach, and then identified a female-specific genetic marker by whole genome sequencing, making it possible to distinguish females, males, and pseudo-males within this population. Transcriptomic analysis was subsequently performed on these groups, and the data revealed a similar expression pattern between pseudo-males and true males. The research combined differential expression analysis with WGCNA to construct a regulatory network of nine sex differentiation-related genes (SDGs) (map3k4, trpv4, hsd17b12a, wt1, ar, dmrt1, bcar1, sox9a, cyp17a1), indicating that sex differentiation in large-scale loach is probably driven by the regulation of male-related genes. The transcriptomic analysis suggested that temperature significantly modified the expression of SDGs in the ovaries, while in the testes, it predominantly affected metabolism-related pathways. We established a temperature-sensitive gene network in females, based on the correlation between gene expression and temperature, as well as the number of co-regulated genes in female data. We propose that, with increasing temperature, wt1 serves as a central regulator, leading to the down-regulation of foxl2a, cyp19a1a, and the cholesterol biosynthesis-related gene sqlea, ultimately resulting in the development of pseudo-males.
Affiliation(s)
- Lei Zhang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Yutong Fang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
| | - Mijuan Shi
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Keyi Ren
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
| | - Xin Guan
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Fisheries and Life Science, Dalian Ocean University, Dalian 116023, China
| | - Waqar Younas
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Yingyin Cheng
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Wanting Zhang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
| | - Yaping Wang
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Xiao-Qin Xia
- Key Laboratory of Breeding Biotechnology and Sustainable Aquaculture (CAS), Hubei Hongshan Laboratory, Key Laboratory of Aquaculture Disease Control, Ministry of Agriculture and Rural Affairs, The Innovation Academy of Seed Design, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing, China
|
18
|
Yang G, Lei S, Yang G. Robust Model-Free Identification of the Causal Networks Underlying Complex Nonlinear Systems. ENTROPY (BASEL, SWITZERLAND) 2024; 26:1063. [PMID: 39766692 PMCID: PMC11675911 DOI: 10.3390/e26121063] [Received: 10/08/2024] [Revised: 11/28/2024] [Accepted: 11/30/2024] [Indexed: 01/11/2025]
Abstract
Inferring causal networks from noisy observations is of vital importance in various fields. Due to the complexity of system modeling, the way in which universal and feasible inference algorithms are studied is a key challenge for network reconstruction. In this study, without any assumptions, we develop a novel model-free framework to uncover only the direct relationships in networked systems from observations of their nonlinear dynamics. Our proposed methods are termed multiple-order Polynomial Conditional Granger Causality (PCGC) and sparse PCGC (SPCGC). PCGC mainly adopts polynomial functions to approximate the whole system model, which can be used to judge the interactions among nodes through subsequent nonlinear Granger causality analysis. For SPCGC, Lasso optimization is first used for dimension reduction, and then PCGC is executed to obtain the final network. Specifically, the conditional variables are fused in this general, model-free framework regardless of their formulations in the system model, which could effectively reconcile the inference of direct interactions with an indirect influence. Based on many classical dynamical systems, the performances of PCGC and SPCGC are analyzed and verified. Generally, the proposed framework could be quite promising for the provision of certain guidance for data-driven modeling with an unknown model.
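As a rough illustration of how polynomial terms can be combined with conditional Granger causality, the sketch below fits two least-squares models for a target node, one with and one without polynomial terms of a candidate source node, while conditioning on every other node. It is a simplified lag-1 version written for this listing, not the authors' PCGC or SPCGC implementation: the lag order, the polynomial degree, the log-ratio score, and all function names are assumptions, and the Lasso step of SPCGC is omitted.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(X, y):
    """Residual sum of squares of the ordinary least-squares fit y ~ X."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def poly_terms(v, degree):
    """Polynomial features v, v^2, ..., v^degree."""
    return [v ** d for d in range(1, degree + 1)]


def pcgc_score(series, src, dst, degree=2):
    """log(RSS_restricted / RSS_full) for src -> dst at lag 1.

    The restricted model uses polynomial terms of every node except `src`
    (the conditioning set, including dst's own past); the full model adds
    the polynomial terms of `src`.
    """
    y = series[dst][1:]
    full, restricted = [], []
    for t in range(len(series[dst]) - 1):
        row_r = [1.0]
        for name in series:
            if name != src:
                row_r += poly_terms(series[name][t], degree)
        restricted.append(row_r)
        full.append(row_r + poly_terms(series[src][t], degree))
    return math.log(rss(restricted, y) / rss(full, y))


# Toy system: x drives y through a quadratic term; z is unrelated noise.
random.seed(0)
T = 300
x = [random.gauss(0, 1) for _ in range(T)]
z = [random.gauss(0, 1) for _ in range(T)]
y = [0.0]
for t in range(T - 1):
    y.append(0.5 * y[t] + 0.8 * x[t] ** 2 + random.gauss(0, 0.1))
series = {"x": x, "y": y, "z": z}

print("x -> y:", pcgc_score(series, "x", "y"))
print("y -> x:", pcgc_score(series, "y", "x"))
```

A purely linear Granger test would miss the x to y link in this toy system because x enters only through its square; the degree-2 terms are what make the dependence visible.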
Affiliation(s)
- Guanxue Yang
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
| | - Shimin Lei
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
| | - Guanxiao Yang
- College of Automation, Jiangsu University of Science and Technology, Zhenjiang 212100, China
|
19
|
Jin H, Kim W, Yuan M, Li X, Yang H, Li M, Shi M, Turkez H, Uhlen M, Zhang C, Mardinoglu A. Identification of SPP1+ macrophages as an immune suppressor in hepatocellular carcinoma using single-cell and bulk transcriptomics. Front Immunol 2024; 15:1446453. [PMID: 39691723 PMCID: PMC11649653 DOI: 10.3389/fimmu.2024.1446453] [Received: 06/09/2024] [Accepted: 11/19/2024] [Indexed: 12/19/2024]
Abstract
Introduction: Macrophages and T cells play crucial roles in liver physiology, but their functional diversity in hepatocellular carcinoma (HCC) remains largely unknown. Methods: Two bulk RNA-sequencing (RNA-seq) cohorts for HCC were analyzed using gene co-expression network analysis. Key gene modules and networks were mapped to single-cell RNA-sequencing (scRNA-seq) data of HCC. Cell type fraction of bulk RNA-seq data was estimated by deconvolution approach using single-cell RNA-sequencing data as a reference. Survival analysis was carried out to estimate the prognosis of different immune cell types in bulk RNA-seq cohorts. Cell-cell interaction analysis was performed to identify potential links between immune cell types in HCC. Results: In this study, we analyzed RNA-seq data from two large-scale HCC cohorts, revealing a major and consensus gene co-expression cluster with significant implications for immunosuppression. Notably, these genes exhibited higher enrichment in liver macrophages than T cells, as confirmed by scRNA-seq data from HCC patients. Integrative analysis of bulk and single-cell RNA-seq data pinpointed SPP1+ macrophages as an unfavorable cell type, while VCAN+ macrophages, C1QA+ macrophages, and CD8+ T cells were associated with a more favorable prognosis for HCC patients. Subsequent scRNA-seq investigations and in vitro experiments elucidated that SPP1, predominantly secreted by SPP1+ macrophages, inhibits CD8+ T cell proliferation. Finally, targeting SPP1 in tumor-associated macrophages through inhibition led to a shift towards a favorable phenotype. Discussion: This study underpins the potential of SPP1 as a translational target in immunotherapy for HCC.
Affiliation(s)
- Han Jin
- Central Laboratory, Tianjin Medical University General Hospital, Tianjin, China
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Woonghee Kim
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Meng Yuan
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Xiangyu Li
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Hong Yang
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Mengzhen Li
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Mengnan Shi
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Hasan Turkez
- Department of Medical Biology, Faculty of Medicine, Atatürk University, Erzurum, Türkiye
| | - Mathias Uhlen
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Cheng Zhang
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
| | - Adil Mardinoglu
- Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- Centre for Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, United Kingdom
|
20
|
Huang Y, Huang S, Zhang XF, Ou-Yang L, Liu C. NJGCG: A node-based joint Gaussian copula graphical model for gene networks inference across multiple states. Comput Struct Biotechnol J 2024; 23:3199-3210. [PMID: 39263209 PMCID: PMC11388165 DOI: 10.1016/j.csbj.2024.08.010] [Received: 04/14/2024] [Revised: 08/05/2024] [Accepted: 08/11/2024] [Indexed: 09/13/2024]
Abstract
Inferring the interactions between genes is essential for understanding the mechanisms underlying biological processes. Gene networks will change along with the change of environment and state. The accumulation of gene expression data from multiple states makes it possible to estimate the gene networks in various states based on computational methods. However, most existing gene network inference methods focus on estimating a gene network from a single state, ignoring the similarities between networks in different but related states. Moreover, in addition to individual edges, similarities and differences between different networks may also be driven by hub genes. But existing network inference methods rarely consider hub genes, which affects the accuracy of network estimation. In this paper, we propose a novel node-based joint Gaussian copula graphical (NJGCG) model to infer multiple gene networks from gene expression data containing heterogeneous samples jointly. Our model can handle various gene expression data with missing values. Furthermore, a tree-structured group lasso penalty is designed to identify the common and specific hub genes in different gene networks. Simulation studies show that our proposed method outperforms other compared methods in all cases. We also apply NJGCG to infer the gene networks for different stages of differentiation in mouse embryonic stem cells and different subtypes of breast cancer, and explore changes in gene networks across different stages of differentiation or different subtypes of breast cancer. The common and specific hub genes in the estimated gene networks are closely related to stem cell differentiation processes and heterogeneity within breast cancers.
Affiliation(s)
- Yun Huang
  - Department of Geriatrics, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
  - Clinical Research Center for Geriatric Hypertension Disease of Fujian province, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Sen Huang
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Chen Liu
  - Department of Oncology, Molecular Oncology Research Institute, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
  - Department of Oncology, National Regional Medical Center, Binhai Campus of The First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China
  - Fujian Key Laboratory of Precision Medicine for Cancer, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
21
Grützmann K, Kraft T, Meinhardt M, Meier F, Westphal D, Seifert M. Network-based analysis of heterogeneous patient-matched brain and extracranial melanoma metastasis pairs reveals three homogeneous subgroups. Comput Struct Biotechnol J 2024; 23:1036-1050. [PMID: 38464935 PMCID: PMC10920107 DOI: 10.1016/j.csbj.2024.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 03/12/2024] Open
Abstract
Melanoma, the deadliest form of skin cancer, can metastasize to different organs. Molecular differences between brain and extracranial melanoma metastases are poorly understood. Here, promoter methylation and gene expression of 11 heterogeneous patient-matched pairs of brain and extracranial metastases were analyzed using melanoma-specific gene regulatory networks learned from public transcriptome and methylome data, followed by network-based impact propagation of patient-specific alterations. This data analysis strategy allowed us to predict the potential impacts of patient-specific driver candidate genes on other genes and pathways. The patient-matched metastasis pairs clustered into three robust subgroups with specific downstream targets with known roles in cancer, including melanoma (SG1: RBM38, BCL11B; SG2: GATA3, FES; SG3: SLAMF6, PYCARD). Patient subgroups and the ranking of target gene candidates were confirmed in a validation cohort. In summary, computational network-based impact analyses of heterogeneous metastasis pairs predicted individual regulatory differences in melanoma brain metastases, culminating in three consistent subgroups with specific downstream target genes.
Affiliation(s)
- Konrad Grützmann
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Theresa Kraft
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
- Matthias Meinhardt
  - Department of Pathology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
- Friedegund Meier
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Dana Westphal
  - Department of Dermatology, University Hospital Carl Gustav Carus Dresden, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
- Michael Seifert
  - Institute for Medical Informatics and Biometry, Faculty of Medicine, TU Dresden, 01307 Dresden, Germany
  - National Center for Tumor Diseases (NCT), D-01307 Dresden, Germany
22
Peng D, Cahan P. OneSC: a computational platform for recapitulating cell state transitions. Bioinformatics 2024; 40:btae703. [PMID: 39570626 PMCID: PMC11630913 DOI: 10.1093/bioinformatics/btae703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 11/13/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024] Open
Abstract
MOTIVATION: Computational modeling of cell state transitions has been of great interest in developmental biology, cancer biology, and cell fate engineering because it enables perturbation experiments to be performed in silico more rapidly and cheaply than could be achieved in a lab. Recent advancements in single-cell RNA-sequencing (scRNA-seq) allow the capture of high-resolution snapshots of cell states as they transition along temporal trajectories. Using these high-throughput datasets, we can train computational models to generate in silico "synthetic" cells that faithfully mimic the temporal trajectories. RESULTS: Here we present OneSC, a platform that can simulate cell state transitions using systems of stochastic differential equations governed by a regulatory network of core transcription factors (TFs). Unlike many current network inference methods, OneSC prioritizes generating a Boolean network that produces faithful cell state transitions and terminal cell states that mimic real biological systems. Applying OneSC to real data, we inferred a core TF network using a mouse myeloid progenitor scRNA-seq dataset and showed that dynamical simulations of that network generate synthetic single-cell expression profiles that faithfully recapitulate the four myeloid differentiation trajectories leading to differentiated cell states (erythrocytes, megakaryocytes, granulocytes, and monocytes). Finally, through in silico perturbations of the mouse myeloid progenitor core network, we showed that OneSC can accurately predict cell fate decision biases of TF perturbations that closely match previous experimental observations. AVAILABILITY AND IMPLEMENTATION: OneSC is implemented as a Python package on GitHub (https://github.com/CahanLab/oneSC) and on Zenodo (https://zenodo.org/records/14052421).
Affiliation(s)
- Da Peng
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
  - Institute for Cell Engineering, Johns Hopkins University, Baltimore, MD 21205, United States
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, United States
23
Tucci A, Flores-Vergara MA, Franks RG. Machine Learning Inference of Gene Regulatory Networks in Developing Mimulus Seeds. Plants (Basel, Switzerland) 2024; 13:3297. [PMID: 39683091 DOI: 10.3390/plants13233297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/07/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024]
Abstract
The angiosperm seed represents a critical evolutionary breakthrough that has been shown to propel the reproductive success and radiation of flowering plants. Seeds promote the rapid diversification of angiosperms by establishing postzygotic reproductive barriers, such as hybrid seed inviability. While prezygotic barriers to reproduction tend to be transient, postzygotic barriers are often permanent and therefore can play a pivotal role in facilitating speciation. This property of the angiosperm seed is exemplified in the Mimulus genus. In order to further the understanding of the gene regulatory mechanisms important in the Mimulus seed, we performed gene regulatory network (GRN) inference analysis by using time-series RNA-seq data from developing hybrid seeds from a viable cross between Mimulus guttatus and Mimulus pardalis. GRN inference has the capacity to identify active regulatory mechanisms in a sample and highlight genes of potential biological importance. In our case, GRN inference also provided the opportunity to uncover active regulatory relationships and generate a reference set of putative gene regulations. We deployed two GRN inference algorithms, RTP-STAR and KBoost, on three different subsets of our transcriptomic dataset. While the two algorithms yielded GRNs with different regulations and topologies when working with the same data subset, there was still significant overlap in the specific gene regulations they inferred, and they both identified potential novel regulatory mechanisms that warrant further investigation.
Affiliation(s)
- Albert Tucci
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Miguel A Flores-Vergara
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Robert G Franks
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
24
Karamveer, Uzun Y. Approaches for Benchmarking Single-Cell Gene Regulatory Network Methods. Bioinform Biol Insights 2024; 18:11779322241287120. [PMID: 39502448 PMCID: PMC11536393 DOI: 10.1177/11779322241287120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 09/10/2024] [Indexed: 11/08/2024] Open
Abstract
Gene regulatory networks are powerful tools for modeling genetic interactions that control the expression of genes driving cell differentiation, and single-cell sequencing offers a unique opportunity to build these networks with high-resolution genomic data. There are many proposed computational methods to build these networks using single-cell data, and different approaches are used to benchmark these methods. However, a comprehensive discussion specifically focusing on benchmarking approaches is missing. In this article, we lay out the GRN terminology, present an overview of common gold-standard studies and data sets, and define the performance metrics for benchmarking network construction methodologies. We also point out the advantages and limitations of different benchmarking approaches, suggest alternative ground truth data sets that can be used for benchmarking, and specify additional considerations in this context.
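Editorial illustration (not from the article): the abstract above refers to performance metrics for benchmarking inferred networks against a gold standard without naming them. As a minimal sketch, and assuming edge-level precision and recall over the top-k predicted edges as one commonly used choice, the calculation could look like the following. The function name `edge_precision_recall` and the toy edges are hypothetical.

```python
def edge_precision_recall(predicted_edges, gold_edges, k):
    """Precision and recall of the top-k predicted edges.

    predicted_edges: list of (regulator, target) tuples, ordered from most
        to least confident (hypothetical input format).
    gold_edges: iterable of (regulator, target) tuples treated as ground truth.
    k: number of top-ranked predictions to evaluate.
    """
    top_k = predicted_edges[:k]
    gold = set(gold_edges)
    # Count predictions that also appear in the gold standard.
    hits = sum(1 for edge in top_k if edge in gold)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy example with made-up gene names.
predicted = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G1")]
gold = [("TF1", "G1"), ("TF2", "G3"), ("TF3", "G4")]
precision_at_2, recall_at_2 = edge_precision_recall(predicted, gold, k=2)
precision_at_4, recall_at_4 = edge_precision_recall(predicted, gold, k=4)
```

Sweeping k over the full ranking gives the points of a precision-recall curve; which metrics and gold standards the article itself recommends is not stated in the abstract.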
Affiliation(s)
- Karamveer
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Penn State Cancer Institute, The Pennsylvania State University College of Medicine, Hershey, PA, USA
25
Wang Y, Zheng P, Cheng YC, Wang Z, Aravkin A. WENDY: Covariance dynamics based gene regulatory network inference. Math Biosci 2024; 377:109284. [PMID: 39168402 DOI: 10.1016/j.mbs.2024.109284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 06/25/2024] [Accepted: 08/16/2024] [Indexed: 08/23/2024]
Abstract
Determining gene regulatory network (GRN) structure is a central problem in biology, with a variety of inference methods available for different types of data. For a widely prevalent and challenging use case, namely single-cell gene expression data measured after intervention at multiple time points with unknown joint distributions, there is only one known specifically developed method, which does not fully utilize the rich information contained in this data type. We develop an inference method for the GRN in this case, netWork infErence by covariaNce DYnamics, dubbed WENDY. The core idea of WENDY is to model the dynamics of the covariance matrix, and solve this dynamics as an optimization problem to determine the regulatory relationships. To evaluate its effectiveness, we compare WENDY with other inference methods using synthetic data and experimental data. Our results demonstrate that WENDY performs well across different data sets.
Affiliation(s)
- Yue Wang
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, 10027, NY, USA
- Peng Zheng
  - Institute for Health Metrics and Evaluation, Seattle, 98195, WA, USA; Department of Health Metrics Sciences, University of Washington, Seattle, 98195, WA, USA
- Yu-Chen Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, 02115, MA, USA; Center for Cancer Evolution, Dana-Farber Cancer Institute, Boston, 02215, MA, USA; Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, 02138, MA, USA
- Zikun Wang
  - Laboratory of Genetics, The Rockefeller University, New York, 10065, NY, USA
- Aleksandr Aravkin
  - Department of Applied Mathematics, University of Washington, Seattle, 98195, WA, USA
26
Dong J, Li J, Wang F. Deep Learning in Gene Regulatory Network Inference: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2089-2101. [PMID: 39137088 DOI: 10.1109/tcbb.2024.3442536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Understanding the intricate regulatory relationships among genes is crucial for comprehending the development, differentiation, and cellular response in living systems. Consequently, inferring gene regulatory networks (GRNs) based on observed data has gained significant attention as a fundamental goal in biological applications. The proliferation and diversification of available data present both opportunities and challenges in accurately inferring GRNs. Deep learning, a highly successful technique in various domains, holds promise in aiding GRN inference. Several GRN inference methods employing deep learning models have been proposed; however, the selection of an appropriate method remains a challenge for life scientists. In this survey, we provide a comprehensive analysis of 12 GRN inference methods that leverage deep learning models. We trace the evolution of these major methods and categorize them based on the types of applicable data. We delve into the core concepts and specific steps of each method, offering a detailed evaluation of their effectiveness and scalability across different scenarios. These insights enable us to make informed recommendations. Moreover, we explore the challenges faced by GRN inference methods utilizing deep learning and discuss future directions, providing valuable suggestions for the advancement of data scientists in this field.
27
Schrod S, Lück N, Lohmayer R, Solbrig S, Völkl D, Wipfler T, Shutta KH, Ben Guebila M, Schäfer A, Beißbarth T, Zacharias HU, Oefner PJ, Quackenbush J, Altenbuchinger M. Spatial Cellular Networks from omics data with SpaCeNet. Genome Res 2024; 34:1371-1383. [PMID: 39231609 PMCID: PMC11529864 DOI: 10.1101/gr.279125.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/27/2024] [Indexed: 09/06/2024]
Abstract
Advances in omics technologies have allowed spatially resolved molecular profiling of single cells, providing a window not only into the diversity and distribution of cell types within a tissue, but also into the effects of interactions between cells in shaping the transcriptional landscape. Cells send chemical and mechanical signals which are received by other cells, where they can subsequently initiate context-specific gene regulatory responses. These interactions and their responses shape the individual molecular phenotype of a cell in a given microenvironment. RNAs or proteins measured in individual cells, together with the cells' spatial distribution, provide invaluable information about these mechanisms and the regulation of genes beyond processes occurring independently in each individual cell. "SpaCeNet" is a method designed to elucidate both the intracellular molecular networks (how molecular variables affect each other within the cell) and the intercellular molecular networks (how cells affect molecular variables in their neighbors). This is achieved by estimating conditional independence (CI) relations between captured variables within individual cells and by disentangling these from CI relations between variables of different cells.
Affiliation(s)
- Stefan Schrod
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Niklas Lück
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
- Robert Lohmayer
  - Leibniz Institute for Immunotherapy, 93053 Regensburg, Germany
- Stefan Solbrig
  - Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Dennis Völkl
  - Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Tina Wipfler
  - Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Katherine H Shutta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Marouen Ben Guebila
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Andreas Schäfer
  - Institute of Theoretical Physics, University of Regensburg, 93053 Regensburg, Germany
- Tim Beißbarth
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
  - Campus Institute Data Science (CIDAS), University of Göttingen, 37077 Göttingen, Germany
- Helena U Zacharias
  - Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, 30625 Hannover, Germany
- Peter J Oefner
  - Institute of Functional Genomics, University of Regensburg, 93053 Regensburg, Germany
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
- Michael Altenbuchinger
  - Department of Medical Bioinformatics, University Medical Center Göttingen, 37077 Göttingen, Germany
28
Kernfeld E, Yang Y, Weinstock J, Battle A, Cahan P. A systematic comparison of computational methods for expression forecasting. bioRxiv: the preprint server for biology 2024:2023.07.28.551039. [PMID: 37577640 PMCID: PMC10418073 DOI: 10.1101/2023.07.28.551039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Expression forecasting methods use machine learning models to predict how a cell will alter its transcriptome upon perturbation. Such methods are enticing because they promise to answer pressing questions in fields ranging from developmental genetics to cell fate engineering and because they are a fast, cheap, and accessible complement to the corresponding experiments. However, the absolute and relative accuracy of these methods is poorly characterized, limiting their informed use, their improvement, and the interpretation of their predictions. To address these issues, we created a benchmarking platform that combines a panel of 11 large-scale perturbation datasets with an expression forecasting software engine that encompasses or interfaces to a wide variety of methods. We used our platform to systematically assess methods, parameters, and sources of auxiliary data, finding that performance strongly depends on the choice of metric, and especially for simple metrics like mean squared error, it is uncommon for expression forecasting methods to out-perform simple baselines. Our platform will serve as a resource to improve methods and to identify contexts in which expression forecasting can succeed.
29
Zhang J, Liu L, Wei X, Zhao C, Luo Y, Li J, Le TD. Scanning sample-specific miRNA regulation from bulk and single-cell RNA-sequencing data. BMC Biol 2024; 22:218. [PMID: 39334271 PMCID: PMC11438147 DOI: 10.1186/s12915-024-02020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 09/24/2024] [Indexed: 09/30/2024] Open
Abstract
BACKGROUND: RNA-sequencing technology provides an effective tool for understanding miRNA regulation in complex human diseases, including cancers. A large number of computational methods have been developed to make use of bulk and single-cell RNA-sequencing data to identify miRNA regulation at the resolution of multiple samples (i.e. groups of cells or tissues). However, due to the heterogeneity of individual samples, there is a strong need to infer miRNA regulation specific to individual samples, that is, at single-sample resolution. RESULTS: Here, we develop a framework, Scan, for scanning sample-specific miRNA regulation. Since a single network inference method or strategy cannot perform well for all types of new data, Scan incorporates 27 network inference methods and two strategies to infer tissue-specific or cell-specific miRNA regulation from bulk or single-cell RNA-sequencing data. Results on bulk and single-cell RNA-sequencing data demonstrate the effectiveness of Scan in inferring sample-specific miRNA regulation. Moreover, we have found that incorporating prior information on miRNA targets can generally improve the accuracy of miRNA target prediction. In addition, Scan can help construct cell/tissue correlation networks and recover aggregate miRNA regulatory networks. Finally, the comparison results have shown that the performance of network inference methods is likely to be data-specific, and selecting optimal network inference methods is required for more accurate prediction of miRNA targets. CONCLUSIONS: Scan provides a useful method to help infer sample-specific miRNA regulation for new data, benchmark new network inference methods, and deepen the understanding of miRNA regulation at the resolution of individual samples.
Affiliation(s)
- Junpeng Zhang
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Lin Liu
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Xuemei Wei
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Chunwen Zhao
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Yanbi Luo
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Jiuyong Li
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Thuc Duy Le
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
30
Bustad E, Petry E, Gu O, Griebel BT, Rustad TR, Sherman DR, Yang JH, Ma S. Predicting bacterial fitness in Mycobacterium tuberculosis with transcriptional regulatory network-informed interpretable machine learning. bioRxiv: the preprint server for biology 2024:2024.09.23.614645. [PMID: 39386570 PMCID: PMC11463588 DOI: 10.1101/2024.09.23.614645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Mycobacterium tuberculosis (Mtb) is the causative agent of tuberculosis disease, the greatest source of global mortality by a bacterial pathogen. Mtb adapts and responds to diverse stresses such as antibiotics by inducing transcriptional stress-response regulatory programs. Understanding how and when these mycobacterial regulatory programs are activated could enable novel treatment strategies for potentiating the efficacy of new and existing drugs. Here we sought to define and analyze Mtb regulatory programs that modulate bacterial fitness. We assembled a large Mtb RNA expression compendium and applied these to infer a comprehensive Mtb transcriptional regulatory network and compute condition-specific transcription factor activity profiles. We utilized transcriptomic and functional genomics data to train an interpretable machine learning model that can predict Mtb fitness from transcription factor activity profiles. We demonstrated that this transcription factor activity-based model can successfully predict Mtb growth arrest and growth resumption under hypoxia and reaeration using only RNA-seq expression data as a starting point. These integrative network modeling and machine learning analyses thus enable the prediction of mycobacterial fitness under different environmental and genetic contexts. We envision these models can potentially inform the future design of prognostic assays and therapeutic intervention that can cripple Mtb growth and survival to cure tuberculosis disease.
Affiliation(s)
- Ethan Bustad
  - Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
- Edson Petry
  - Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
- Oliver Gu
  - Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
- Braden T. Griebel
  - Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
  - Department of Chemical Engineering, University of Washington, Seattle WA, USA
- David R. Sherman
  - Department of Microbiology, University of Washington, Seattle WA, USA
- Jason H. Yang
  - Center for Emerging and Re-emerging Pathogens, Rutgers New Jersey Medical School, Newark NJ, USA
  - Department of Microbiology, Biochemistry, & Molecular Genetics, Rutgers New Jersey Medical School, Newark NJ, USA
- Shuyi Ma
  - Center for Global Infectious Disease Research, Seattle Children’s Research Institute, Seattle WA, USA
  - Department of Chemical Engineering, University of Washington, Seattle WA, USA
  - Department of Pediatrics, University of Washington, Seattle WA, USA
  - Pathobiology Graduate Program, Department of Global Health, University of Washington, Seattle WA, USA
31
Lodi MK, Chernikov A, Ghosh P. COFFEE: consensus single cell-type specific inference for gene regulatory networks. Brief Bioinform 2024; 25:bbae457. [PMID: 39311699 PMCID: PMC11418232 DOI: 10.1093/bib/bbae457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/22/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
The inference of gene regulatory networks (GRNs) is crucial to understanding the regulatory mechanisms that govern biological processes. GRNs may be represented as edges in a graph, and hence they have been inferred computationally for scRNA-seq data. A wisdom of crowds approach that integrates edges from several GRNs into one composite GRN has demonstrated improved performance when compared with individual algorithm implementations on bulk RNA-seq and microarray data. In an effort to extend this approach to scRNA-seq data, we present COFFEE (COnsensus single cell-type speciFic inFerence for gEnE regulatory networks), a Borda voting-based consensus algorithm that integrates information from 10 established GRN inference methods. We conclude that COFFEE has improved performance across synthetic, curated, and experimental datasets when compared with baseline methods. Additionally, we show that a modified version of COFFEE can be leveraged to improve performance on newer cell-type specific GRN inference methods. Overall, our results demonstrate that consensus-based methods with pertinent modifications continue to be valuable for GRN inference at the single cell level. While COFFEE is benchmarked on 10 algorithms, it is a flexible strategy that can incorporate any set of GRN inference algorithms according to user preference. A Python implementation of COFFEE may be found on GitHub: https://github.com/lodimk2/coffee.
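Editorial illustration (not from the article): the abstract above describes a Borda voting-based consensus over ranked edge lists. The sketch below shows a generic Borda count applied to edge rankings. It is not the COFFEE implementation; the function name `borda_consensus`, the input format, and the choice to award zero points for edges a method does not report are all assumptions made for illustration.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Combine ranked edge lists with a plain Borda count.

    rankings: list of lists; each inner list holds (regulator, target)
        edges ordered from most to least confident by one inference
        method (hypothetical input format).
    Returns all edges sorted by descending total Borda score.
    """
    # Every edge reported by at least one method is a candidate.
    candidates = set()
    for ranking in rankings:
        candidates.update(ranking)
    n = len(candidates)

    scores = defaultdict(int)
    for ranking in rankings:
        for position, edge in enumerate(ranking):
            # The top-ranked edge earns n - 1 points, the next n - 2, and so on.
            scores[edge] += n - 1 - position
        # Assumption: edges a method did not report earn 0 points from it.

    # Sort by score (descending); break ties by edge name for determinism.
    return sorted(candidates, key=lambda edge: (-scores[edge], edge))


# Toy example: three hypothetical methods ranking edges among genes A-D.
method_a = [("A", "B"), ("B", "C"), ("A", "C")]
method_b = [("B", "C"), ("A", "B"), ("C", "D")]
method_c = [("A", "B"), ("C", "D")]
consensus = borda_consensus([method_a, method_b, method_c])
# consensus == [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
```

In this toy case the edge A→B collects 8 points, B→C 5, C→D 3, and A→C 1. How the published method handles ties, missing edges, or score normalization is not stated in the abstract.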
Affiliation(s)
- Musaddiq K Lodi
  - Integrative Life Sciences, Virginia Commonwealth University, 1000 W Cary St, Richmond, VA 23284, United States
- Anna Chernikov
  - Center for Biological Data Science, Virginia Commonwealth University, 1015 Floyd Ave, Richmond, VA 23284, United States
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, 401 W Main St, Richmond, VA 23284, United States
32
Ji R, Geng Y, Quan X. Inferring gene regulatory networks with graph convolutional network based on causal feature reconstruction. Sci Rep 2024; 14:21342. [PMID: 39266676 PMCID: PMC11393083 DOI: 10.1038/s41598-024-71864-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 09/02/2024] [Indexed: 09/14/2024] Open
Abstract
Inferring gene regulatory networks through deep learning and causal inference methods is a crucial task in the field of computational biology and bioinformatics. This study presents a novel approach that uses a Graph Convolutional Network (GCN) guided by causal information to infer Gene Regulatory Networks (GRNs). Transfer entropy and a reconstruction layer are used to achieve causal feature reconstruction, mitigating the information loss caused by multiple rounds of neighbor aggregation in the GCN and yielding a causal and integrated representation of node features. Separable features are extracted from gene expression data by a Gaussian-kernel Autoencoder to improve computational efficiency. Experimental results on the DREAM5 and mDC datasets demonstrate that our method outperforms existing algorithms, as indicated by higher AUPRC values. Furthermore, incorporating causal feature reconstruction enhances the inferred GRNs, rendering them more reasonable, accurate, and reliable.
Affiliation(s)
- Ruirui Ji
  - School of Automation and Information Engineering, Xi'an University of Technology, No. 5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
  - Key Laboratory of Shaanxi Province for Complex System Control and Intelligent Information Processing, Xi'an, 710048, Shaanxi, China
- Yi Geng
  - School of Automation and Information Engineering, Xi'an University of Technology, No. 5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
- Xin Quan
  - School of Automation and Information Engineering, Xi'an University of Technology, No. 5, Jinhua South Road, Xi'an, 710048, Shaanxi, China
33
|
Choquette EM, Forthman KL, Kirlic N, Stewart JL, Cannon MJ, Akeman E, McMillan N, Mesker M, Tarrasch M, Kuplicki R, Paulus MP, Aupperle RL. Impulsivity, trauma history, and interoceptive awareness contribute to completion of a criminal diversion substance use treatment program for women. Front Psychol 2024; 15:1390199. [PMID: 39295754 PMCID: PMC11408307 DOI: 10.3389/fpsyg.2024.1390199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 07/19/2024] [Indexed: 09/21/2024]
Abstract
Introduction: In the US, women are one of the fastest-growing segments of the prison population and more than a quarter of women in state prison are incarcerated for drug offenses. Substance use criminal diversion programs can be effective. It may be beneficial to identify individuals who are most likely to complete the program versus terminate early as this can provide information regarding who may need additional or unique programming to improve the likelihood of successful program completion. Prior research investigating prediction of success in these programs has primarily focused on demographic factors in male samples.
Methods: The current study used machine learning (ML) to examine other non-demographic factors related to the likelihood of completing a substance use criminal diversion program for women. A total of 179 women who were enrolled in a criminal diversion program consented and completed neuropsychological, self-report symptom measures, criminal history and demographic surveys at baseline. Model one entered 145 variables into a machine learning (ML) ensemble model, using repeated, nested cross-validation, predicting subsequent graduation versus termination from the program. An identical ML analysis was conducted for model two, in which 34 variables were entered, including the Women's Risk/Needs Assessment (WRNA).
Results: ML models were unable to predict graduation at an individual level better than chance (AUC = 0.59 [SE = 0.08] and 0.54 [SE = 0.13]). Post-hoc analyses indicated measures of impulsivity, trauma history, interoceptive awareness, employment/financial risk, housing safety, antisocial friends, anger/hostility, and WRNA total score and risk scores exhibited medium to large effect sizes in predicting treatment completion (p < 0.05; ds = 0.29 to 0.81).
Discussion: Results point towards the complexity involved in attempting to predict treatment completion at the individual level but also provide potential targets to inform future research aiming to reduce recidivism.
Affiliation(s)
- Namik Kirlic
  - Laureate Institute for Brain Research, Tulsa, OK, United States
  - Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Jennifer L. Stewart
  - Laureate Institute for Brain Research, Tulsa, OK, United States
  - Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Nick McMillan
  - Women in Recovery, Family and Children’s Services, Tulsa, OK, United States
- Micah Mesker
  - Women in Recovery, Family and Children’s Services, Tulsa, OK, United States
- Mimi Tarrasch
  - Women in Recovery, Family and Children’s Services, Tulsa, OK, United States
- Rayus Kuplicki
  - Laureate Institute for Brain Research, Tulsa, OK, United States
- Martin P. Paulus
  - Laureate Institute for Brain Research, Tulsa, OK, United States
  - Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
- Robin L. Aupperle
  - Laureate Institute for Brain Research, Tulsa, OK, United States
  - Department of Community Medicine, University of Tulsa, Tulsa, OK, United States
34
|
Segura-Ortiz A, García-Nieto J, Aldana-Montes JF, Navas-Delgado I. Multi-objective context-guided consensus of a massive array of techniques for the inference of Gene Regulatory Networks. Comput Biol Med 2024; 179:108850. [PMID: 39013340 DOI: 10.1016/j.compbiomed.2024.108850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
BACKGROUND AND OBJECTIVE: Gene Regulatory Network (GRN) inference is a fundamental task in biology and medicine, as it enables a deeper understanding of the intricate mechanisms of gene expression present in organisms. This bioinformatics problem has been addressed in the literature through multiple computational approaches. Techniques developed for inferring from expression data have employed Bayesian networks, ordinary differential equations (ODEs), machine learning, information theory measures and neural networks, among others. The diversity of implementations and their respective customization have led to the emergence of many tools and multiple specialized domains derived from them, understood as subsets of networks with specific characteristics that are challenging to detect a priori. This specialization has introduced significant uncertainty when choosing the most appropriate technique for a particular dataset. This proposal, named MO-GENECI, builds upon the basic idea of the previous proposal GENECI and optimizes consensus among different inference techniques through a carefully refined multi-objective evolutionary algorithm guided by various objective functions linked to the biological context at hand.
METHODS: MO-GENECI has been tested on an extensive and diverse academic benchmark of 106 gene regulatory networks from multiple sources and sizes. The evaluation of MO-GENECI compared its performance to individual techniques using key metrics (AUROC and AUPR) for gene regulatory network inference. Friedman's statistical ranking provided an ordered classification, followed by non-parametric Holm tests to determine statistical significance.
RESULTS: MO-GENECI's Pareto front approximation facilitates easy selection of an appropriate solution based on generic input data characteristics. The best solution consistently emerged as the winner in all statistical tests, and in many cases, the median precision solution showed no statistically significant difference compared to the winner.
CONCLUSIONS: MO-GENECI has not only achieved more accurate results than individual techniques but has also overcome the uncertainty associated with the initial choice, owing to its flexibility and adaptability. It is shown to intelligently select the most suitable techniques for each case. The source code is hosted in a public repository at GitHub under MIT license: https://github.com/AdrianSeguraOrtiz/MO-GENECI. Moreover, to facilitate its installation and use, the software associated with this implementation has been encapsulated in a Python package available at PyPI: https://pypi.org/project/geneci/.
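The notion of a Pareto front mentioned in this abstract can be sketched in a few lines. The example assumes every candidate consensus network has already been scored on two objectives that are both to be minimized; the objective values and the names `dominates` and `pareto_front` are hypothetical, and the sketch covers only the non-dominated filtering step, not the evolutionary search that MO-GENECI performs.

```python
def dominates(a, b):
    """Return True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate solutions.
candidates = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.8, 0.1), (0.9, 0.9)]

front = pareto_front(candidates)
print(front)
```

A user would then pick one solution from the front according to the trade-off that suits the dataset at hand, which is the selection step the abstract describes.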
Affiliation(s)
- Adrián Segura-Ortiz
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain
- José García-Nieto
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- José F Aldana-Montes
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
- Ismael Navas-Delgado
  - Departamento de Lenguajes y Ciencias de la Computación, ITIS Software, Universidad de Málaga, Málaga, 29071, Spain; Biomedical Research Institute of Málaga (IBIMA), Universidad de Málaga, Málaga, Spain
35
|
Wei PJ, Bao JJ, Gao Z, Tan JY, Cao RF, Su Y, Zheng CH, Deng L. MEFFGRN: Matrix enhancement and feature fusion-based method for reconstructing the gene regulatory network of epithelioma papulosum cyprini cells by spring viremia of carp virus infection. Comput Biol Med 2024; 179:108835. [PMID: 38996550 DOI: 10.1016/j.compbiomed.2024.108835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 07/14/2024]
Abstract
Gene regulatory networks (GRNs) are crucial for understanding organismal molecular mechanisms and processes. Constructing the GRN of epithelioma papulosum cyprini (EPC) cells of cyprinid fish under spring viremia of carp virus (SVCV) infection helps in understanding the immune regulatory mechanisms that enhance the survival capabilities of cyprinid fish. Although many computational methods have been used to infer GRNs, specialized approaches for predicting the GRN of EPC cells following SVCV infection are lacking. In addition, most existing methods focus primarily on gene expression features, neglecting the valuable network structural information in known GRNs. In this study, we propose a novel supervised deep neural network, named MEFFGRN (Matrix Enhancement- and Feature Fusion-based method for Gene Regulatory Network inference), to accurately predict the GRN of EPC cells following SVCV infection. MEFFGRN considers both gene expression data and the network structure information of known GRNs, and introduces a matrix enhancement method to address the sparsity of known GRNs, extracting richer network structure information. To exploit the strengths of CNNs (Convolutional Neural Networks) in image processing, gene expression and enhanced GRN data were each transformed into histogram images for every gene pair. Subsequently, these histograms were separately fed into CNNs for training to obtain the corresponding gene expression and network structural features. Furthermore, a feature fusion mechanism was introduced to comprehensively integrate the gene expression and network structural features. This integration considers the specificity of each feature and their interactive information, resulting in a more comprehensive and precise feature representation during the fusion process. Experimental results from both real-world and benchmark datasets demonstrate that MEFFGRN achieves competitive performance compared with state-of-the-art computational methods. Furthermore, study findings from SVCV-infected EPC cells suggest that MEFFGRN can predict novel gene regulatory relationships.
Affiliation(s)
- Pi-Jing Wei
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jin-Jin Bao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Zhen Gao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Jing-Yun Tan
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
- Rui-Fen Cao
  - Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Yansen Su
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Chun-Hou Zheng
  - School of Artificial Intelligence, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China
- Li Deng
  - Shenzhen Key Laboratory of Microbial Genetic Engineering, College of Life Sciences and Oceanology, Shenzhen University, Shenzhen, 518055, Guangdong, China
36
|
White BS, de Reyniès A, Newman AM, Waterfall JJ, Lamb A, Petitprez F, Lin Y, Yu R, Guerrero-Gimenez ME, Domanskyi S, Monaco G, Chung V, Banerjee J, Derrick D, Valdeolivas A, Li H, Xiao X, Wang S, Zheng F, Yang W, Catania CA, Lang BJ, Bertus TJ, Piermarocchi C, Caruso FP, Ceccarelli M, Yu T, Guo X, Bletz J, Coller J, Maecker H, Duault C, Shokoohi V, Patel S, Liliental JE, Simon S, Saez-Rodriguez J, Heiser LM, Guinney J, Gentles AJ. Community assessment of methods to deconvolve cellular composition from bulk gene expression. Nat Commun 2024; 15:7362. [PMID: 39191725 PMCID: PMC11350143 DOI: 10.1038/s41467-024-50618-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 07/11/2024] [Indexed: 08/29/2024]
Abstract
We evaluate deconvolution methods, which infer levels of immune infiltration from bulk expression of tumor samples, through a community-wide DREAM Challenge. We assess six published and 22 community-contributed methods using in vitro and in silico transcriptional profiles of admixed cancer and healthy immune cells. Several published methods predict most cell types well, though they either were not trained to evaluate all functional CD8+ T cell states or do so with low accuracy. Several community-contributed methods address this gap, including a deep learning-based approach, whose strong performance establishes the applicability of this paradigm to deconvolution. Despite being developed largely using immune cells from healthy tissues, deconvolution methods predict levels of tumor-derived immune cells well. Our admixed and purified transcriptional profiles will be a valuable resource for developing deconvolution methods, including in response to common challenges we observe across methods, such as sensitive identification of functional CD4+ T cell states.
Affiliation(s)
- Brian S White
  - Sage Bionetworks, Seattle, WA, USA
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Aurélien de Reyniès
  - Centre de Recherche des Cordeliers, INSERM U1138, Université Paris Cité, Paris, France
- Aaron M Newman
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Joshua J Waterfall
  - INSERM U830 and Translational Research Department, Institut Curie, PSL Research University, Paris, France
- Florent Petitprez
  - Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
  - MRC Centre for Reproductive Health, the Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Yating Lin
  - Xiamen University, Xiamen, Fujian, China
- Martin E Guerrero-Gimenez
  - Institute of Biochemistry and Biotechnology, School of Medicine, National University of Cuyo, Mendoza, Argentina
- Gianni Monaco
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Daniel Derrick
  - Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Alberto Valdeolivas
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Haojun Li
  - Xiamen University, Xiamen, Fujian, China
- Xu Xiao
  - Xiamen University, Xiamen, Fujian, China
- Shun Wang
  - Department of Pathology, Cancer Hospital, Chinese Academy of Medical Science, Beijing, China
- Carlos A Catania
  - Laboratory of Intelligent Systems (LABSIN), Engineering School, National University of Cuyo, Mendoza, Argentina
- Benjamin J Lang
  - Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Francesca P Caruso
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Michele Ceccarelli
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
  - Sylvester Comprehensive Cancer Center, Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, Florida, USA
- John Coller
  - Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Holden Maecker
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Caroline Duault
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Vida Shokoohi
  - Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Shailja Patel
  - Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Joanna E Liliental
  - Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Laura M Heiser
  - Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Andrew J Gentles
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
37
|
Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Syst 2024; 15:709-724.e13. [PMID: 39173585 PMCID: PMC11642480 DOI: 10.1016/j.cels.2024.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 05/31/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Inference of causal transcriptional regulatory networks (TRNs) from transcriptomic data suffers notoriously from false positives. Approaches to control the false discovery rate (FDR), for example, via permutation, bootstrapping, or multivariate Gaussian distributions, suffer from several complications: difficulty in distinguishing direct from indirect regulation, nonlinear effects, and causal structure inference requiring "causal sufficiency," meaning experiments that are free of any unmeasured, confounding variables. Here, we use a recently developed statistical framework, model-X knockoffs, to control the FDR while accounting for indirect effects, nonlinear dose-response, and user-provided covariates. We adjust the procedure to estimate the FDR correctly even when measured against incomplete gold standards. However, benchmarking against chromatin immunoprecipitation (ChIP) and other gold standards reveals higher observed than reported FDR. This indicates that unmeasured confounding is a major driver of FDR in TRN inference. A record of this paper's transparent peer review process is included in the supplemental information.
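The knockoff-based FDR control referred to in this abstract ends with a data-dependent threshold on per-feature statistics. The sketch below shows only that final selection step (the standard "knockoff+" threshold), assuming the knockoff statistics W_j have already been computed so that large positive values favor the real feature over its knockoff copy. The function names and the W values are hypothetical, and the sketch does not reproduce the paper's knockoff construction, covariate handling, or benchmarking.

```python
def knockoff_threshold(w, q, offset=1):
    """Smallest t such that (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    w: list of knockoff statistics, one per candidate regulator-target edge.
    q: target false discovery rate.
    offset=1 gives the conservative "knockoff+" variant.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (offset + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the threshold."""
    t = knockoff_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten candidate edges (illustrative only).
w_stats = [5.0, 4.0, 3.5, 3.0, -0.5, 2.5, 0.3, -0.2, 2.0, 1.5]

threshold = knockoff_threshold(w_stats, q=0.25)
selected = knockoff_select(w_stats, q=0.25)
print(threshold, selected)
```

The count of negative statistics acts as an estimate of the number of false positives among the selected features, which is why the guarantee depends on the knockoffs being valid; the abstract's point is that unmeasured confounding can break that assumption in practice.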
Affiliation(s)
- Eric Kernfeld
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Rebecca Keener
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA
- Patrick Cahan
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Institute for Cell Engineering, Johns Hopkins Medicine, Baltimore, MD, USA; Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, 3400 N. Charles Street, Wyman Park Building, Suite 400 West, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Genetic Medicine, Johns Hopkins Medicine, Baltimore, MD, USA; Malone Center for Engineering and Healthcare, Johns Hopkins University, Baltimore, MD, USA; Data Science and AI Institute, Johns Hopkins University, Baltimore, MD, USA
38
|
Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. BIOINFORMATICS ADVANCES 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation Not applicable.
Affiliation(s)
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
  - Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
  - Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
  - Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
  - Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, England
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
  - Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
39
|
Priego Espinosa D, Espinal-Enríquez J, Aldana A, Aldana M, Martínez-Mekler G, Carneiro J, Darszon A. Reviewing mathematical models of sperm signaling networks. Mol Reprod Dev 2024; 91:e23766. [PMID: 39175359 DOI: 10.1002/mrd.23766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 07/22/2024] [Indexed: 08/24/2024]
Abstract
Dave Garbers' work significantly contributed to our understanding of sperm's regulated motility, capacitation, and the acrosome reaction. These key sperm functions involve complex multistep signaling pathways engaging numerous finely orchestrated elements. Despite significant progress, many parameters and interactions among these elements remain elusive. Mathematical modeling emerges as a potent tool to study sperm physiology, providing a framework to integrate experimental results and capture functional dynamics considering biochemical, biophysical, and cellular elements. Depending on research objectives, different modeling strategies, broadly categorized into continuous and discrete approaches, reveal valuable insights into cell function. These models allow the exploration of hypotheses regarding molecules, conditions, and pathways, whenever they become challenging to evaluate experimentally. This review presents an overview of current theoretical and experimental efforts to understand sperm motility regulation, capacitation, and the acrosome reaction. We discuss the strengths and weaknesses of different modeling strategies and highlight key findings and unresolved questions. Notable discoveries include the importance of specific ion channels, the role of intracellular molecular heterogeneity in capacitation and the acrosome reaction, and the impact of pH changes on acrosomal exocytosis. Ultimately, this review underscores the crucial importance of mathematical frameworks in advancing our understanding of sperm physiology and guiding future experimental investigations.
Affiliation(s)
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Andrés Aldana
  - Network Science Institute, Northeastern University, Boston, Massachusetts, USA
- Maximino Aldana
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, México
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Gustavo Martínez-Mekler
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México (UNAM), Mexico City, México
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Jorge Carneiro
  - Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Alberto Darszon
  - Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
|
40
|
Yi X, Liu S, Wu Y, McCloskey D, Meng Z. BPP: a platform for automatic biochemical pathway prediction. Brief Bioinform 2024; 25:bbae355. [PMID: 39082653 PMCID: PMC11289738 DOI: 10.1093/bib/bbae355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 05/16/2024] [Accepted: 07/09/2024] [Indexed: 08/03/2024] Open
Abstract
A biochemical pathway consists of a series of interconnected biochemical reactions that accomplish specific life activities. The participating reactants and resultant products of a pathway, including gene fragments, proteins, and small molecules, coalesce to form a complex reaction network. Biochemical pathways play a critical role in the biochemical domain as they can reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks are mainly based on experimentation and pathway database analysis methods, which are plagued by substantial cost constraints. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform that identifies potential links or attributes within biochemical pathway networks. Our BPP platform incorporates a variety of representation learning models, including the latest hypergraph neural networks, to model biochemical reactions in pathways. In particular, BPP contains the latest biochemical pathway-based datasets and enables the prediction of potential participants or products of biochemical reactions in biochemical pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and to calculate the contribution of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, our detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of our platform. Our BPP web portal, source code and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
Affiliation(s)
- Xinhao Yi
  - School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
- Siwei Liu
  - Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Building 1B, Masdar City, Abu Dhabi 000000, United Arab Emirates
- Yu Wu
  - School of Mathematical Sciences, Fudan University, 220 Handan Rd, Yangpu District, Shanghai 200438, China
- Douglas McCloskey
  - Artificial Intelligence, BioMed X Institute, Im Neuenheimer Feld 515, Heidelberg 69120, Germany
- Zaiqiao Meng
  - School of Computing Science, University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, United Kingdom
|
41
|
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] Open
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
|
42
|
Tian H, Tang L, Yang Z, Xiang Y, Min Q, Yin M, You H, Xiao Z, Shen J. Current understanding of functional peptides encoded by lncRNA in cancer. Cancer Cell Int 2024; 24:252. [PMID: 39030557 PMCID: PMC11265036 DOI: 10.1186/s12935-024-03446-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 07/09/2024] [Indexed: 07/21/2024] Open
Abstract
Dysregulated gene expression and imbalance of transcriptional regulation are typical features of cancer. RNA always plays a key role in these processes. Human transcripts include many RNAs that are more than 200 bp in length but lack long open reading frames (ORFs, > 100 aa). They are usually regarded as long non-coding RNAs (lncRNAs), which play an important role in cancer regulation, including chromatin remodeling, transcriptional regulation, translational regulation, and acting as miRNA sponges. With the advancement of ribosome profiling and sequencing technologies, a growing body of evidence has revealed that some ORFs in lncRNAs can also encode peptides and participate in the regulation of tumors in multiple organs, which opens a new chapter in the field of lncRNA and oncology research. In this review, we discuss the biological function of lncRNAs in tumors, the current methods to evaluate their coding potential, and the role of functional small peptides encoded by lncRNAs in cancers. Investigating the small peptides encoded by lncRNAs and understanding the regulatory mechanisms of these functional peptides may contribute to a deeper understanding of cancer and the development of new targeted anticancer therapies.
Affiliation(s)
- Hua Tian
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
  - School of Nursing, Chongqing College of Humanities, Science & Technology, Chongqing, China
- Lu Tang
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zihan Yang
  - Department of Pathology, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, China
- Yanxi Xiang
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Qi Min
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Mengshuang Yin
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Huili You
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
- Zhangang Xiao
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
  - Gulin Traditional Chinese Medicine Hospital, Luzhou, China
  - Department of Pharmacology, School of Pharmacy, Sichuan College of Traditional Chinese Medicine, Mianyang, China
- Jing Shen
  - Laboratory of Molecular Pharmacology, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, 646000, China
  - Cell Therapy and Cell Drugs of Luzhou Key Laboratory, Luzhou, 646000, China
  - South Sichuan Institute of Translational Medicine, Luzhou, 646000, China
|
43
|
Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for breeding vegetable and fruit varieties that are resistant to disease and corrosion. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge, and the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used the EIEPCF method to reconstruct a cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
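The residual-based idea in this abstract can be illustrated with a minimal sketch, assuming a single known confounder, ordinary least-squares residuals, and Pearson correlation as the similarity measure (in other words, a plain partial correlation). This is not the EIEPCF implementation, whose similarity measure and confounder selection are not described here; the function names and the simulated data are hypothetical.

```python
import random

def residuals(values, confounder):
    """Residuals of a simple least-squares fit of `values` on one confounder."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(confounder) / n
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(confounder, values))
    var = sum((c - mean_c) ** 2 for c in confounder)
    slope = cov / var
    return [(v - mean_v) - slope * (c - mean_c) for c, v in zip(confounder, values)]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den

# Simulated example: regulator and target are both driven by the confounder
# and have no direct relationship with each other.
random.seed(0)
confounder = [random.gauss(0, 1) for _ in range(500)]
regulator = [c + 0.1 * random.gauss(0, 1) for c in confounder]
target = [c + 0.1 * random.gauss(0, 1) for c in confounder]

raw_similarity = pearson(regulator, target)
adjusted_similarity = pearson(residuals(regulator, confounder),
                              residuals(target, confounder))
print(f"raw: {raw_similarity:.3f}, after removing confounder: {adjusted_similarity:.3f}")
```

In this toy setting the raw correlation is high even though the two genes only share a common driver, while the correlation between residuals is close to zero, which is the kind of indirect edge a residual-based method aims to discard.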
Affiliation(s)
- Huixiang Peng
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
|
44
|
Yu J, Leng J, Yuan F, Sun D, Wu LY. Reverse network diffusion to remove indirect noise for better inference of gene regulatory networks. Bioinformatics 2024; 40:btae435. [PMID: 38963312 PMCID: PMC11236096 DOI: 10.1093/bioinformatics/btae435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 06/24/2024] [Accepted: 07/03/2024] [Indexed: 07/05/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are vital tools for delineating regulatory relationships between transcription factors and their target genes. The boom in computational biology and various biotechnologies has made inferring GRNs from multi-omics data a hot topic. However, when networks are constructed from gene expression data, they often suffer from false positives due to the transitive effects of correlation. The presence of spurious noise edges obscures the real gene interactions, which makes downstream analyses, such as detecting gene function modules and predicting disease-related genes, difficult and inefficient. Therefore, there is an urgent and compelling need to develop network denoising methods to improve the accuracy of GRN inference. RESULTS: In this study, we proposed a novel network denoising method named REverse Network Diffusion On Random walks (RENDOR). RENDOR is designed to enhance the accuracy of GRNs afflicted by indirect effects. RENDOR takes noisy networks as input, models higher-order indirect interactions between genes by transitive closure, eliminates false-positive effects using the inverse network diffusion method, and produces refined networks as output. We conducted a comparative assessment of GRN inference accuracy before and after denoising on simulated networks and real GRNs. Our results showed that the network derived from RENDOR more accurately and effectively captures gene interactions. This study demonstrates the significance of removing indirect network noise and highlights the effectiveness of the proposed method in enhancing the signal-to-noise ratio of noisy networks. AVAILABILITY AND IMPLEMENTATION: The R package RENDOR is provided at https://github.com/Wu-Lab/RENDOR and other source code and data are available at https://github.com/Wu-Lab/RENDOR-reproduce.
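How indirect edges arise, and how inverting a diffusion model can remove them, can be sketched under a simplifying assumption: the observed network O is the sum of all powers of the direct network D, O = D + D^2 + D^3 + ..., which inverts to D = O - O^2 + O^3 - ... when edge weights are small enough for both series to converge. This is a generic illustration in the spirit of network deconvolution, not RENDOR's random-walk formulation; the function names and the toy three-gene chain are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_series(matrix, terms, alternating):
    """Sum M + s*M^2 + s^2*M^3 + ..., with s = -1 if alternating else +1."""
    total = [row[:] for row in matrix]
    power = [row[:] for row in matrix]
    sign = 1.0
    for _ in range(terms - 1):
        power = matmul(power, matrix)
        sign = -sign if alternating else 1.0
        total = [[t + sign * p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total

def diffuse(direct, terms=80):
    """Forward model (assumed): direct edges plus all indirect paths."""
    return power_series(direct, terms, alternating=False)

def reverse_diffuse(observed, terms=80):
    """Inverse of the forward model above, via the alternating series."""
    return power_series(observed, terms, alternating=True)

# Toy chain: gene 0 - gene 1 - gene 2, with no direct 0-2 edge.
direct = [[0.0, 0.2, 0.0],
          [0.2, 0.0, 0.2],
          [0.0, 0.2, 0.0]]
observed = diffuse(direct)
recovered = reverse_diffuse(observed)
print("spurious 0-2 weight in observed network:", round(observed[0][2], 4))
print("0-2 weight after reverse diffusion:", round(recovered[0][2], 4))
```

The observed network gains a nonzero 0-2 weight purely from the two-step path through gene 1, and the reverse step drives it back to zero while restoring the original direct weights.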
Affiliation(s)
- Jiating Yu
  - School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiacheng Leng
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Zhejiang Lab, Hangzhou 311121, China
- Fan Yuan
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Duanchen Sun
  - School of Mathematics, Shandong University, Jinan 250100, China
- Ling-Yun Wu
  - IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
45
|
Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] Open
Abstract
MOTIVATION: Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS: We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis in order to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and allows the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION: The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France
|
46
|
Moeckel C, Mouratidis I, Chantzi N, Uzun Y, Georgakopoulos-Soares I. Advances in computational and experimental approaches for deciphering transcriptional regulatory networks: Understanding the roles of cis-regulatory elements is essential, and recent research utilizing MPRAs, STARR-seq, CRISPR-Cas9, and machine learning has yielded valuable insights. Bioessays 2024; 46:e2300210. [PMID: 38715516 PMCID: PMC11444527 DOI: 10.1002/bies.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/16/2024]
Abstract
Understanding the influence of cis-regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell-type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression-based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR-Cas9-based screening, which have significantly contributed to understanding TF binding preferences and cis-regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis-regulatory logic is analyzed. These computational advances have far-reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Yasin Uzun
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  - Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
|
47
|
Ahsen ME, Vogel R, Stolovitzky G. Optimal linear ensemble of binary classifiers. BIOINFORMATICS ADVANCES 2024; 4:vbae093. [PMID: 39011276 PMCID: PMC11249386 DOI: 10.1093/bioadv/vbae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/03/2024] [Accepted: 06/13/2024] [Indexed: 07/17/2024]
Abstract
Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation: GitHub repository, https://github.com/robert-vogel/moca.
Affiliation(s)
- Mehmet Eren Ahsen
  - Department of Business Administration, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, United States
  - Department of Biomedical and Translational Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, United States
- Robert Vogel
  - Thomas J. Watson Research Center, IBM, New York, NY 10598, United States
  - Department of Integrated Structural and Computational Biology, Scripps Research, La Jolla, CA 92037, United States
|
48
|
Nouri N, Gaglia G, Mattoo H, de Rinaldis E, Savova V. GENIX enables comparative network analysis of single-cell RNA sequencing to reveal signatures of therapeutic interventions. CELL REPORTS METHODS 2024; 4:100794. [PMID: 38861988 PMCID: PMC11228368 DOI: 10.1016/j.crmeth.2024.100794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/13/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular responses to perturbations such as therapeutic interventions and vaccines. Gene relevance to such perturbations is often assessed through differential expression analysis (DEA), which offers a one-dimensional view of the transcriptomic landscape. This method potentially overlooks genes with modest expression changes but profound downstream effects and is susceptible to false positives. We present GENIX (gene expression network importance examination), a computational framework that transcends DEA by constructing gene association networks and employing a network-based comparative model to identify topological signature genes. We benchmark GENIX using both synthetic and experimental datasets, including analysis of influenza vaccine-induced immune responses in peripheral blood mononuclear cells (PBMCs) from recovered COVID-19 patients. GENIX successfully emulates key characteristics of biological networks and reveals signature genes that are missed by classical DEA, thereby broadening the scope of target gene discovery in precision medicine.
Affiliation(s)
- Nima Nouri
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Giorgio Gaglia
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Hamid Mattoo
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Emanuele de Rinaldis
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
- Virginia Savova
  - Precision Medicine and Computational Biology, Sanofi, 350 Water Street, Cambridge, MA 02141, USA
|
49
|
Liu J, Xiang T, Song XC, Zhang S, Wu Q, Gao J, Lv M, Shi C, Yang X, Liu Y, Fu J, Shi W, Fang M, Qu G, Yu H, Jiang G. High-Efficiency Effect-Directed Analysis Leveraging Five High Level Advancements: A Critical Review. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2024; 58:9925-9944. [PMID: 38820315 DOI: 10.1021/acs.est.3c10996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2024]
Abstract
Organic contaminants are ubiquitous in the environment, with mounting evidence unequivocally connecting them to aquatic toxicity, illness, and increased mortality, underscoring their substantial impacts on ecological security and environmental health. The intricate composition of sample mixtures and the uncertain physicochemical features of potential toxic substances make it challenging to identify key toxicants in environmental samples. Effect-directed analysis (EDA), which establishes a connection between key toxicants found in environmental samples and associated hazards, enables the identification of toxicants that can streamline research efforts and inform management action. Nevertheless, the advancement of EDA is constrained by the following factors: inadequate extraction and fractionation of environmental samples, limited bioassay endpoints and unknown linkage to higher order impacts, limited coverage of chemical analysis (i.e., high-resolution mass spectrometry, HRMS), and a lack of effective linkage between bioassays and chemical analysis. This review proposes five key advancements to enhance the efficiency of EDA in addressing these challenges: (1) multiple adsorbents for comprehensive coverage of chemical extraction, (2) high-resolution microfractionation and multidimensional fractionation for refined fractionation, (3) robust in vivo/in vitro bioassays and omics, (4) high-performance configurations for HRMS analysis, and (5) chemical-, data-, and knowledge-driven approaches for streamlined toxicant identification and validation. We envision that future EDA will integrate big data and artificial intelligence based on the development of quantitative omics, cutting-edge multidimensional microfractionation, and ultraperformance MS to identify environmental hazard factors, serving broader environmental governance.
Affiliation(s)
- Jifu Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Tongtong Xiang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - College of Sciences, Northeastern University, Shenyang 110004, China
- Xue-Chao Song
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shaoqing Zhang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Qi Wu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jie Gao
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Meilin Lv
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
  - College of Sciences, Northeastern University, Shenyang 110004, China
- Chunzhen Shi
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Xiaoxi Yang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Yanna Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Jianjie Fu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Shi
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Mingliang Fang
- Department of Environmental Science and Engineering, Fudan University, Shanghai 200433, China
| | - Guangbo Qu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Institute of Environment and Health, Jianghan University, Wuhan, Hubei 430056, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hongxia Yu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- College of Sciences, Northeastern University, Shenyang 110004, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
50
|
Huo Q, Song R, Ma Z. Recent advances in exploring transcriptional regulatory landscape of crops. FRONTIERS IN PLANT SCIENCE 2024; 15:1421503. [PMID: 38903438 PMCID: PMC11188431 DOI: 10.3389/fpls.2024.1421503] [Received: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 06/22/2024]
Abstract
Crop breeding entails developing and selecting plant varieties with improved agronomic traits. Modern molecular techniques, such as genome editing, enable more efficient manipulation of plant phenotype by altering the expression of particular regulatory or functional genes. Hence, it is essential to thoroughly comprehend the transcriptional regulatory mechanisms that underpin these traits. In the multi-omics era, a large amount of omics data has been generated for diverse crop species, including genomics, epigenomics, transcriptomics, proteomics, and single-cell omics. The abundant data resources and the emergence of advanced computational tools offer unprecedented opportunities for obtaining a holistic view and profound understanding of the regulatory processes linked to desirable traits. This review focuses on integrated network approaches that utilize multi-omics data to investigate gene expression regulation. Various types of regulatory networks and their inference methods are discussed, focusing on recent advancements in crop plants. The integration of multi-omics data has proven crucial for the construction of high-confidence regulatory networks. As these methodologies are refined, they will significantly enhance crop breeding efforts and contribute to global food security.
Affiliation(s)
- Zeyang Ma
  - State Key Laboratory of Maize Bio-breeding, Frontiers Science Center for Molecular Design Breeding, Joint International Research Laboratory of Crop Molecular Breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China
|