1
|
Interstitial macrophages are a focus of viral takeover and inflammation in COVID-19 initiation in human lung. J Exp Med 2024; 221:e20232192. [PMID: 38597954 PMCID: PMC11009983 DOI: 10.1084/jem.20232192] [Received: 11/28/2023] [Revised: 02/09/2024] [Accepted: 03/04/2024] [Indexed: 04/11/2024]
Abstract
Early stages of deadly respiratory diseases including COVID-19 are challenging to elucidate in humans. Here, we define cellular tropism and transcriptomic effects of SARS-CoV-2 virus by productively infecting healthy human lung tissue and using scRNA-seq to reconstruct the transcriptional program in "infection pseudotime" for individual lung cell types. SARS-CoV-2 predominantly infected activated interstitial macrophages (IMs), which can accumulate thousands of viral RNA molecules, taking over 60% of the cell transcriptome and forming dense viral RNA bodies while inducing host profibrotic (TGFB1, SPP1) and inflammatory (early interferon response, CCL2/7/8/13, CXCL10, and IL6/10) programs and destroying host cell architecture. Infected alveolar macrophages (AMs) showed none of these extreme responses. Spike-dependent viral entry into AMs used ACE2 and Sialoadhesin/CD169, whereas IM entry used DC-SIGN/CD209. These results identify activated IMs as a prominent site of viral takeover, the focus of inflammation and fibrosis, and suggest targeting CD209 to prevent early pathology in COVID-19 pneumonia. This approach can be generalized to any human lung infection and to evaluate therapeutics.
|
2
|
SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads. bioRxiv 2024:2023.03.17.533189. [PMID: 36993432 PMCID: PMC10055302 DOI: 10.1101/2023.03.17.533189] [Indexed: 03/31/2023]
Abstract
SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific methods. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. SPLASH2 enables rapid analysis of massive datasets from a wide range of sequencing technologies and biological contexts, delivering unparalleled scale and speed. The SPLASH2 algorithm unveils new biology (without tuning) in single-cell RNA-sequencing data from human muscle cells, as well as bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE), including substantial unannotated alternative splicing in cancer transcriptome. The same untuned SPLASH2 algorithm recovers the BCR-ABL gene fusion, and detects circRNA sensitively and specifically, underscoring SPLASH2's unmatched precision and scalability across diverse RNA-seq detection tasks.
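The abstract above says that SPLASH2 is built on efficient k-mer counting but gives no implementation details. The following is only a minimal sketch of what counting k-mers across raw reads means. The function name `count_kmers` and the choice to skip k-mers containing `N` are illustrative assumptions, not part of SPLASH2, whose counting engine is far more elaborate.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads.

    Hypothetical helper for illustration; k-mers containing the
    ambiguous base 'N' are skipped (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


example_counts = count_kmers(["ACGTAC", "CGTN"], 3)
```

Here `example_counts` records `CGT` twice (once per read) and `ACG`, `GTA`, and `TAC` once each, while `GTN` is dropped.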
|
3
|
Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision. Nat Methods 2023; 20:1159-1169. [PMID: 37443337 PMCID: PMC10870000 DOI: 10.1038/s41592-023-01944-6] [Received: 12/06/2022] [Accepted: 06/12/2023] [Indexed: 07/15/2023]
Abstract
The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.
|
4
|
ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq. Genome Biol 2022; 23:226. [PMID: 36284317 PMCID: PMC9594907 DOI: 10.1186/s13059-022-02795-8] [Received: 04/15/2022] [Accepted: 10/13/2022] [Indexed: 11/13/2022]
Abstract
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis, including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
|
5
|
TGS1 impacts snRNA 3'-end processing, ameliorates survival motor neuron-dependent neurological phenotypes in vivo and prevents neurodegeneration. Nucleic Acids Res 2022; 50:12400-12424. [PMID: 35947650 PMCID: PMC9757054 DOI: 10.1093/nar/gkac659] [Received: 06/29/2022] [Accepted: 07/21/2022] [Indexed: 12/24/2022]
Abstract
Trimethylguanosine synthase 1 (TGS1) is a highly conserved enzyme that converts the 5'-monomethylguanosine cap of small nuclear RNAs (snRNAs) to a trimethylguanosine cap. Here, we show that loss of TGS1 in Caenorhabditis elegans, Drosophila melanogaster and Danio rerio results in neurological phenotypes similar to those caused by survival motor neuron (SMN) deficiency. Importantly, expression of human TGS1 ameliorates the SMN-dependent neurological phenotypes in both flies and worms, revealing that TGS1 can partly counteract the effects of SMN deficiency. TGS1 loss in HeLa cells leads to the accumulation of immature U2 and U4atac snRNAs with long 3' tails that are often uridylated. snRNAs with defective 3' terminations also accumulate in Drosophila Tgs1 mutants. Consistent with defective snRNA maturation, TGS1 and SMN mutant cells also exhibit partially overlapping transcriptome alterations that include aberrantly spliced and readthrough transcripts. Together, these results identify a neuroprotective function for TGS1 and reinforce the view that defective snRNA maturation affects neuronal viability and function.
|
6
|
Abstract
Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression. Using multiple tissues from a single donor enabled identification of the clonal distribution of T cells between tissues, identification of the tissue-specific mutation rate in B cells, and analysis of the cell cycle state and proliferative potential of shared cell types across tissues. Cell type-specific RNA splicing was discovered and analyzed across tissues within an individual.
|
7
|
RNA splicing programs define tissue compartments and cell types at single-cell resolution. eLife 2021; 10:e70692. [PMID: 34515025 PMCID: PMC8563012 DOI: 10.7554/elife.70692] [Received: 05/26/2021] [Accepted: 09/10/2021] [Indexed: 02/05/2023]
Abstract
The extent to which splicing is regulated at single-cell resolution has remained controversial due to both available data and methods to interpret them. We apply the SpliZ, a new statistical approach, to detect cell-type-specific splicing in >110K cells from 12 human tissues. Using 10X Chromium data for discovery, 9.1% of genes with computable SpliZ scores are cell-type-specifically spliced, including ubiquitously expressed genes MYL6 and RPS24. These results are validated with RNA FISH, single-cell PCR, and Smart-seq2. SpliZ analysis reveals 170 genes with regulated splicing during human spermatogenesis, including examples conserved in mouse and mouse lemur. The SpliZ allows model-based identification of subpopulations indistinguishable based on gene expression, illustrated by subpopulation-specific splicing of classical monocytes involving an ultraconserved exon in SAT1. Together, this analysis of differential splicing across multiple organs establishes that splicing is regulated cell-type-specifically.
|
8
|
Abstract
Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN is a general method that can be applied to bulk or single-cell data, but has particular utility for single-cell analysis due to that data's unique challenges and opportunities for discovery. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, and increases agreement between biological replicates. SICILIAN detects unannotated splicing in single cells, enabling the discovery of novel splicing regulation through single-cell analysis workflows.
|
9
|
Abstract 3378: SICILIAN: Precise and unbiased detection of gene fusions at the resolution of single cells using improved statistical modeling. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-3378] [Indexed: 11/16/2022]
Abstract
Gene fusions are drivers in a multitude of hematological and solid tumors and hold great promise for developing therapeutic and diagnostic procedures in the clinic, e.g., BCR-ABL1 and TMPRSS2-ERG fusions in chronic myeloid leukemia and prostate cancers, respectively. Recently, our group has established computational evidence that rare and private gene fusions are unappreciated drivers of 30% of tumors (Dehghannasiri et al., 2019). However, the function of the vast majority of gene fusions remains unknown. In principle, single-cell RNA-Seq (scRNA-Seq) provides a method to determine the gene expression perturbations resulting from fusion expression. However, current computational methodology cannot precisely call gene fusions at the single-cell level, mainly due to the small amount of transcriptomic information in each cell and substantial sequencing noise. To address these challenges, we introduce SIngle Cell precIse spLice estImAtioN (SICILIAN), a highly specific, statistically driven fusion detection algorithm that is implemented on top of traditional splice aligners. SICILIAN detects a diverse set of RNA splicing events, such as linear and circular RNAs and, specifically, gene fusions at annotated or unannotated exonic boundaries. For detecting fusions, SICILIAN takes the spliced alignment information and employs a generalized linear model (GLM) based on alignment features from the alignment file. We use junctions categorized as likely true positives (TPs) or likely false positives (FPs) via orthogonal measures as training data. After training the model, SICILIAN assigns a statistical score to each fusion junction. Only considering fusions with high enough statistical scores can dramatically increase the precision of detection over typical detection strategies, such as filtering on the number of aligned reads or using ontology-level heuristic filters. Moreover, the assignment of statistical scores facilitates the application of false discovery rate control techniques using the statistical strength across thousands of single-cell samples to identify false positives due to multiple hypothesis testing. SICILIAN has a tunable statistical score for fusion calls and expands the scope of fusion detection to unannotated exons and sequences while achieving high AUC performance on third-party simulated data. SICILIAN is currently being used to analyze massive single-cell samples from diverse tumor types for systematic profiling of fusions at the resolution of single cells, an analysis made possible only through the unprecedented precision and scale achieved by SICILIAN. In addition to its potential to reveal heterogeneity in tumor fusion expression, SICILIAN promises to enable new discovery of the function of fusion expression.
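The abstract describes training a GLM on junctions labeled as likely true or likely false positives and then scoring each junction. A minimal sketch of that idea follows, using logistic regression as one possible GLM. The two features (a scaled mapping quality and a mismatch fraction), the toy training values, and the names `fit_logistic` and `junction_score` are all hypothetical and chosen only for illustration; the abstract does not specify SICILIAN's features, link function, or fitting procedure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this junction under the current model.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def junction_score(features, weights, bias):
    """Score in (0, 1); higher means more TP-like under the fitted model."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features)))


# Hypothetical features per junction: (scaled mapping quality, mismatch fraction).
likely_tp = [(0.90, 0.10), (0.80, 0.20), (0.95, 0.05), (0.85, 0.15)]
likely_fp = [(0.30, 0.60), (0.20, 0.70), (0.40, 0.50), (0.25, 0.65)]
weights, bias = fit_logistic(likely_tp + likely_fp, [1] * 4 + [0] * 4)

score_good = junction_score((0.90, 0.10), weights, bias)
score_bad = junction_score((0.20, 0.70), weights, bias)
```

Thresholding such a score, rather than filtering on raw read counts, is the kind of tunable cutoff the abstract refers to. Where the threshold is set is left to the user in this sketch.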
Citation Format: Roozbeh Dehghannasiri, Julia Eve Olivieri, Julia Salzman. SICILIAN: Precise and unbiased detection of gene fusions at the resolution of single cells using improved statistical modeling [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 3378.
|
10
|
Ambiguous splice sites distinguish circRNA and linear splicing in the human genome. Bioinformatics 2020; 35:1263-1268. [PMID: 30192918 DOI: 10.1093/bioinformatics/bty785] [Received: 03/26/2018] [Revised: 08/04/2018] [Accepted: 09/04/2018] [Indexed: 11/14/2022]
Abstract
MOTIVATION Identification of splice sites is critical to gene annotation and to determine which sequences control circRNA biogenesis. Full-length RNA transcripts could in principle complete annotations of introns and exons in genomes without external ontologies, i.e., ab initio. However, whether it is possible to reconstruct genomic positions where splicing occurs from full-length transcripts, even if sampled in the absence of noise, depends on the genome sequence composition. If it is not, there exist provable limits on the use of RNA-Seq to define splice locations (linear or circular) in the genome. RESULTS We provide a formal definition of splice site ambiguity due to the genomic sequence by introducing the equivalent junction, which is the set of local genomic positions resulting in the same RNA sequence when joined through RNA splicing. We show that equivalent junctions are prevalent in diverse eukaryotic genomes and occur in 88.64% and 78.64% of annotated human splice sites in linear and circRNA junctions, respectively. The observed fractions of equivalent junctions and the frequency of many individual motifs are statistically significant when compared against the null distribution computed via simulation or closed-form. The frequency of equivalent junctions establishes a fundamental limit on the possibility of ab initio reconstruction of RNA transcripts without appealing to the ontology of "GT-AG" boundaries defining introns. Said differently, completely ab initio reconstruction is impossible in the vast majority of splice sites in annotated circRNAs and linear transcripts. AVAILABILITY AND IMPLEMENTATION Two python scripts generating an equivalent junction sequence per junction are available at: https://github.com/salzmanlab/Equivalent-Junctions. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
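The definition of an equivalent junction can be made concrete with a small sketch. It is not the authors' published script. It assumes a linear junction on a single strand, given as a 0-based `donor` index (last base kept upstream) and `acceptor` index (first base kept downstream), and the function names are hypothetical. Circular (backsplice) junctions are not handled.

```python
def spliced(genome, donor, acceptor):
    """RNA sequence obtained by joining genome[..donor] to genome[acceptor..]."""
    return genome[:donor + 1] + genome[acceptor:]


def equivalent_junctions(genome, donor, acceptor):
    """All (donor, acceptor) pairs yielding the same spliced sequence.

    Shifting both coordinates by one base preserves the spliced sequence
    exactly when the base that moves across the junction is identical on
    both sides, so we extend left and right while that holds.
    """
    if not (0 <= donor and donor + 1 < acceptor < len(genome)):
        raise ValueError("need 0 <= donor, donor + 1 < acceptor < len(genome)")
    left = 0
    while (donor - (left + 1) >= 0
           and genome[donor - left] == genome[acceptor - left - 1]):
        left += 1
    right = 0
    while (acceptor + right + 1 < len(genome)
           and genome[donor + right + 1] == genome[acceptor + right]):
        right += 1
    return [(donor + k, acceptor + k) for k in range(-left, right + 1)]


# Toy sequence: exon "CCAG", intron "GTAAGTTTTCAG", exon "GTCC".
toy_genome = "CCAGGTAAGTTTTCAGGTCC"
toy_class = equivalent_junctions(toy_genome, 3, 16)
```

For this toy sequence the annotated GT-AG junction `(3, 16)` shares its spliced product `CCAGGTCC` with five shifted junctions, which is the kind of ambiguity that RNA sequence alone cannot resolve.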
|
11
|
Abstract 2468: Towards precise and cost-effective fusion discovery: A landscape of druggable gene fusions across TCGA cancers. Cancer Res 2019. [DOI: 10.1158/1538-7445.am2019-2468] [Indexed: 11/16/2022]
Abstract
Gene fusions are one of the hallmarks of cancer and are among the most powerful biomarkers and drug targets in translational cancer genomics. We deploy sMACHETE (scalable MACHETE), a precise and sensitive fusion detection algorithm, particularly engineered for mining massive cancer sequencing databases, to provide a landscape of fusions across human primary cancers. sMACHETE consists of two main computational components: a MACHETE-based component and a Sequence Bloom Tree (SBT) checkpoint. MACHETE (Hsieh et al., 2017) is a precise fusion algorithm, which employs a statistical model to identify fusion junctions. The first component in sMACHETE is built on MACHETE and has undergone major algorithmic and computational improvements, such as the inclusion of well-known cancer fusions and a cloud-based implementation in Common Workflow Language, which makes the pipeline a good fit for large-scale studies. To control for false positives due to multiple testing in large datasets, the fusions called by the first component are then queried via SBT (Solomon and Kingsford, 2016), which is a k-mer-based query algorithm. The fusions whose detection frequencies by MACHETE and SBT are statistically consistent pass the checkpoint and are called by sMACHETE. sMACHETE achieved 100% positive predictive value, higher than any other top-performing algorithm, and comparable sensitivity on simulated benchmarking datasets. We have used sMACHETE to systematically analyze fusions in The Cancer Genome Atlas (TCGA) RNA-seq datasets. sMACHETE calls 31,546 highly confident fusions in 9,946 TCGA tumor samples spanning 33 cancer types. Sarcoma (10 fusions per sample) and Esophageal Carcinoma (8 fusions per sample) have the highest abundance of fusions. We found 525 recurrent fusions, observed in at least 2 samples within a cancer type, in 12% of tumor samples. Our statistical analysis reveals a signature of selection for recurrent fusions and also for recurrent genes, which partner with more than one gene in fusions and are observed in 40% of samples, suggesting evidence for their oncogenic role in tumorigenesis. Thyroid, Ovarian, Esophageal, and Lung Adenocarcinoma have rates of kinase fusions that exceed expectation by chance, strong evidence that they are unappreciated drivers of the disease. Having integrated our detected fusions with the OncoKB database (Chakravarty et al., 2017), we detected druggable fusions in 3% of tumor samples. Our systematic and functional analysis highlights the substantial role of fusions as cancer drivers and their clinical implication in cancer treatment.
Citation Format: Roozbeh Dehghannasiri, Milos Jordanski, Donald E. Freeman, Gillian L. Hsieh, Jonathan M. Howard, Erik Lehnert, Julia Salzman. Towards precise and cost-effective fusion discovery: A landscape of druggable gene fusions across TCGA cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2468.
|
12
|
An experimental design framework for Markovian gene regulatory networks under stationary control policy. BMC Syst Biol 2018; 12:137. [PMID: 30577732 PMCID: PMC6302376 DOI: 10.1186/s12918-018-0649-8] [Indexed: 12/03/2022]
Abstract
BACKGROUND A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. RESULTS In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. 
Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) - but only in experimental design, the final policy being an IBR policy. CONCLUSIONS Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
|
13
|
Sequential Experimental Design for Optimal Structural Intervention in Gene Regulatory Networks Based on the Mean Objective Cost of Uncertainty. Cancer Inform 2018; 17:1176935118790247. [PMID: 30093796 PMCID: PMC6080085 DOI: 10.1177/1176935118790247] [Received: 04/03/2018] [Accepted: 06/25/2018] [Indexed: 11/16/2022]
Abstract
Scientists are attempting to use models of ever-increasing complexity, especially in medicine, where gene-based diseases such as cancer require better modeling of cell regulation. Complex models suffer from uncertainty and experiments are needed to reduce this uncertainty. Because experiments can be costly and time-consuming, it is desirable to determine experiments providing the most useful information. If a sequence of experiments is to be performed, experimental design is needed to determine the order. A classical approach is to maximally reduce the overall uncertainty in the model, meaning maximal entropy reduction. A recently proposed method takes into account both model uncertainty and the translational objective, for instance, optimal structural intervention in gene regulatory networks, where the aim is to alter the regulatory logic to maximally reduce the long-run likelihood of being in a cancerous state. The mean objective cost of uncertainty (MOCU) quantifies uncertainty based on the degree to which model uncertainty affects the objective. Experimental design involves choosing the experiment that yields the greatest reduction in MOCU. This article introduces finite-horizon dynamic programming for MOCU-based sequential experimental design and compares it with the greedy approach, which selects one experiment at a time without consideration of the full horizon of experiments. A salient aspect of the article is that it demonstrates the advantage of MOCU-based design over the widely used entropy-based design for both greedy and dynamic programming strategies and investigates the effect of model conditions on the comparative performances.
|
14
|
Optimal Objective-Based Experimental Design for Uncertain Dynamical Gene Networks with Experimental Error. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:218-230. [PMID: 27576263 PMCID: PMC5845823 DOI: 10.1109/tcbb.2016.2602873] [Indexed: 05/18/2023]
Abstract
In systems biology, network models are often used to study interactions among cellular components, a salient aim being to develop drugs and therapeutic mechanisms to change the dynamical behavior of the network to avoid undesirable phenotypes. Owing to limited knowledge, model uncertainty is commonplace and network dynamics can be updated in different ways, thereby giving multiple dynamic trajectories, that is, dynamics uncertainty. In this manuscript, we propose an experimental design method that can effectively reduce the dynamics uncertainty and improve performance in an interaction-based network. Both dynamics uncertainty and experimental error are quantified with respect to the modeling objective, herein, therapeutic intervention. The aim of experimental design is to select among a set of candidate experiments the experiment whose outcome, when applied to the network model, maximally reduces the dynamics uncertainty pertinent to the intervention objective.
|
15
|
Erratum to: Efficient experimental design for uncertainty reduction in gene regulatory networks. BMC Bioinformatics 2015; 16:410. [PMID: 26652981 PMCID: PMC4677434 DOI: 10.1186/s12859-015-0839-y] [Received: 10/19/2015] [Accepted: 12/02/2015] [Indexed: 11/21/2022]
|
16
|
Abstract
BACKGROUND An accurate understanding of interactions among genes plays a major role in developing therapeutic intervention methods. Gene regulatory networks often contain a significant amount of uncertainty. The process of prioritizing biological experiments to reduce the uncertainty of gene regulatory networks is called experimental design. Under such a strategy, the experiments with high priority are suggested to be conducted first. RESULTS The authors have previously proposed an optimal experimental design method based upon the objective for modeling gene regulatory networks, such as deriving therapeutic interventions. The experimental design method utilizes the concept of mean objective cost of uncertainty (MOCU). MOCU quantifies the expected increase of cost resulting from uncertainty. The optimal experiment to be conducted first is the one which leads to the minimum expected remaining MOCU subsequent to the experiment. In the process, one must find the optimal intervention for every gene regulatory network compatible with the prior knowledge, which can be prohibitively expensive when the size of the network is large. In this paper, we propose a computationally efficient experimental design method. This method incorporates a network reduction scheme by introducing a novel cost function that takes into account the disruption in the ranking of potential experiments. We then estimate the approximate expected remaining MOCU at a lower computational cost using the reduced networks. CONCLUSIONS Simulation results based on synthetic and real gene regulatory networks show that the proposed approximate method performs close to the optimal method but at lower computational cost. The proposed approximate method also significantly outperforms the random selection policy. MATLAB software implementing the proposed experimental design method is available at http://gsp.tamu.edu/Publications/supplementary/roozbeh15a/.
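The MOCU-based ranking of experiments can be illustrated with a small sketch. It computes the exact MOCU, not the reduced-network approximation proposed in the paper, and it rests on simplifying assumptions: a finite set of candidate networks, a finite set of interventions, and experiments that reveal one unknown parameter without error. The toy cost table and all names (`MODELS`, `mocu`, `expected_remaining_mocu`, `best_experiment`) are hypothetical.

```python
from collections import defaultdict

# Each candidate network is a tuple of unknown binary parameters, mapped to
# (prior probability, cost of each intervention under that network).
MODELS = {
    (0, 0): (0.25, {"A": 0.2, "B": 0.6}),
    (0, 1): (0.25, {"A": 0.3, "B": 0.7}),
    (1, 0): (0.25, {"A": 0.8, "B": 0.4}),
    (1, 1): (0.25, {"A": 0.9, "B": 0.3}),
}


def mocu(models):
    """Expected extra cost of the robust intervention over per-model optima."""
    interventions = next(iter(models.values()))[1].keys()
    # Cost of the single intervention that is best on average over the prior.
    robust_cost = min(
        sum(p * costs[psi] for p, costs in models.values())
        for psi in interventions
    )
    # Average cost if the true network were known and treated optimally.
    optimal_cost = sum(p * min(costs.values()) for p, costs in models.values())
    return robust_cost - optimal_cost


def expected_remaining_mocu(models, param_index):
    """Average MOCU of the posteriors obtained by revealing one parameter."""
    groups = defaultdict(dict)
    for theta, entry in models.items():
        groups[theta[param_index]][theta] = entry
    total = 0.0
    for subset in groups.values():
        mass = sum(p for p, _ in subset.values())
        posterior = {t: (p / mass, c) for t, (p, c) in subset.items()}
        total += mass * mocu(posterior)
    return total


def best_experiment(models, n_params):
    """Index of the parameter whose measurement leaves the least MOCU."""
    return min(range(n_params), key=lambda i: expected_remaining_mocu(models, i))
```

In this toy table only the first parameter changes which intervention is best, so measuring it removes all objective-relevant uncertainty, whereas measuring the second parameter leaves the MOCU unchanged. The ranking therefore favors experiment 0 even though both experiments reduce the number of candidate networks equally.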
|
17
|
Optimal Experimental Design for Gene Regulatory Networks in the Presence of Uncertainty. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:938-50. [PMID: 26357334 DOI: 10.1109/tcbb.2014.2377733] [Indexed: 05/23/2023]
Abstract
Of major interest to translational genomics is the intervention in gene regulatory networks (GRNs) to affect cell behavior; in particular, to alter pathological phenotypes. Owing to the complexity of GRNs, accurate network inference is practically challenging and GRN models often contain considerable amounts of uncertainty. Considering the cost and time required for conducting biological experiments, it is desirable to have a systematic method for prioritizing potential experiments so that an experiment can be chosen to optimally reduce network uncertainty. Moreover, from a translational perspective it is crucial that GRN uncertainty be quantified and reduced in a manner that pertains to the operational cost that it induces, such as the cost of network intervention. In this work, we utilize the concept of mean objective cost of uncertainty (MOCU) to propose a novel framework for optimal experimental design. In the proposed framework, potential experiments are prioritized based on the MOCU expected to remain after conducting the experiment. Based on this prioritization, one can select an optimal experiment with the largest potential to reduce the pertinent uncertainty present in the current network model. We demonstrate the effectiveness of the proposed method via extensive simulations based on synthetic and real gene regulatory networks.
|