1
Feature engineering from meta-data for prediction of differentially expressed genes: An investigation of Mus musculus exposed to space-conditions. Comput Biol Chem 2024; 109:108026. [PMID: 38335853 DOI: 10.1016/j.compbiolchem.2024.108026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 12/29/2023] [Accepted: 02/02/2024] [Indexed: 02/12/2024]
Abstract
Transcription profiling is a key process that can reveal the biological mechanisms driving the response to various exposure conditions or gene perturbations. In this work, we investigate whether differentially expressed genes (DEGs) under exposure to conditions in space can be predicted from a diverse set of engineered features. To do this, we collected DEGs and non-differentially expressed genes (NDEGs) from Mus musculus experiments in the GeneLab database. We engineered a diverse set of features from factors reported in the literature to affect gene expression. An extreme gradient boosting (XGBoost) model was trained to predict whether a given gene would be differentially expressed at various levels of differential expression. On a separate holdout dataset, the model achieved an area under the receiver operating characteristic curve (AUC) of 0.90 ± 0.07, averaged across the five selected percentages of the most and least differentially expressed genes. Subsequently, we investigated how selecting features, both individually with a correlation-based feature-selection procedure and in groups with a combination procedure, affects prediction performance. The feature selection confirmed some known drivers of adaptation to radiation and highlighted some new transcription factors and microRNAs (miRNAs). Finally, gene ontology (GO) analysis revealed biological processes whose expression patterns are most suitable for this approach. This work highlights the potential of detecting DEGs with a machine learning (ML) approach, and provides some evidence that gene expression changes can be captured by a diverse feature set unrelated to the condition under study.
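The AUC figure above has a simple probabilistic reading: it is the chance that a randomly chosen DEG receives a higher model score than a randomly chosen NDEG. A minimal sketch of that rank-based (Mann-Whitney) computation follows; the functions `roc_auc` and `mean_auc` and the example scores are hypothetical illustrations, not code or data from the study, and tied scores are counted as half a win.

```python
def roc_auc(pos_scores, neg_scores):
    """Rank-based (Mann-Whitney) AUC: the fraction of positive/negative
    pairs in which the positive example scores higher; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_auc(runs):
    """Average AUC over several (pos_scores, neg_scores) splits, e.g. one per
    chosen percentage of most/least differentially expressed genes."""
    aucs = [roc_auc(pos, neg) for pos, neg in runs]
    return sum(aucs) / len(aucs)


# Hypothetical model scores for DEGs (positives) and NDEGs (negatives).
degs = [0.9, 0.8, 0.7, 0.4]
ndegs = [0.6, 0.3, 0.2, 0.1]
print(roc_auc(degs, ndegs))  # 0.9375 (15 of 16 pairs ranked correctly)
```

The quadratic pair loop keeps the definition visible; for large gene sets, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.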
2
Methods used in microbial forensics and epidemiological investigations for stronger health systems. Forensic Sci Res 2023; 7:650-661. [PMID: 36817258 PMCID: PMC9930754 DOI: 10.1080/20961790.2021.2023272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
This review discusses microbial forensics as an emerging science that finds application in protecting human health. It is important to distinguish naturally acquired infections from those caused by the intentional release of microorganisms into the environment. This information is crucial for formulating procedures against the spread of infectious diseases and for prosecuting persons who may be involved in acts of biocrime, bioterrorism, or biowarfare. A comparison between epidemiological investigations and microbial forensic investigations is provided. In addition, the review discusses how microbial forensics strengthens health systems. Microbial forensic investigations and epidemiological examinations employ similar concepts and involve identifying and characterising the microbe of interest. Both fields require formulating an appropriate case definition, determining a pathogen's mode of transmission, and identifying the source(s) of infection. However, the two subdisciplines differ in their objectives. An epidemiological investigation aims to identify the pathogen's source to prevent the spread of the disease. Microbial forensics focuses on source-tracking to facilitate the prosecution of persons responsible for the spread of a pathogen. Both fields use molecular techniques to analyse and compare DNA, gene products, and biomolecules in order to identify and characterise the microorganisms of interest. We include case studies that show methods used in microbial forensic investigations, a brief discussion of the public significance of microbial forensic systems, and a roadmap for establishing such a system at the national level. This system is expected to strengthen a country's capacity to respond to public health emergencies. Several factors must be considered in establishing national microbial forensic systems. First, the inherent ubiquity, diversity, and adaptability of microorganisms warrant the use of robust and accurate molecular typing systems. Second, facilities and scientists trained in epidemiology, molecular biology, bioinformatics, and data analytics must be available. Human resources and infrastructure are critical requirements because formulating strategies and allocating resources during infectious disease outbreaks must be data-driven. All countries should prioritise establishing and maintaining a national microbial forensic system to strengthen their capacity for forensic and epidemiological investigations, accompanied by a national policy that sets the legislative framework and provides for the system's financial requirements.
3
Compendium-Wide Analysis of Pseudomonas aeruginosa Core and Accessory Genes Reveals Transcriptional Patterns across Strains PAO1 and PA14. mSystems 2023; 8:e0034222. [PMID: 36541762 PMCID: PMC9948736 DOI: 10.1128/msystems.00342-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Pseudomonas aeruginosa is an opportunistic pathogen that causes difficult-to-treat infections. Two well-studied, divergent P. aeruginosa strain types, PAO1 and PA14, show significant genomic heterogeneity, including diverse accessory genes present in only some strains. Genome content comparisons identify core genes that are conserved across both PAO1 and PA14 strains and accessory genes that are present in only a subset of PAO1 and PA14 strains. Here, we use recently assembled transcriptome compendia of publicly available P. aeruginosa RNA sequencing (RNA-seq) samples to create two smaller compendia consisting of only strain PAO1 or only strain PA14 samples, each aligned to its cognate reference genome. We confirmed strain annotations and identified other samples for inclusion by assessing each sample's median expression of PAO1-only or PA14-only accessory genes. We then compared the patterns of core gene expression in each strain. To do so, we developed a method that identifies which genes show similar expression patterns across strain types. We found that some core genes had consistent correlated expression patterns across both compendia, while others were less stable in an interstrain comparison. For each accessory gene, we also determined core genes with correlated expression patterns. We found that stable core genes had fewer coexpressed neighbors that were accessory genes. Overall, this approach for analyzing expression patterns across strain types can be extended to other groups of genes, such as phage genes, or applied to patterns beyond groups of strains, such as samples with different traits, to gain a deeper understanding of regulation.
IMPORTANCE Pseudomonas aeruginosa is a ubiquitous pathogen. There is much diversity among P. aeruginosa strains, including two divergent but well-studied strains, PAO1 and PA14. Understanding how these different strain-level traits manifest is important for identifying targets that regulate different traits of interest. With thousands of PAO1 and PA14 samples available, we created two strain-specific RNA-seq compendia, each containing hundreds of samples from PAO1 or PA14 strains. Using an approach that we developed, we compared the expression patterns of core genes that are conserved in both strain types and determined which core genes have expression patterns similar to those of accessory genes unique to one strain or the other. We found a subset of core genes with different transcriptional patterns across PAO1 and PA14 strains and identified those core genes with expression patterns similar to those of strain-specific accessory genes.
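One plausible way to quantify whether a core gene's co-expression pattern is "stable" across the two compendia is to compute its correlation with every other core gene in each compendium and then correlate those two profiles. The sketch below assumes exactly that definition, with hypothetical gene names and expression values; the authors' actual method may differ in the similarity measure, normalization, and thresholds.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences
    (both must vary, otherwise the denominator is zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpression_profile(gene, expr):
    """Correlation of `gene` with every other gene in one compendium.
    `expr` maps gene name -> expression values across that compendium's
    samples; the profile is ordered by sorted gene name."""
    return [pearson(expr[gene], expr[other])
            for other in sorted(expr) if other != gene]


def stability(gene, expr_a, expr_b):
    """Similarity of one core gene's co-expression profile between two
    compendia; both must contain the same set of core genes."""
    return pearson(coexpression_profile(gene, expr_a),
                   coexpression_profile(gene, expr_b))


# Hypothetical core-gene expression, four samples per compendium.
pao1 = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
        "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
pa14 = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2],
        "g3": [1, 2, 3, 4], "g4": [4, 2, 3, 1]}
print(stability("g1", pao1, pao1))  # 1.0 up to rounding: same partners, same signs
print(stability("g1", pao1, pa14))  # -1.0 up to rounding: every relationship reversed
```

In a real compendium the profiles would span hundreds of core genes; with only three partners, as here, the score is driven by just three correlation values.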
4
Supervised Machine Learning Enables Geospatial Microbial Provenance. Genes (Basel) 2022; 13:1914. [PMID: 36292799 PMCID: PMC9601318 DOI: 10.3390/genes13101914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/04/2022]
Abstract
The recent increase in publicly available metagenomic datasets with geospatial metadata has made it possible to determine location-specific microbial fingerprints from around the world. Such fingerprints can be useful for comparing microbial niches in environmental research, as well as for applications in forensic science and public health. To determine the regional specificity of environmental metagenomes, we examined 4305 shotgun-sequenced samples from the MetaSUB Consortium dataset, the most extensive public collection of urban microbiomes, spanning 60 different cities, 30 countries, and 6 continents. We identified city-specific microbial fingerprints by applying supervised machine learning (SML) to the taxonomic classifications, and we compared the performance of ten SML classifiers. We then further evaluated the five algorithms with the highest accuracy, which reached city-level accuracies of 85-89% and continent-level accuracies of 90-94%. Thereafter, we used these results to develop Cassandra, a random-forest-based classifier that identifies bioindicator species to aid in fingerprinting and can infer higher-order microbial interactions at each site. We further tested Cassandra on the Tara Oceans dataset, the largest collection of marine microbial genomes, where it classified the oceanic sample locations with 83% accuracy. These results and code show the utility of SML methods and Cassandra for identifying bioindicator species across both oceanic and urban environments, which can help guide ongoing efforts in biotracing, environmental monitoring, and microbial forensics (MF).
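The gap between city-level and continent-level accuracy is expected if continent labels are scored by mapping each predicted city to its continent: a misassigned city on the correct continent still counts as correct at the coarser level. The sketch below assumes that scoring scheme (the study may instead have trained separate continent-level classifiers); the city names and predictions are hypothetical, not MetaSUB results.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def city_and_continent_accuracy(true_cities, predicted_cities, continent_of):
    """Score city predictions directly, then again after mapping each city to
    its continent, so a wrong city on the right continent still counts."""
    city_acc = accuracy(true_cities, predicted_cities)
    cont_acc = accuracy([continent_of[c] for c in true_cities],
                        [continent_of[c] for c in predicted_cities])
    return city_acc, cont_acc


# Hypothetical labels for four samples.
continent_of = {"Lisbon": "Europe", "Porto": "Europe",
                "Tokyo": "Asia", "Seoul": "Asia"}
true_cities = ["Lisbon", "Porto", "Tokyo", "Seoul"]
predicted_cities = ["Lisbon", "Lisbon", "Tokyo", "Tokyo"]
print(city_and_continent_accuracy(true_cities, predicted_cities, continent_of))  # (0.5, 1.0)
```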
5
Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022; 20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]
Abstract
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, whose transcriptional responses integrate many signals and show plasticity across strains, including responses to which nutrients are available and which other microbes are present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review, we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
6
A computational workflow for the expansion of heterologous biosynthetic pathways to natural product derivatives. Nat Commun 2021; 12:1760. [PMID: 33741955 PMCID: PMC7979880 DOI: 10.1038/s41467-021-22022-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/14/2020] [Accepted: 02/24/2021] [Indexed: 01/31/2023]
Abstract
Plant natural products (PNPs) and their derivatives are important but underexplored sources of pharmaceutical molecules. To tap this potential, reconstituting heterologous PNP biosynthesis pathways in engineered microbes provides a valuable starting point for exploring and producing novel PNP derivatives. Here, we introduce a computational workflow to systematically screen the biochemical vicinity of a biosynthetic pathway for pharmaceutical compounds that could be produced by derivatizing pathway intermediates. We apply our workflow to the biosynthetic pathway of noscapine, a benzylisoquinoline alkaloid (BIA) with a long history of medicinal use. Our workflow identifies pathways and enzyme candidates for the production of (S)-tetrahydropalmatine, a known analgesic and anxiolytic, and three additional derivatives. We then construct pathways for these compounds in yeast, resulting in platforms for de novo biosynthesis of BIA derivatives and demonstrating the value of cheminformatic tools for predicting reactions, pathways, and enzymes in synthetic biology and metabolic engineering.
7
Independent component analysis recovers consistent regulatory signals from disparate datasets. PLoS Comput Biol 2021; 17:e1008647. [PMID: 33529205 PMCID: PMC7888660 DOI: 10.1371/journal.pcbi.1008647] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/10/2020] [Revised: 02/17/2021] [Accepted: 12/18/2020] [Indexed: 01/03/2023]
Abstract
The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could enable detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and found that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
Cells adapt to diverse environments by regulating gene expression. Genome-wide measurements of gene expression levels have increased exponentially in recent years, but successful integration and analysis of these datasets remain limited. Recently, we showed that independent component analysis (ICA), a signal deconvolution algorithm, can separate a large bacterial gene expression dataset into groups of co-regulated genes. That study focused on data generated by a standardized pipeline and did not address whether ICA extracts the same quantitative co-expression signals across expression profiling platforms. In this study, we show that ICA finds similar co-regulation patterns underlying multiple gene expression datasets and can be used as a tool to integrate and interpret diverse datasets. Using a dataset containing over 3,000 expression profiles, we predicted three new regulons and characterized their activities. Since large, standardized expression datasets exist for only a few bacterial strains, these results broaden the possible applications of this tool to better understand transcriptional regulation across a wide range of microbes.
8
Pleione: A tool for statistical and multi-objective calibration of Rule-based models. Sci Rep 2019; 9:15104. [PMID: 31641245 PMCID: PMC6805871 DOI: 10.1038/s41598-019-51546-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/07/2019] [Accepted: 09/24/2019] [Indexed: 11/17/2022]
Abstract
Mathematical models based on Ordinary Differential Equations (ODEs) are frequently used to describe and simulate biological systems. Nevertheless, such models are often difficult to understand. Unlike ODE models, Rule-Based Models (RBMs) use a formal language to describe reactions as a series of statements that are easier to understand and correct. They are also gaining popularity because of their conciseness and simulation flexibility. However, RBMs generally lack tools for further analyses that require simulation, because exact and approximate simulations are computationally intensive. Translating RBMs into ODEs is a common way to reduce simulation time, but this technique may be prohibitive due to combinatorial explosion. Here, we present Pleione, a software tool to calibrate RBMs. Parameter calibration is essential given the incomplete experimental determination of reaction rates and the goal of using models to reproduce experimental data. The software distributes stochastic simulations and calculations and incorporates equivalence tests to determine the fitness of RBMs compared with data. The primary features of Pleione were thoroughly tested on a model of gene regulation in Escherichia coli. Pleione yielded satisfactory results regarding calculation time and error reduction for multiple simulators, models, parameter search strategies, and computing infrastructures.
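The combinatorial explosion mentioned above arises because an ODE translation needs one variable per distinct molecular species, whereas a single rule can act on all of them at once. For a molecule whose sites vary independently, the species count is the product of the number of states per site, as in this minimal sketch; the three-site protein and its unmodified/phosphorylated states are hypothetical, and models that also track binding partners typically generate even more species.

```python
from itertools import product
from math import prod


def species_count(states_per_site):
    """Distinct species for one molecule type whose sites vary independently:
    the product of the number of states available at each site."""
    return prod(len(states) for states in states_per_site.values())


def enumerate_species(states_per_site):
    """Explicitly list every species as a {site: state} dict; an ODE
    translation would need one variable for each of these."""
    sites = sorted(states_per_site)
    return [dict(zip(sites, combo))
            for combo in product(*(states_per_site[s] for s in sites))]


# Hypothetical protein with three sites, each unmodified ("u") or phosphorylated ("p").
protein = {"s1": ("u", "p"), "s2": ("u", "p"), "s3": ("u", "p")}
print(species_count(protein))           # 8
print(len(enumerate_species(protein)))  # 8

# Twenty such sites would already require over a million ODE variables.
print(species_count({f"s{i}": ("u", "p") for i in range(20)}))  # 1048576
```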
9
10
Population collapse and adaptive rescue during long‐term chemostat fermentation. Biotechnol Bioeng 2019; 116:693-703. [DOI: 10.1002/bit.26898] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/18/2018] [Revised: 11/02/2018] [Accepted: 12/06/2018] [Indexed: 11/09/2022]
11
Predicting the evolution of Escherichia coli by a data-driven approach. Nat Commun 2018; 9:3562. [PMID: 30177705 PMCID: PMC6120903 DOI: 10.1038/s41467-018-05807-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/14/2017] [Accepted: 06/12/2018] [Indexed: 12/31/2022]
Abstract
A tantalizing question in evolutionary biology is whether evolution can be predicted from past experiences. To address this question, we created a coherent compendium of more than 15,000 mutation events for the bacterium Escherichia coli under 178 distinct environmental settings. Compendium analysis provides a comprehensive view of the explored environments, mutation hotspots and mutation co-occurrence. While the mutations shared across all replicates decrease with the number of replicates, our results argue that the pairwise overlapping ratio remains the same, regardless of the number of replicates. An ensemble of predictors trained on the mutation compendium and tested in forward validation over 35 evolution replicates achieves a 49.2 ± 5.8% (mean ± std) precision and 34.5 ± 5.7% recall in predicting mutation targets. This work demonstrates how integrated datasets can be harnessed to create predictive models of evolution at a gene level and elucidate the effect of evolutionary processes in well-defined environments. How reproducible evolutionary processes are remains an important question in evolutionary biology. Here, the authors compile a compendium of more than 15,000 mutation events for Escherichia coli under 178 distinct environmental settings, and develop an ensemble of predictors to predict evolution at a gene level.
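Precision and recall here compare a predicted set of mutation-target genes with the genes actually mutated in each held-out replicate, and the mean ± std values are then taken over replicates. A minimal sketch of that bookkeeping follows, with hypothetical gene sets; the abstract does not say whether the sample or population standard deviation was used, so this sketch assumes the sample version (`statistics.stdev`).

```python
from statistics import mean, stdev


def precision_recall(predicted, observed):
    """Precision and recall of predicted mutation targets against the genes
    observed to mutate in one evolution replicate."""
    predicted, observed = set(predicted), set(observed)
    true_pos = len(predicted & observed)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(observed) if observed else 0.0
    return precision, recall


def summarize(replicates):
    """Mean and sample standard deviation of precision and recall over
    (predicted, observed) pairs; needs at least two replicates."""
    scores = [precision_recall(pred, obs) for pred, obs in replicates]
    precisions = [p for p, _ in scores]
    recalls = [r for _, r in scores]
    return (mean(precisions), stdev(precisions)), (mean(recalls), stdev(recalls))


# Hypothetical predictions and observations for two replicates.
replicates = [
    ({"rpoB", "rpoC", "ompF", "malT"}, {"rpoB", "ompF", "glpK"}),
    ({"rpoB", "rpoC", "ompF", "malT"}, {"rpoC", "pykF"}),
]
print(precision_recall(*replicates[0]))
print(summarize(replicates))
```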
12
Microbial Forensics: Bioterrorism and Biocrime. Biomedical Science Letters 2018; 24:55-63. [DOI: 10.15616/bsl.2018.24.2.55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2018] [Revised: 03/16/2018] [Accepted: 05/29/2018] [Indexed: 09/01/2023]
13
Effects of preservation method on canine (Canis lupus familiaris) fecal microbiota. PeerJ 2018; 6:e4827. [PMID: 29844978 PMCID: PMC5970549 DOI: 10.7717/peerj.4827] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/24/2017] [Accepted: 04/30/2018] [Indexed: 12/14/2022]
Abstract
Studies involving gut microbiome analysis play an increasing role in the evaluation of health and disease in humans and animals alike. Fecal sampling methods for DNA preservation in laboratory, clinical, and field settings can greatly influence inferences of microbial composition and diversity, but they are often inconsistent between studies and under-investigated. Many laboratories have used either temperature control or preservation buffers to optimize DNA preservation, but few studies have evaluated the effects of combining both methods to preserve fecal microbiota. To determine the optimal method for fecal DNA preservation, we collected fecal samples from one canine donor and stored aliquots in RNAlater, 70% ethanol, 50:50 glycerol:PBS, or without buffer at 25 °C, 4 °C, and −80 °C. Fecal DNA was extracted and quantified, and 16S rRNA gene analysis was performed on Days 0, 7, 14, and 56 to evaluate changes in DNA concentration, purity, and bacterial diversity and composition over time. We detected overall effects on the bacterial community of storage buffer (F-value = 6.87, DF = 3, P < 0.001), storage temperature (F-value = 1.77, DF = 3, P = 0.037), and duration of sample storage (F-value = 3.68, DF = 3, P < 0.001). Changes in bacterial composition were observed in samples stored at −80 °C without buffer, a commonly used method for fecal DNA storage, suggesting that simply freezing samples may be suboptimal for bacterial analysis. Samples preserved in 70% ethanol or RNAlater closely resembled fresh samples, though RNAlater yielded significantly lower DNA concentrations (DF = 8.57, P < 0.001). Although bacterial composition varied with storage temperature and buffer, 70% ethanol was the best method for preserving bacterial DNA in canine feces, yielding the highest DNA concentration and minimal changes in bacterial diversity and composition. The differences observed between samples highlight the need to consider optimized post-collection methods in microbiome research.
14
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
15
Hiding in Plain Sight: Mining Bacterial Species Records for Phenotypic Trait Information. mSphere 2017; 2:mSphere00237-17. [PMID: 28776041 PMCID: PMC5541158 DOI: 10.1128/msphere.00237-17] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 05/23/2017] [Accepted: 07/17/2017] [Indexed: 01/01/2023]
Abstract
Cultivation in the laboratory is essential for understanding the phenotypic characteristics and environmental preferences of bacteria. However, basic phenotypic information is not readily accessible. Here, we compiled phenotypic and environmental tolerance information for >5,000 bacterial strains described in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) with all information made publicly available in an updatable database. Although the data span 23 different bacterial phyla, most entries described aerobic, mesophilic, neutrophilic strains from Proteobacteria (mainly Alpha- and Gammaproteobacteria), Actinobacteria, Firmicutes, and Bacteroidetes isolated from soils, marine habitats, and plants. Most of the routinely measured traits tended to show a significant phylogenetic signal, although this signal was weak for environmental preferences. We demonstrated how this database could be used to link genomic attributes to differences in pH and salinity optima. We found that adaptations to high salinity or high-pH conditions are related to cell surface transporter genes, along with previously uncharacterized genes that might play a role in regulating environmental tolerances. Together, this work highlights the utility of this database for associating bacterial taxonomy, phylogeny, or specific genes to measured phenotypic traits and emphasizes the need for more comprehensive and consistent measurements of traits across a broader diversity of bacteria.
IMPORTANCE Cultivation in the laboratory is key for understanding the phenotypic characteristics, growth requirements, metabolism, and environmental preferences of bacteria. However, oftentimes, phenotypic information is not easily accessible. Here, we compiled phenotypic and environmental tolerance information for >5,000 bacterial strains described in the International Journal of Systematic and Evolutionary Microbiology (IJSEM). We demonstrate how this database can be used to link bacterial taxonomy, phylogeny, or specific genes to measured phenotypic traits and environmental preferences. The phenotypic database can be freely accessed (https://doi.org/10.6084/m9.figshare.4272392), and we have included instructions for researchers interested in adding new entries or curating existing ones.
16
Abstract
Networks have become instrumental in deciphering how information is processed and transferred within systems in almost every scientific field today. Nearly all network analyses, however, have relied on humans to devise structural features of networks believed to be most discriminative for an application. We present a framework for comparing and classifying networks without human-crafted features using deep learning. After training, autoencoders contain hidden units that encode a robust structural vocabulary for succinctly describing graphs. We use this feature vocabulary to tackle several network mining problems and find improved predictive performance versus many popular features used today. These problems include uncovering growth mechanisms driving the evolution of networks, predicting protein network fragility, and identifying environmental niches for metabolic networks. Deep learning offers a principled approach for mining complex networks and tackling graph-theoretic problems.
17
Correction: Microbial Forensics: Predicting Phenotypic Characteristics and Environmental Conditions from Large-Scale Gene Expression Profiles. PLoS Comput Biol 2015; 11:e1004617. [PMID: 26588851 PMCID: PMC4654482 DOI: 10.1371/journal.pcbi.1004617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
18
Abstract
Organisms from all domains of life use gene regulation networks to control cell growth, identity, function, and responses to environmental challenges. Although accurate global regulatory models would provide critical evolutionary and functional insights, they remain incomplete, even for the best studied organisms. Efforts to build comprehensive networks are confounded by challenges including network scale, degree of connectivity, complexity of organism–environment interactions, and difficulty of estimating the activity of regulatory factors. Taking advantage of the large number of known regulatory interactions in Bacillus subtilis and two transcriptomics datasets (including one with 38 separate experiments collected specifically for this study), we use a new combination of network component analysis and model selection to simultaneously estimate transcription factor activities and learn a substantially expanded transcriptional regulatory network for this bacterium. In total, we predict 2,258 novel regulatory interactions and recall 74% of the previously known interactions. We obtained experimental support for 391 (out of 635 evaluated) novel regulatory edges (62% accuracy), thus significantly increasing our understanding of various cell processes, such as spore formation.
19
Abstract
In computer-aided biological design, the trifecta of characterized part libraries, accurate models and optimal design parameters is crucial for producing reliable designs. As the number of parts and model complexity increase, however, it becomes exponentially more difficult for any optimization method to search the solution space, hence creating a trade-off that hampers efficient design. To address this issue, we present a hierarchical computer-aided design architecture that uses a two-step approach for biological design. First, a simple model of low computational complexity is used to predict circuit behavior and assess candidate circuit branches through branch-and-bound methods. Then, a complex, nonlinear circuit model is used for a fine-grained search of the reduced solution space, thus achieving more accurate results. Evaluation with a benchmark of 11 circuits and a library of 102 experimental designs with known characterization parameters demonstrates a speed-up of 3 orders of magnitude when compared to other design methods that provide optimality guarantees.
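The two-step idea can be illustrated on a toy design problem: pick one part per slot so that a circuit's output lands close to a target. A cheap additive model (output = sum of part strengths) yields an admissible bound, so branch-and-bound can discard whole branches that cannot reach the target; only the survivors are re-scored with a costlier nonlinear model. Everything here (the part strengths, the additive model, the interaction term in the detailed model) is a hypothetical stand-in, not the architecture or models of the cited work.

```python
def branch_and_bound(slots, target, tolerance):
    """Step 1: return every design (one part index per slot) whose cheap-model
    output, the sum of chosen part strengths, is within `tolerance` of `target`.
    A partial design is pruned when even its best possible completion cannot
    come within `tolerance`, so most of the space is never visited."""
    k = len(slots)
    # suffix_min[i] / suffix_max[i]: smallest / largest sum slots i..k-1 can add.
    suffix_min = [0.0] * (k + 1)
    suffix_max = [0.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(slots[i])
        suffix_max[i] = suffix_max[i + 1] + max(slots[i])

    def lower_bound(partial_sum, depth):
        # Distance from the target to the interval of reachable final sums.
        lo = partial_sum + suffix_min[depth]
        hi = partial_sum + suffix_max[depth]
        if lo <= target <= hi:
            return 0.0
        return lo - target if target < lo else target - hi

    survivors = []

    def search(depth, partial_sum, choice):
        if lower_bound(partial_sum, depth) > tolerance:
            return  # prune: no completion of this branch can qualify
        if depth == k:
            survivors.append(tuple(choice))
            return
        for j, strength in enumerate(slots[depth]):
            choice.append(j)
            search(depth + 1, partial_sum + strength, choice)
            choice.pop()

    search(0, 0.0, [])
    return survivors


def refine(slots, survivors, target, detailed_model):
    """Step 2: re-score only the surviving designs with a costlier model and
    return the one whose output is closest to `target`."""
    def error(choice):
        strengths = [slots[i][j] for i, j in enumerate(choice)]
        return abs(detailed_model(strengths) - target)
    return min(survivors, key=error)


def toy_detailed_model(strengths):
    # Hypothetical nonlinear correction: first and last parts interfere.
    return sum(strengths) - 0.1 * strengths[0] * strengths[-1]


# Hypothetical part libraries: three slots with 3, 2 and 2 candidate parts.
slots = [[1.0, 2.0, 3.0], [0.5, 1.5], [1.0, 4.0]]
survivors = branch_and_bound(slots, target=5.0, tolerance=0.5)
print(survivors)  # [(0, 0, 1), (1, 1, 0), (2, 0, 0), (2, 1, 0)]
print(refine(slots, survivors, 5.0, toy_detailed_model))  # (0, 0, 1)
```

Here 4 of the 12 possible designs reach the expensive model; the pruning is only as good as the cheap model's bound, so in practice the simple model must be a reasonable proxy for the detailed one.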