1
Wang X, Rai N, Merchel Piovesan Pereira B, Eetemadi A, Tagkopoulos I. Accelerated knowledge discovery from omics data by optimal experimental design. Nat Commun 2020; 11:5026. [PMID: 33024104 PMCID: PMC7538421 DOI: 10.1038/s41467-020-18785-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/31/2019] [Accepted: 08/27/2020] [Indexed: 12/15/2022]
Abstract
How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations leads to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
Affiliation(s)
- Xiaokang Wang: Department of Biomedical Engineering, University of California, Davis, CA, 95616, USA; Genome Center, University of California, Davis, CA, 95616, USA
- Navneet Rai: Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
- Beatriz Merchel Piovesan Pereira: Genome Center, University of California, Davis, CA, 95616, USA; Microbiology Graduate Group, University of California, Davis, CA, 95616, USA
- Ameen Eetemadi: Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
- Ilias Tagkopoulos: Genome Center, University of California, Davis, CA, 95616, USA; Department of Computer Science, University of California, Davis, CA, 95616, USA
2
Pratapa A, Adames N, Kraikivski P, Franzese N, Tyson JJ, Peccoud J, Murali TM. CrossPlan: systematic planning of genetic crosses to validate mathematical models. Bioinformatics 2019; 34:2237-2244. [PMID: 29432533 DOI: 10.1093/bioinformatics/bty072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 12/27/2022]
Abstract
Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test.

Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well.

Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan.

Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Pratapa: Department of Computer Science, Virginia Tech, Blacksburg, USA
- Neil Adames: Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- Pavel Kraikivski: Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- John J Tyson: Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- Jean Peccoud: Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- T M Murali: Department of Computer Science, Virginia Tech, Blacksburg, USA
3
Determining Relative Dynamic Stability of Cell States Using Boolean Network Model. Sci Rep 2018; 8:12077. [PMID: 30104572 PMCID: PMC6089891 DOI: 10.1038/s41598-018-30544-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 01/11/2018] [Accepted: 08/02/2018] [Indexed: 01/05/2023]
Abstract
Cell state transition is at the core of biological processes in metazoans, including cell differentiation, epithelial-to-mesenchymal transition (EMT) and cell reprogramming. In these cases, it is important to understand the molecular mechanism of cellular stability and how transitions happen between different cell states, which is controlled by a gene regulatory network (GRN) hard-wired in the genome. Here we use Boolean modeling of the GRN to study the cell state transition of EMT and systematically compare four available methods for calculating the cellular stability of three cell states in EMT in both normal and genetically mutated cases. The results of the four methods generally agree, but not completely. We show that the distribution of the one-degree neighborhood of cell states, that is, the nearest states by Hamming distance, causes the differences among the methods. From that, we propose a new method based on the one-degree neighborhood, which is the simplest one and agrees with the other methods when estimating cellular stability in all scenarios of our EMT model. This new method will help researchers in the field of cell differentiation and cell reprogramming to calculate cellular stability using a Boolean model, and then rationally design their experimental protocols to manipulate the cell state transition.
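To make the notion of a one-degree neighborhood concrete, the following is a minimal Python sketch. It is not the authors' method: the three-node network `toy_update`, the synchronous update scheme, and the scoring function `neighborhood_stability` are all assumptions made up for illustration. The sketch enumerates the states at Hamming distance 1 from a given state and reports the fraction of those neighbors that settle back into the same attractor.

```python
from typing import Callable, FrozenSet, List, Tuple

State = Tuple[int, ...]

def one_degree_neighbors(state: State) -> List[State]:
    """Return all states at Hamming distance 1 (exactly one node flipped)."""
    return [state[:i] + (1 - state[i],) + state[i + 1:] for i in range(len(state))]

def find_attractor(state: State, update: Callable[[State], State]) -> FrozenSet[State]:
    """Apply the synchronous update until a state repeats; return the cycle reached."""
    seen: List[State] = []
    while state not in seen:
        seen.append(state)
        state = update(state)
    return frozenset(seen[seen.index(state):])

def toy_update(s: State) -> State:
    # Hypothetical network (an assumption, not from the paper):
    # a and b inhibit each other, and c is active when a is on and b is off.
    a, b, c = s
    return (int(not b), int(not a), int(a and not b))

def neighborhood_stability(state: State, update: Callable[[State], State]) -> float:
    """Fraction of one-degree neighbors that fall into the same attractor as `state`."""
    home = find_attractor(state, update)
    neighbors = one_degree_neighbors(state)
    same = sum(find_attractor(n, update) == home for n in neighbors)
    return same / len(neighbors)

# (1, 0, 1) is a fixed point of the toy network.
stability = neighborhood_stability((1, 0, 1), toy_update)
print(stability)
```

A score near 1 would mean that most single-node perturbations return to the same cell state; a real analysis would use the published EMT network and the update scheme chosen by the authors.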
4
Awdeh A, Phenix H, Karn M, Perkins TJ. Dynamics in Epistasis Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:878-891. [PMID: 28092574 DOI: 10.1109/tcbb.2017.2653110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Finding regulatory relationships between genes, including the direction and nature of influence between them, is a fundamental challenge in the field of molecular genetics. One classical approach to this problem is epistasis analysis. Broadly speaking, epistasis analysis infers the regulatory relationships between a pair of genes in a genetic pathway by considering the patterns of change in an observable trait resulting from single and double deletion of genes. While classical epistasis analysis has yielded deep insights on numerous genetic pathways, it is not without limitations. Here, we explore the possibility of dynamic epistasis analysis, in which, in addition to performing genetic perturbations of a pathway, we drive the pathway by a time-varying upstream signal. We explore the theoretical power of dynamical epistasis analysis by conducting an identifiability analysis of Boolean models of genetic pathways, comparing static and dynamic approaches. We find that even relatively simple input dynamics greatly increases the power of epistasis analysis to discriminate alternative network structures. Further, we explore the question of experiment design, and show that a subset of short time-varying signals, which we call dynamic primitives, allow maximum discriminative power with a reduced number of experiments.
5
Keef E, Zhang LA, Swigon D, Urbano A, Ermentrout GB, Matuszewski M, Toapanta FR, Ross TM, Parker RS, Clermont G. Discrete Dynamical Modeling of Influenza Virus Infection Suggests Age-Dependent Differences in Immunity. J Virol 2017; 91:e00395-17. [PMID: 28904202 PMCID: PMC5686742 DOI: 10.1128/jvi.00395-17] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/07/2017] [Accepted: 08/23/2017] [Indexed: 01/09/2023]
Abstract
Immunosenescence, an age-related decline in immune function, is a major contributor to morbidity and mortality in the elderly. Older hosts exhibit a delayed onset of immunity and prolonged inflammation after an infection, leading to excess damage and a greater likelihood of death. Our study applies a rule-based model to infer which components of the immune response are most changed in an aged host. Two groups of BALB/c mice (aged 12 to 16 weeks and 72 to 76 weeks) were infected with 2 inocula: a survivable dose of 50 PFU and a lethal dose of 500 PFU. Data were measured at 10 points over 19 days in the sublethal case and at 6 points over 7 days in the lethal case, after which all mice had died. Data varied primarily in the onset of immunity, particularly the inflammatory response, which led to a 2-day delay in the clearance of the virus from older hosts in the sublethal cohort. We developed a Boolean model to describe the interactions between the virus and 21 immune components, including cells, chemokines, and cytokines, of innate and adaptive immunity. The model identifies distinct sets of rules for each age group by using Boolean operators to describe the complex series of interactions that activate and deactivate immune components. Our model accurately simulates the immune responses of mice of both ages and with both inocula included in the data (95% accurate for younger mice and 94% accurate for older mice) and shows distinct rule choices for the innate immunity arm of the model between younger and aging mice in response to influenza A virus infection.

IMPORTANCE: Influenza virus infection causes high morbidity and mortality rates every year, especially in the elderly. The elderly tend to have a delayed onset of many immune responses as well as prolonged inflammatory responses, leading to an overall weakened response to infection. Many of the details of immune mechanisms that change with age are currently not well understood. We present a rule-based model of the intrahost immune response to influenza virus infection. The model is fit to experimental data for young and old mice infected with influenza virus. We generated distinct sets of rules for each age group to capture the temporal differences seen in the immune responses of these mice. These rules describe a network of interactions leading to either clearance of the virus or death of the host, depending on the initial dosage of the virus. Our models clearly demonstrate differences in these two age groups, particularly in the innate immune responses.
Affiliation(s)
- Ericka Keef: Department of Mathematics, Carlow University, Pittsburgh, Pennsylvania, USA
- Li Ang Zhang: Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- David Swigon: Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; McGowan Institute for Regenerative Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Alisa Urbano: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- G Bard Ermentrout: Department of Mathematics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Michael Matuszewski: Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Franklin R Toapanta: Center for Vaccine Research, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Ted M Ross: Center for Vaccine Research, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Robert S Parker: Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; McGowan Institute for Regenerative Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
- Gilles Clermont: Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Critical Care Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA; McGowan Institute for Regenerative Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA
6
Sverchkov Y, Craven M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput Biol 2017; 13:e1005466. [PMID: 28570593 PMCID: PMC5453429 DOI: 10.1371/journal.pcbi.1005466] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
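The iterative cycle described in this abstract (learn a model from data, propose an informative experiment, collect the outcome, update the model) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not any method from the review: the "model" is simply the set of candidate Boolean regulation rules still consistent with the data, the selection rule picks the experiment that splits those candidates most evenly, and the names `active_learning`, `candidates`, and `run_experiment` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Experiment = Tuple[int, int]              # states of two regulators (A, B)
Hypothesis = Callable[[Experiment], int]  # predicted state of the target gene

def active_learning(
    hypotheses: Dict[str, Hypothesis],
    experiments: List[Experiment],
    run_experiment: Callable[[Experiment], int],
) -> Tuple[Dict[str, Hypothesis], Dict[Experiment, int]]:
    """Repeatedly run the experiment that best splits the consistent hypotheses."""
    consistent = dict(hypotheses)
    data: Dict[Experiment, int] = {}

    def disagreement(e: Experiment) -> int:
        # Size of the minority prediction among hypotheses still consistent.
        ones = sum(h(e) for h in consistent.values())
        return min(ones, len(consistent) - ones)

    while len(consistent) > 1:
        remaining = [e for e in experiments if e not in data]
        if not remaining:
            break
        best = max(remaining, key=disagreement)
        if disagreement(best) == 0:
            break  # no remaining experiment can tell the hypotheses apart
        data[best] = run_experiment(best)  # "laboratory" step
        consistent = {n: h for n, h in consistent.items() if h(best) == data[best]}
    return consistent, data

# Hypothetical candidate rules for how regulators A and B control a target gene.
candidates: Dict[str, Hypothesis] = {
    "AND": lambda e: e[0] & e[1],
    "OR": lambda e: e[0] | e[1],
    "A only": lambda e: e[0],
    "B only": lambda e: e[1],
    "XOR": lambda e: e[0] ^ e[1],
}
all_experiments = list(product((0, 1), repeat=2))
truth = candidates["OR"]  # stands in for the real system being measured

survivors, data = active_learning(candidates, all_experiments, truth)
print(list(survivors), len(data))
```

In this toy setting the uninformative experiment, in which both regulators are off and every candidate rule predicts the same outcome, is never selected. The approaches covered by the review typically replace the version-space filter with probabilistic models and information-theoretic selection criteria.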
Affiliation(s)
- Yuriy Sverchkov: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Mark Craven: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
7
Active Interaction Mapping Reveals the Hierarchical Organization of Autophagy. Mol Cell 2017; 65:761-774.e5. [PMID: 28132844 DOI: 10.1016/j.molcel.2016.12.024] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 02/19/2016] [Revised: 11/21/2016] [Accepted: 12/22/2016] [Indexed: 12/15/2022]
Abstract
We have developed a general progressive procedure, Active Interaction Mapping, to guide assembly of the hierarchy of functions encoding any biological system. Using this process, we assemble an ontology of functions comprising autophagy, a central recycling process implicated in numerous diseases. A first-generation model, built from existing gene networks in Saccharomyces, captures most known autophagy components in broad relation to vesicle transport, cell cycle, and stress response. Systematic analysis identifies synthetic-lethal interactions as most informative for further experiments; consequently, we saturate the model with 156,364 such measurements across autophagy-activating conditions. These targeted interactions provide more information about autophagy than all previous datasets, producing a second-generation ontology of 220 functions. Approximately half are previously unknown; we confirm roles for Gyp1 at the phagophore-assembly site, Atg24 in cargo engulfment, Atg26 in cytoplasm-to-vacuole targeting, and Ssd1, Did4, and others in selective and non-selective autophagy. The procedure and autophagy hierarchy are at http://atgo.ucsd.edu/.
8
Videla S, Konokotina I, Alexopoulos LG, Saez-Rodriguez J, Schaub T, Siegel A, Guziolowski C. Designing Experiments to Discriminate Families of Logic Models. Front Bioeng Biotechnol 2015; 3:131. [PMID: 26389116 PMCID: PMC4560026 DOI: 10.3389/fbioe.2015.00131] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/05/2015] [Accepted: 08/17/2015] [Indexed: 11/13/2022]
Abstract
Logic models are a promising way of building effective in silico functional models of a cell, in particular of signaling pathways. The automated learning of Boolean logic models describing signaling pathways can be achieved by training on phosphoproteomics data, which is particularly useful if it is measured upon different combinations of perturbations in a high-throughput fashion. However, in practice, the number and type of allowed perturbations are not exhaustive. Moreover, experimental data are unavoidably subject to noise. As a result, the learning process yields a family of feasible logical networks rather than a single model. This family is composed of logic models implementing different internal wirings for the system, and therefore the predictions of experiments from this family may present a significant level of variability, and hence uncertainty. In this paper, we introduce a method based on Answer Set Programming to propose an optimal experimental design that aims to narrow down the variability (in terms of input-output behaviors) within families of logical models learned from experimental data. We study how the fitness with respect to the data can be improved after an optimal selection of signaling perturbations and how we learn optimal logic models with a minimal number of experiments. The methods are applied to signaling pathways in human liver cells and phosphoproteomics experimental data. Using 25% of the experiments, we obtained logical models with fitness scores (mean square error) within 15% of those obtained using all experiments, illustrating the impact that our approach can have on the design of experiments for efficient model calibration.
Affiliation(s)
- Santiago Videla: UMR 6074 IRISA, CNRS, Campus de Beaulieu, Rennes, France; Dyliss project, INRIA, Campus de Beaulieu, Rennes, France; Institut für Informatik, Universität Potsdam, Potsdam, Germany; LBSI, Fundación Instituto Leloir, CONICET, Buenos Aires, Argentina
- Irina Konokotina: IRCCyN UMR CNRS 6597, École Centrale de Nantes, Nantes, France
- Leonidas G Alexopoulos: Department of Mechanical Engineering, National Technical University of Athens, Athens, Greece
- Julio Saez-Rodriguez: European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Torsten Schaub: Institut für Informatik, Universität Potsdam, Potsdam, Germany
- Anne Siegel: UMR 6074 IRISA, CNRS, Campus de Beaulieu, Rennes, France; Dyliss project, INRIA, Campus de Beaulieu, Rennes, France