1
Lux MW, Strychalski EA, Vora GJ. Advancing reproducibility can ease the 'hard truths' of synthetic biology. Synth Biol (Oxf) 2023; 8:ysad014. [PMID: 38022744 PMCID: PMC10640854 DOI: 10.1093/synbio/ysad014] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/07/2023] [Revised: 07/26/2023] [Accepted: 10/04/2023] [Indexed: 12/01/2023]
Abstract
Reproducibility has been identified as an outstanding challenge in science, and the field of synthetic biology is no exception. Meeting this challenge is critical to allow the transformative technological capabilities emerging from this field to reach their full potential to benefit society. We discuss the current state of reproducibility in synthetic biology and how improvements can address some of the central shortcomings in the field. We argue that the successful adoption of reproducibility as a routine aspect of research and development requires commitment spanning researchers and relevant institutions via education, incentivization and investment in related infrastructure. The urgency of this topic pervades synthetic biology as it strives to advance fundamental insights and unlock new capabilities for safe, secure and scalable applications of biotechnology.
Affiliation(s)
- Matthew W Lux
- Research & Operations Directorate, U.S. Army Combat Capabilities Development Command Chemical Biological Center, APG, MD 21010, USA
- Elizabeth A Strychalski
- Cellular Engineering Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Gary J Vora
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, DC 20375, USA
2
Pandey A, Rodriguez ML, Poole W, Murray RM. Characterization of Integrase and Excisionase Activity in a Cell-Free Protein Expression System Using a Modeling and Analysis Pipeline. ACS Synth Biol 2023; 12:511-523. [PMID: 36715625 DOI: 10.1021/acssynbio.2c00534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
We present a full-stack modeling, analysis, and parameter identification pipeline to guide the modeling and design of biological systems starting from specifications to circuit implementations and parametrizations. We demonstrate this pipeline by characterizing the integrase and excisionase activity in a cell-free protein expression system. We build on existing Python tools (BioCRNpyler, AutoReduce, and Bioscrape) to create this pipeline. For enzyme-mediated DNA recombination in a cell-free system, we create detailed chemical reaction network models from simple high-level descriptions of the biological circuits and their context using BioCRNpyler. We use Bioscrape to show that the output of the detailed model is sensitive to many parameters. However, parameter identification is infeasible for this high-dimensional model; hence, we use AutoReduce to automatically obtain reduced models that have fewer parameters. This results in a hierarchy of reduced models under different assumptions to finally arrive at a minimal ODE model for each circuit. Then, we run sensitivity analysis-guided Bayesian inference using Bioscrape for each circuit to identify the model parameters. This process allows us to quantify integrase and excisionase activity in cell extracts, enabling complex-circuit designs that depend on accurate control over protein expression levels through DNA recombination. The automated pipeline presented in this paper opens up a new approach to complex circuit design, modeling, reduction, and parametrization.
Affiliation(s)
- Ayush Pandey
- Control and Dynamical Systems, California Institute of Technology, Pasadena, California 91125, United States
- Makena L Rodriguez
- Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- William Poole
- Altos Laboratories, Redwood City, California 94065, United States
- Richard M Murray
- Control and Dynamical Systems, California Institute of Technology, Pasadena, California 91125, United States; Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
3
Garcia BJ, Urrutia J, Zheng G, Becker D, Corbet C, Maschhoff P, Cristofaro A, Gaffney N, Vaughn M, Saxena U, Chen YP, Gordon DB, Eslami M. A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists. Synth Biol (Oxf) 2022; 7:ysac012. [PMID: 36035514 PMCID: PMC9408027 DOI: 10.1093/synbio/ysac012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 06/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022]
Abstract
Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis is a complex, multistep process that has implications on both discovery and reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. We present the design of a pipeline that combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
Affiliation(s)
- Benjamin J Garcia
- Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Joshua Urrutia
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Alexander Cristofaro
- Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Niall Gaffney
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Matthew Vaughn
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Uma Saxena
- Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- D Benjamin Gordon
- Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA