1
Wang JY, Xie ZX, Cui YZ, Li BZ, Yuan YJ. Artificial design of the genome: from sequences to the 3D structure of chromosomes. Trends Biotechnol 2025; 43:304-317. [PMID: 39299833 DOI: 10.1016/j.tibtech.2024.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 07/18/2024] [Accepted: 08/27/2024] [Indexed: 09/22/2024]
Abstract
Genome design is the foundation of genome synthesis, which provides a new platform for deepening our understanding of biological systems by exploring the fundamental components and structure of the genome. Artificial genome designs can endow unnatural genomes with desired functions. We provide a comprehensive overview of genome design principles ranging from DNA sequences to the 3D structure of chromosomes. Furthermore, we highlight applications of genome design in gene expression, genome structure, genome function, and biocontainment, and discuss the potential of artificial intelligence (AI) in genome design.
Affiliation(s)
- Jun-Yi Wang
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
| | - Ze-Xiong Xie
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
| | - You-Zhi Cui
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
| | - Bing-Zhi Li
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China.
| | - Ying-Jin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin 300072, China
2
Yan X, He Q, Geng B, Yang S. Microbial Cell Factories in the Bioeconomy Era: From Discovery to Creation. BIODESIGN RESEARCH 2024; 6:0052. [PMID: 39434802 PMCID: PMC11491672 DOI: 10.34133/bdr.0052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 09/02/2024] [Accepted: 09/18/2024] [Indexed: 10/23/2024] Open
Abstract
Microbial cell factories (MCFs) are extensively used to produce a wide array of bioproducts, such as bioenergy, biochemicals, food, nutrients, and pharmaceuticals, and have been regarded as the "chips" of biomanufacturing that will fuel the emerging bioeconomy era. Biotechnology advances have led to the screening, investigation, and engineering of an increasing number of microorganisms as diverse MCFs, which are the workhorses of biomanufacturing and help develop the bioeconomy. This review briefly summarizes the progress and strategies in the development of robust and efficient MCFs for sustainable and economic biomanufacturing. First, a comprehensive understanding of microbial chassis cells is needed, including accurate genome sequences and corresponding annotations; metabolic and regulatory networks governing substances, energy, physiology, and information; and their similarity and uniqueness compared with those of other microorganisms. Moreover, the development and application of effective and efficient tools is crucial for engineering both model and nonmodel microbial chassis cells into efficient MCFs, including the identification and characterization of biological parts, as well as the design, synthesis, assembly, editing, and regulation of genes, circuits, and pathways. This review also highlights the necessity of integrating automation and artificial intelligence (AI) with biotechnology to facilitate the development of future customized artificial synthetic MCFs to expedite the industrialization process of biomanufacturing and the bioeconomy.
Affiliation(s)
| | | | - Binan Geng
- State Key Laboratory of Biocatalysis and Enzyme Engineering, and School of Life Sciences, Hubei University, Wuhan 430062, China
| | - Shihui Yang
- State Key Laboratory of Biocatalysis and Enzyme Engineering, and School of Life Sciences, Hubei University, Wuhan 430062, China
3
Barbero-Aparicio JA, Olivares-Gil A, Díez-Pastor JF, García-Osorio C. Deep learning and support vector machines for transcription start site identification. PeerJ Comput Sci 2023; 9:e1340. [PMID: 37346545 PMCID: PMC10280436 DOI: 10.7717/peerj-cs.1340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 06/23/2023]
Abstract
Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have been proven to be exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works do not compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site predictions, concluding that SVMs are computationally much slower, and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to work with sequences than SVMs. For such a purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances on any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
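The idea of building a TSS dataset with negative instances, and the class imbalance that results, can be illustrated with a small sketch. This is not the authors' pipeline: the function name `make_tss_examples`, the window size, the toy sequence, and the rule for sampling negatives are all assumptions chosen only to show the general shape of the approach.

```python
import random

def make_tss_examples(genome, tss_positions, flank=5, negatives_per_positive=3, seed=0):
    """Return (window, label) pairs: 1 for windows centred on a TSS, 0 otherwise.

    Hypothetical helper for illustration; real pipelines would read
    annotations from a resource such as Ensembl.
    """
    rng = random.Random(seed)
    annotated = set(tss_positions)
    examples = []
    # Positive windows: centred on each annotated TSS that has full flanks.
    valid_tss = [p for p in tss_positions if flank <= p < len(genome) - flank]
    for p in valid_tss:
        examples.append((genome[p - flank:p + flank + 1], 1))
    # Negative windows: centred on random positions that are not annotated TSSs.
    candidates = [p for p in range(flank, len(genome) - flank) if p not in annotated]
    n_negatives = min(len(candidates), negatives_per_positive * len(valid_tss))
    for p in rng.sample(candidates, n_negatives):
        examples.append((genome[p - flank:p + flank + 1], 0))
    return examples

toy_genome = "ACGT" * 25  # 100 nt of placeholder sequence, not real data
examples = make_tss_examples(toy_genome, tss_positions=[20, 60])
n_positive = sum(label for _, label in examples)
n_negative = len(examples) - n_positive
```

The `negatives_per_positive` ratio is the knob that controls how imbalanced the resulting dataset is; in a real genome, negatives vastly outnumber positives, so this ratio has to be chosen deliberately.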
Affiliation(s)
| | - Alicia Olivares-Gil
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
| | - José F. Díez-Pastor
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
| | - César García-Osorio
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
4
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely used strategies, which currently hold a prominent place in biotechnology as their potential manifests through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago initiated the generation of large omics datasets, which have helped in gaining a better understanding of cellular behavior. The course of metabolic engineering based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefly introduces the premise of ML, then describes various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the probable challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
| | - Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
5
Fordjour E, Mensah EO, Hao Y, Yang Y, Liu X, Li Y, Liu CL, Bai Z. Toward improved terpenoids biosynthesis: strategies to enhance the capabilities of cell factories. BIORESOUR BIOPROCESS 2022; 9:6. [PMID: 38647812 PMCID: PMC10992668 DOI: 10.1186/s40643-022-00493-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 10/19/2021] [Accepted: 01/04/2022] [Indexed: 02/22/2023] Open
Abstract
Terpenoids form the most diversified class of natural products, which have gained application in the pharmaceutical, food, transportation, and fine and bulk chemical industries. Extraction from naturally occurring sources does not meet industrial demands, whereas chemical synthesis is often associated with poor enantioselectivity, harsh working conditions, and environmental pollution. Microbial cell factories come as a suitable replacement. However, designing efficient microbial platforms for isoprenoid synthesis is often a challenging task. This has to do with the cytotoxic effects of pathway intermediates and some end products, instability of expressed pathways, as well as high enzyme promiscuity. Also, the low enzymatic activity of some terpene synthases and prenyltransferases, and the lack of an efficient throughput system to screen improved high-performing strains are bottlenecks in strain development. Metabolic engineering and synthetic biology seek to overcome these issues through the provision of effective synthetic tools. This review sought to provide an in-depth description of novel strategies for improving cell factory performance. We focused on improving transcriptional and translational efficiencies through static and dynamic regulatory elements, enzyme engineering and high-throughput screening strategies, cellular function enhancement through chromosomal integration, metabolite tolerance, and modularization of pathways.
Affiliation(s)
- Eric Fordjour
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Emmanuel Osei Mensah
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Yunpeng Hao
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Yankun Yang
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Xiuxia Liu
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Ye Li
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China
| | - Chun-Li Liu
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China.
| | - Zhonghu Bai
- National Engineering Laboratory for Cereal Fermentation Technology, Jiangnan University, 1800 Lihu Road, Wuxi, 214122, Jiangsu, China.
- Jiangsu Provincial Research Centre for Bioactive Product Processing Technology, Jiangnan University, Wuxi, China.
6
Zhao M, Yuan Z, Wu L, Zhou S, Deng Y. Precise Prediction of Promoter Strength Based on a De Novo Synthetic Promoter Library Coupled with Machine Learning. ACS Synth Biol 2022; 11:92-102. [PMID: 34927418 DOI: 10.1021/acssynbio.1c00117] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 01/24/2023]
Abstract
Promoters are one of the most critical regulatory elements controlling metabolic pathways. However, the fast and accurate prediction of promoter strength remains challenging, leading to time- and labor-consuming promoter construction and characterization processes. This dilemma is caused by the lack of a big promoter library that has gradient strengths, broad dynamic ranges, and clear sequence profiles that can be used to train an artificial intelligence model of promoter strength prediction. To overcome this challenge, we constructed and characterized a mutant library of Trc promoters (Ptrc) using 83 rounds of mutation-construction-screening-characterization engineering cycles. After excluding invalid mutation sites, we established a synthetic promoter library that consisted of 3665 different variants, displaying an intensity range of more than two orders of magnitude. The strongest variant was ∼69-fold stronger than the original Ptrc and 1.52-fold stronger than a 1 mM isopropyl-β-d-thiogalactoside-driven PT7 promoter, with an ∼454-fold difference between the strongest and weakest expression levels. Using this synthetic promoter library, different machine learning models were built and optimized to explore the relationships between promoter sequences and transcriptional strength. Finally, our XGBoost model exhibited optimal performance, and we utilized this approach to precisely predict the strength of artificially designed promoter sequences (R² = 0.88, mean absolute error = 0.15, and Pearson correlation coefficient = 0.94). Our work provides a powerful platform that enables the predictable tuning of promoters to achieve optimal transcriptional strength.
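The three quality measures quoted in this abstract (coefficient of determination, mean absolute error, and Pearson correlation) can be computed from any pair of measured and predicted strength lists. The sketch below uses only the Python standard library; the function name `regression_metrics` and the toy numbers are assumptions for illustration and are unrelated to the authors' data or model.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, mean absolute error, Pearson r) for two equal-length lists."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Residual and total sums of squares, as used in the usual R^2 definition.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r_squared = 1 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # Pearson r: covariance divided by the product of standard deviations.
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(ss_tot * ss_pred)
    return r_squared, mae, pearson

# Toy values only: measured vs. predicted promoter strengths.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 3.0, 3.5]
r_squared, mae, pearson = regression_metrics(measured, predicted)
```

Reporting all three together is useful because they answer different questions: R² penalises systematic offsets, Pearson r does not, and the mean absolute error stays in the units of the measurement.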
Affiliation(s)
- Mei Zhao
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, Jiangsu 212013, China
| | - Zhenqi Yuan
- School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Longtao Wu
- College of Physics and Optoelectronics, Taiyuan University of Technology, Taiyuan 030024, China
| | - Shenghu Zhou
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Yu Deng
- National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
7
Cui X, Ma X, Prather K, Zhou K. Controlling protein expression by using intron-aided promoters in Saccharomyces cerevisiae. Biochem Eng J 2021. [DOI: 10.1016/j.bej.2021.108197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
8
Mey F, Clauwaert J, Van Huffel K, Waegeman W, De Mey M. Improving the performance of machine learning models for biotechnology: The quest for deus ex machina. Biotechnol Adv 2021; 53:107858. [PMID: 34695560 DOI: 10.1016/j.biotechadv.2021.107858] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/28/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Machine learning is becoming an integral part of the Design-Build-Test-Learn cycle in biotechnology. Machine learning models learn from collected datasets such as omics data and predict a defined outcome, which has led to both production improvements and predictive tools in the field. Robust prediction of the behavior of microbial cell factories and production processes not only greatly increases our understanding of the function of such systems, but also provides significant savings of development time. However, many pitfalls when modeling biological data - bad fit, noisy data, model instability, low data quantity and imbalances in the data - cause models to suffer in their performance. Here we provide an accessible, in-depth analysis on the problems created by these pitfalls, as well as means of their detection and mediation, with a focus on supervised learning. Assessing the state of the art, we show that, currently, in-depth analyses of model performance are often absent and must be improved. This review provides a toolbox for the analysis of model robustness and performance, and simultaneously proposes a standard for the community to facilitate future work. It is further accompanied by an interactive online tutorial on the discussed issues.
Affiliation(s)
- Friederike Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Jim Clauwaert
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Kirsten Van Huffel
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Marjan De Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium.
9
Liebal UW, Köbbing S, Netze L, Schweidtmann AM, Mitsos A, Blank LM. Insight to Gene Expression From Promoter Libraries With the Machine Learning Workflow Exp2Ipynb. FRONTIERS IN BIOINFORMATICS 2021; 1:747428. [PMID: 36303772 PMCID: PMC9581000 DOI: 10.3389/fbinf.2021.747428] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2021] [Accepted: 09/23/2021] [Indexed: 11/16/2022] Open
Abstract
Metabolic engineering relies on modifying gene expression to regulate protein concentrations and reaction activities. The gene expression is controlled by the promoter sequence, and sequence libraries are used to scan expression activities and to identify correlations between sequence and activity. We introduce a computational workflow called Exp2Ipynb to analyze promoter libraries, maximizing information retrieval and promoter design with desired activity. We applied Exp2Ipynb to seven prokaryotic expression libraries to identify optimal experimental design principles. The workflow is open source, available as Jupyter Notebooks, and covers the steps to 1) generate a statistical overview of sequence and activity, 2) train machine-learning algorithms, such as random forest, gradient boosting trees and support vector machines, for prediction and extraction of feature importance, 3) evaluate the performance of the estimator, and 4) design new sequences with a desired activity using numerical optimization. The workflow can perform regression or classification on multiple promoter libraries, across species or reporter proteins. The most accurate predictions in the sample libraries were achieved when the promoters in the library were recognized by a single sigma factor and a unique reporter system. The prediction confidence mostly depends on sample size and sequence diversity, and we present a relationship to estimate their respective effects. The workflow can be adapted to process sequence libraries from other expression-related problems and increase insight into the growing application of high-throughput experiments, providing support for efficient strain engineering.
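Before estimators such as random forests or support vector machines can be trained on promoter sequences, each sequence has to be turned into a numeric feature vector. One common choice, assumed here for illustration rather than taken from the abstract, is one-hot encoding with four binary features per position. The helper name `one_hot` and the example sequence are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat list of 0/1 features, four per position."""
    features = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        # One indicator per base in A, C, G, T order.
        features.extend(1 if base == b else 0 for b in BASES)
    return features

# Example: a six-nucleotide toy sequence becomes a 24-element feature vector.
encoded = one_hot("TTGACA")
```

With this layout, feature index `4 * i + j` corresponds to base `BASES[j]` at position `i`, so per-feature importances reported by a tree-based model can be mapped back to a position and nucleotide.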
Affiliation(s)
- Ulf W. Liebal
- iAMB-Institute of Applied Microbiology, ABBT, RWTH Aachen University, Aachen, Germany
| | - Sebastian Köbbing
- iAMB-Institute of Applied Microbiology, ABBT, RWTH Aachen University, Aachen, Germany
| | - Linus Netze
- AVT-Process Systems Engineering, RWTH Aachen University, Aachen, Germany
| | - Artur M. Schweidtmann
- Department of Chemical Engineering, Delft University of Technology, Delft, Netherlands
| | - Alexander Mitsos
- AVT-Process Systems Engineering, RWTH Aachen University, Aachen, Germany
| | - Lars M. Blank
- iAMB-Institute of Applied Microbiology, ABBT, RWTH Aachen University, Aachen, Germany
10
Mao N, Aggarwal N, Poh CL, Cho BK, Kondo A, Liu C, Yew WS, Chang MW. Future trends in synthetic biology in Asia. ADVANCED GENETICS (HOBOKEN, N.J.) 2021; 2:e10038. [PMID: 36618442 PMCID: PMC9744534 DOI: 10.1002/ggn2.10038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/01/2020] [Revised: 01/10/2021] [Accepted: 01/21/2021] [Indexed: 05/06/2023]
Abstract
Synthetic biology research and technology translation has garnered increasing interest from the governments and private investors in Asia, where the technology has great potential in driving a sustainable bio-based economy. This Perspective reviews the latest developments in the key enabling technologies of synthetic biology and its application in bio-manufacturing, medicine, food and agriculture in Asia. Asia-centric strengths in synthetic biology to grow the bio-based economy, such as advances in genome editing and the presence of biofoundries combined with the availability of natural resources and vast markets, are also highlighted. The potential barriers to the sustainable development of the field, including inadequate infrastructure and policies, with suggestions to overcome these by building public-private partnerships, more effective multi-lateral collaborations and well-developed governance framework, are presented. Finally, the roles of technology, education and regulation in mitigating potential biosecurity risks are examined. Through these discussions, stakeholders from different groups, including academia, industry and government, are expectantly better positioned to contribute towards the establishment of innovation and bio-economy hubs in Asia.
Affiliation(s)
- Ning Mao
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
| | - Nikhil Aggarwal
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Program and Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Chueh Loo Poh
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
| | - Byung Kwan Cho
- Department of Biological Sciences, and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
| | - Akihiko Kondo
- Graduate School of Science, Technology and Innovation, and Engineering Biology Research Center, Kobe University, Kobe, Japan
| | - Chenli Liu
- CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Wen Shan Yew
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Program and Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Program and Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore, Singapore
11
Van Brempt M, Clauwaert J, Mey F, Stock M, Maertens J, Waegeman W, De Mey M. Predictive design of sigma factor-specific promoters. Nat Commun 2020; 11:5822. [PMID: 33199691 PMCID: PMC7670410 DOI: 10.1038/s41467-020-19446-w] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 01/07/2020] [Accepted: 10/13/2020] [Indexed: 02/07/2023] Open
Abstract
To engineer synthetic gene circuits, molecular building blocks are developed which can modulate gene expression without interference, mutually or with the host's cell machinery. As the complexity of gene circuits increases, automated design tools and tailored building blocks to ensure perfect tuning of all components in the network are required. Despite the efforts to develop prediction tools that allow forward engineering of promoter transcription initiation frequency (TIF), such a tool is still lacking. Here, we use promoter libraries of E. coli sigma factor 70 (σ70)- and B. subtilis σB-, σF- and σW-dependent promoters to construct prediction models, capable of both predicting promoter TIF and orthogonality of the σ-specific promoters. This is achieved by training a convolutional neural network with high-throughput DNA sequencing data from fluorescence-activated cell sorted promoter libraries. This model functions as the base of the online promoter design tool (ProD), providing tailored promoters for tailored genetic systems.
Affiliation(s)
- Maarten Van Brempt
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Jim Clauwaert
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
| | - Friederike Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
| | - Jo Maertens
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium
| | - Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000, Ghent, Belgium
| | - Marjan De Mey
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000, Ghent, Belgium.
12
Wang Y, Wang H, Wei L, Li S, Liu L, Wang X. Synthetic promoter design in Escherichia coli based on a deep generative network. Nucleic Acids Res 2020; 48:6403-6412. [PMID: 32424410 PMCID: PMC7337522 DOI: 10.1093/nar/gkaa325] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 12/29/2019] [Revised: 04/05/2020] [Accepted: 04/22/2020] [Indexed: 01/11/2023] Open
Abstract
Promoter design remains one of the most important considerations in metabolic engineering and synthetic biology applications. Theoretically, there are 4^50 possible sequences for a 50-nt promoter, of which naturally occurring promoters make up only a small subset. To explore the vast number of potential sequences, we report a novel AI-based framework for de novo promoter design in Escherichia coli. The model, which was guided by sequence features learned from natural promoters, could capture interactions between nucleotides at different positions and design novel synthetic promoters in silico. We combined a deep generative model that guides the search for artificial sequences with a predictive model to preselect the most promising promoters. The AI-designed promoters were optimized based on the promoter activity in E. coli and the predictive model. After two rounds of optimization, up to 70.8% of the AI-designed promoters were experimentally demonstrated to be functional, and few of them shared significant sequence similarity with the E. coli genome. Our work provided an end-to-end approach to the de novo design of novel promoter elements, indicating the potential to apply deep learning methods to de novo genetic element design.
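The size of the design space mentioned in this abstract follows from simple counting: with four possible bases at each of 50 positions, the number of distinct 50-nt sequences is 4 raised to the power 50. A short check, with variable names chosen here for illustration:

```python
ALPHABET_SIZE = 4      # A, C, G, T
PROMOTER_LENGTH = 50   # nucleotides

# Each position is chosen independently, so the counts multiply.
n_sequences = ALPHABET_SIZE ** PROMOTER_LENGTH

print(n_sequences)            # exact integer
print(f"{n_sequences:.2e}")   # scientific notation, about 1.27e+30
```

At roughly 10^30 candidates, exhaustive synthesis or screening is out of reach, which is the motivation for using a generative model to propose candidates and a predictive model to preselect them.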
Affiliation(s)
- Ye Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
| | - Haochen Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
| | - Lei Wei
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shuailin Li
- School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Liyang Liu
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
|
13
|
Volk MJ, Lourentzou I, Mishra S, Vo LT, Zhai C, Zhao H. Biosystems Design by Machine Learning. ACS Synth Biol 2020; 9:1514-1533. [PMID: 32485108 DOI: 10.1021/acssynbio.0c00129] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Indexed: 12/16/2022]
Abstract
Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems pose a major hurdle in designing biosystems with desirable features. As -omics and other high-throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.
|
14
|
Jervis AJ, Carbonell P, Vinaixa M, Dunstan MS, Hollywood KA, Robinson CJ, Rattray NJW, Yan C, Swainston N, Currin A, Sung R, Toogood H, Taylor S, Faulon JL, Breitling R, Takano E, Scrutton NS. Machine Learning of Designed Translational Control Allows Predictive Pathway Optimization in Escherichia coli. ACS Synth Biol 2019; 8:127-136. [PMID: 30563328 DOI: 10.1021/acssynbio.8b00398] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Indexed: 12/28/2022]
Abstract
The field of synthetic biology aims to make the design of biological systems predictable, shrinking the huge design space to practical numbers for testing. When designing microbial cell factories, most optimization efforts have focused on enzyme and strain selection/engineering, pathway regulation, and process development. In silico tools for the predictive design of bacterial ribosome binding sites (RBSs) and RBS libraries now allow translational tuning of biochemical pathways; however, methods for predicting optimal RBS combinations in multigene pathways are desirable. Here we present the implementation of machine learning algorithms to model the RBS sequence-phenotype relationship from representative subsets of large combinatorial RBS libraries allowing the accurate prediction of optimal high-producers. Applied to a recombinant monoterpenoid production pathway in Escherichia coli, our approach was able to boost production titers by over 60% when screening under 3% of a library. To facilitate library screening, a multiwell plate fermentation procedure was developed, allowing increased screening throughput with sufficient resolution to discriminate between high and low producers. High producers from one library did not translate during scale-up, but the reduced screening requirements allowed rapid rescreening at the larger scale. This methodology is potentially compatible with any biochemical pathway and provides a powerful tool toward predictive design of bacterial production chassis.
Affiliation(s)
- Adrian J. Jervis
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Pablo Carbonell
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Maria Vinaixa
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Mark S. Dunstan
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Katherine A. Hollywood
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Christopher J. Robinson
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Nicholas J. W. Rattray
- Strathclyde Institute of Pharmacy and Biomedical Sciences, Strathclyde University, 161 Cathedral Street, Glasgow G4 0RE, United Kingdom
| | - Cunyu Yan
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Neil Swainston
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Andrew Currin
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Rehana Sung
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Helen Toogood
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Sandra Taylor
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Jean-Loup Faulon
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
- MICALIS, INRA-AgroParisTech, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
| | - Rainer Breitling
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Eriko Takano
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
| | - Nigel S. Scrutton
- Manchester Synthetic Biology Research Centre for Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, Manchester M1 7DN, United Kingdom
|
15
|
Bharanikumar R, Premkumar KAR, Palaniappan A. PromoterPredict: sequence-based modelling of Escherichia coli σ70 promoter strength yields logarithmic dependence between promoter strength and sequence. PeerJ 2018; 6:e5862. [PMID: 30425888 PMCID: PMC6228582 DOI: 10.7717/peerj.5862] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/03/2018] [Accepted: 10/03/2018] [Indexed: 11/20/2022] Open
Abstract
We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ70 factor of RNA polymerase. σ70 promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. We parsed a comprehensive database of bacterial promoters for the -35 and -10 hexamer regions of σ70-binding promoters and used these sequences to construct the respective position weight matrices (PWM). Next we used a well-characterized set of promoters to train a multivariate linear regression model and learn the mapping between PWM scores of the -35 and -10 hexamers and the promoter strength. We found that the log of the promoter strength is significantly linearly associated with a weighted sum of the -10 and -35 sequence profile scores. We applied our model to 100 sets of 100 randomly generated promoter sequences to generate a sampling distribution of mean strengths of random promoter sequences and obtained a mean of 6E-4 ± 1E-7. Our model was further validated by cross-validation and on independent datasets of characterized promoters. PromoterPredict accepts -10 and -35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more robust estimates of promoter strength. PromoterPredict is available as both a web service (https://promoterpredict.com) and standalone tool (https://github.com/PromoterPredict). Our work presents an intuitive generalization applicable to modelling the strength of other promoter classes.
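The modelling idea described in this abstract (position weight matrix scores for the -35 and -10 hexamers, followed by a linear regression on the log of promoter strength) can be sketched as follows. This is a minimal sketch under stated assumptions, not the PromoterPredict implementation: all function names are hypothetical, the PWM uses log-odds against a uniform background with a pseudocount (an assumed convention), and the training hexamers and strengths are made-up toy values for illustration only.

```python
import math

import numpy as np

BASES = "ACGT"


def build_pwm(hexamers, pseudocount=1.0):
    """Log-odds PWM (vs. a uniform 0.25 background) from aligned sequences."""
    length = len(hexamers[0])
    counts = np.full((length, len(BASES)), pseudocount)
    for seq in hexamers:
        for pos, base in enumerate(seq):
            counts[pos, BASES.index(base)] += 1
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return np.log2(freqs / 0.25)


def pwm_score(pwm, seq):
    """Sum the PWM entries selected by each base of the sequence."""
    return float(sum(pwm[pos, BASES.index(base)] for pos, base in enumerate(seq)))


def fit_log_linear(scores_35, scores_10, strengths):
    """Least-squares fit of ln(strength) = b0 + b1*score_35 + b2*score_10."""
    X = np.column_stack([np.ones(len(strengths)), scores_35, scores_10])
    y = np.log(np.asarray(strengths, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_strength(coef, score_35, score_10):
    """Map a pair of PWM scores back to a strength on the original scale."""
    return math.exp(coef[0] + coef[1] * score_35 + coef[2] * score_10)


# Toy data (hypothetical, for illustration only; not from the paper).
pwm_35 = build_pwm(["TTGACA", "TTGACA", "TTTACA", "TTGATA"])
pwm_10 = build_pwm(["TATAAT", "TATAAT", "TACAAT", "TATGAT"])
training = [
    ("TTGACA", "TATAAT", 1.0),
    ("TTTACA", "TATAAT", 0.4),
    ("TTGACA", "TACAAT", 0.3),
    ("TTGATA", "TATGAT", 0.05),
    ("TTTACA", "TACAAT", 0.1),
]
s35 = [pwm_score(pwm_35, m35) for m35, _, _ in training]
s10 = [pwm_score(pwm_10, m10) for _, m10, _ in training]
coef = fit_log_linear(s35, s10, [strength for _, _, strength in training])
print(predict_strength(coef, pwm_score(pwm_35, "TTGACA"), pwm_score(pwm_10, "TATAAT")))
```

Fitting on the log scale keeps predicted strengths positive and matches the abstract's finding that the log of promoter strength is linearly associated with a weighted sum of the two profile scores.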
Affiliation(s)
- Ramit Bharanikumar
- Biotechnology, Sri Venkateswara College of Engineering (Autonomous), Sriperumbudur, Tamil Nadu, India
| | - Keshav Aditya R Premkumar
- Computer Science and Engineering, Sri Venkateswara College of Engineering (Autonomous), Sriperumbudur, Tamil Nadu, India
| | - Ashok Palaniappan
- Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
|
16
|
Hsu KC, Wang FS. Detection of minimum biomarker features via bi-level optimization framework by nested hybrid differential evolution. J Taiwan Inst Chem Eng 2017. [DOI: 10.1016/j.jtice.2017.10.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
17
|
Decoene T, De Paepe B, Maertens J, Coussement P, Peters G, De Maeseneire SL, De Mey M. Standardization in synthetic biology: an engineering discipline coming of age. Crit Rev Biotechnol 2017; 38:647-656. [PMID: 28954542 DOI: 10.1080/07388551.2017.1380600] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Indexed: 10/18/2022]
Abstract
BACKGROUND: Leaping DNA read-and-write technologies, and extensive automation and miniaturization are radically transforming the field of biological experimentation by providing the tools that enable the cost-effective high throughput required to address the enormous complexity of biological systems. However, standardization of the synthetic biology workflow has not kept abreast of dwindling technical and resource constraints, leading, for example, to the collection of multi-level and multi-omics large data sets that end up disconnected or remain under- or even unexploited. PURPOSE: In this contribution, we critically evaluate the various efforts, and the (limited) success thereof, in order to introduce standards for defining, designing, assembling, characterizing, and sharing synthetic biology parts. The causes for this success or the lack thereof, as well as possible solutions to overcome these, are discussed. CONCLUSION: Akin to other engineering disciplines, extensive standardization will undoubtedly speed up and reduce the cost of bioprocess development. In this respect, further implementation of synthetic biology standards will be crucial for the field in order to redeem its promise, i.e. to enable predictable forward engineering.
Affiliation(s)
- Thomas Decoene
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
| | - Brecht De Paepe
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
| | - Jo Maertens
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
| | | | - Gert Peters
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
| | - Sofie L De Maeseneire
- InBio.be, Centre for Industrial Biotechnology and Biocatalysis, Ghent University, Ghent, Belgium
| | - Marjan De Mey
- Centre for Synthetic Biology, Ghent University, Ghent, Belgium
|